Calligra/Architecture/AuthorRDF
Materials to read about RDF in general
- https://en.wikipedia.org/wiki/Resource_Description_Framework
- http://www.w3.org/TR/2014/REC-rdf11-concepts-20140225/
- http://www.w3.org/TR/rdf-syntax-grammar/
- http://en.wikipedia.org/wiki/RDFa
- http://rdfa.info/
- http://www.w3.org/TR/rdfa-lite/
- Documentation and code for classes in /libs/rdf and /plugins/semanticitems
Concept
Why do we use RDF to store Author's data? During the design of the Outliner, one goal was to save its data together with the ordinary OpenDocument file. We did not want to introduce a new file format, and Author's documents had to remain editable in ordinary word processors such as OpenOffice Writer. We could add extra "Author-data-only" files to the package, but within the OpenDocument standard such files can only be treated as attachments, which distorts the meaning of an "attached file". Another drawback is that such a file becomes outdated when the document is modified outside Author. RDF therefore seemed an ideal way to store various data along with a document. Furthermore, RDF allows in-text links to this data, and the data itself can live in a separate file inside the package.
OpenDocument allows us to mark specific elements of the document content as related to an RDF subject. For example, to record that the actor Vasya participates in a scene, we store this information in RDF with the following steps:
- Create a subject for Vasya of type Actor:
  RDF (subject - predicate - object):
    {ActorId} - type - Actor
    {ActorId} - name - Vasya
- Create a scene subject:
  RDF: {SceneId} - type - Scene
- Store the fact "Vasya participates in this scene":
  RDF: {SceneId} - hasActor - {ActorId}
- Give the <text:section> element corresponding to the scene a unique xml:id:
  content.xml: <text:section xml:id="{xmlId}"> {scene contents here} </text:section>
- Ensure that manifest.rdf contains the following triples:
  RDF:
    {PackageId} - type - Package
    {PackageId} - hasPart - {ContentFileId}
    {ContentFileId} - type - ContentFile
    {ContentFileId} - path - "content.xml"
- Then add the following triples to record that {SceneId} refers to the element with the given {xmlId}:
  RDF:
    {ContentFileId} - hasPart - {SceneId}
    {SceneId} - type - Element
    {SceneId} - idref - {xmlId}
- Profit! Now, with simple SPARQL queries to Soprano, we can obtain all the needed information as RDF triples. To update the information held in a KoRdfBasicSemanticItem, use the convenience function KoRdfBasicSemanticItem::updateTriple().
As the example shows, we can store data of arbitrary type and link to it from any content element that supports the xml:id attribute.
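The steps above can be mirrored with a plain in-memory triple list. This is a minimal Python sketch, not Author's code: a list of (subject, predicate, object) tuples stands in for the Soprano model, predicates are shortened to the bare names used above, and the identifiers (actor-1, scene-1, package, content-file, section-xml-id) are hypothetical placeholders for {ActorId}, {SceneId}, {PackageId}, {ContentFileId} and {xmlId}.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical identifiers standing in for {ActorId}, {SceneId}, {ContentFileId}, {xmlId}.
ACTOR_ID = "actor-1"
SCENE_ID = "scene-1"
CONTENT_FILE_ID = "content-file"
XML_ID = "section-xml-id"

triples: List[Triple] = [
    # Vasya as a subject of type Actor
    (ACTOR_ID, "type", "Actor"),
    (ACTOR_ID, "name", "Vasya"),
    # The scene and the fact that Vasya participates in it
    (SCENE_ID, "type", "Scene"),
    (SCENE_ID, "hasActor", ACTOR_ID),
    # Package description, as kept in manifest.rdf
    ("package", "type", "Package"),
    ("package", "hasPart", CONTENT_FILE_ID),
    (CONTENT_FILE_ID, "type", "ContentFile"),
    (CONTENT_FILE_ID, "path", "content.xml"),
    # Link from the scene subject to <text:section xml:id="...">
    (CONTENT_FILE_ID, "hasPart", SCENE_ID),
    (SCENE_ID, "type", "Element"),
    (SCENE_ID, "idref", XML_ID),
]


def objects(subject: str, predicate: str) -> List[str]:
    """Return every object o such that (subject, predicate, o) is stored."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def subjects(predicate: str, obj: str) -> List[str]:
    """Return every subject s such that (s, predicate, obj) is stored."""
    return [s for s, p, o in triples if p == predicate and o == obj]


def sections_with_actor(name: str) -> List[str]:
    """Return the xml:id of every section whose scene lists an actor with this name."""
    result = []
    for actor in subjects("name", name):
        if "Actor" not in objects(actor, "type"):
            continue
        for scene in subjects("hasActor", actor):
            result.extend(objects(scene, "idref"))
    return result


print(sections_with_actor("Vasya"))  # ['section-xml-id']
```

In Author these lookups would be SPARQL queries against Soprano; the helpers here only make the chain name -> {ActorId} -> {SceneId} -> {xmlId} explicit.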
From here on, and throughout the code, I use the word "section" instead of "scene", because "section" is more neutral and indicates that the stored data is linked to a <text:section> element.
Implementation foreword
First, I (deniskup) changed some behavior of the default RDF implementation in Calligra Words. Previously there was only KoRdfSemanticItem, which also handled the "in-document text representation" of an item. Since we do not need that for our metadata (section metadata, for example), I moved all the basic functions of KoRdfSemanticItem into KoRdfBasicSemanticItem. So:
Old KoRdfSemanticItem == KoRdfBasicSemanticItem (base functions) + new KoRdfSemanticItem (additional handling that not every kind of metadata needs)
Author-specific classes for RDF handling
CAuMetaDataManager was introduced to add an author.rdf file to the package (.odt) and to create the RDF contexts used to write the required RDF data to that file. It also registers Author's RDF elements with the system and provides some helper functions.
To make creating new semantic items for Author easy, I created the CAuSemanticItemBase class. Most elements share the same code for updating values of different types in RDF and for generating queries to Soprano, so all of it was extracted into this base class. When subclassing, you only need to specify a list of integer and string properties; the base class handles updates and the rest automatically. I think element factories could be given a common base class in the same way.
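The CAuSemanticItemBase design described above (subclasses declare their integer and string properties, the base class performs the updates) can be illustrated with a short Python sketch. Everything in it is hypothetical and is not the Calligra C++ API: the names SemanticItemBase, SectionItem, INT_PROPS, STRING_PROPS and set_property() are invented for illustration, and a plain list of triples stands in for the Soprano model. The namespace, subject and values are taken from the author.rdf sample further down this page.

```python
from typing import Dict, List, Tuple, Union

Triple = Tuple[str, str, str]
Value = Union[int, str]


class SemanticItemBase:
    """Hypothetical base class: subclasses only declare their properties."""

    NAMESPACE = ""                       # predicate prefix, set by subclasses
    INT_PROPS: Tuple[str, ...] = ()      # names of integer-valued properties
    STRING_PROPS: Tuple[str, ...] = ()   # names of string-valued properties

    def __init__(self, subject: str, model: List[Triple]) -> None:
        self.subject = subject
        self.model = model  # shared triple list standing in for the RDF model
        self.values: Dict[str, Value] = {}

    def predicate(self, name: str) -> str:
        return self.NAMESPACE + name

    def set_property(self, name: str, value: Value) -> None:
        """Check the value type, then replace (not append) the stored triple."""
        if name in self.INT_PROPS:
            if not isinstance(value, int):
                raise TypeError(f"{name} expects an int")
        elif name in self.STRING_PROPS:
            if not isinstance(value, str):
                raise TypeError(f"{name} expects a str")
        else:
            raise KeyError(f"unknown property {name}")
        pred = self.predicate(name)
        # Drop any old triple for this (subject, predicate) pair, then add the new one.
        self.model[:] = [t for t in self.model
                         if not (t[0] == self.subject and t[1] == pred)]
        self.model.append((self.subject, pred, str(value)))
        self.values[name] = value


class SectionItem(SemanticItemBase):
    """Hypothetical subclass: just a namespace and its property lists."""

    NAMESPACE = "http://www.calligra.org/author/Section#"
    INT_PROPS = ("status",)
    STRING_PROPS = ("badge", "synop")


model: List[Triple] = []
section = SectionItem("%7Bc18429f8-ce63-4af8-a989-0fff163579f9%7D", model)
section.set_property("badge", "Sec1Badge")
section.set_property("status", 0)
section.set_property("status", 1)  # replaces the previous status triple
print(len(model))  # 2
```

With this split, a new item type is another small subclass with its own NAMESPACE and property lists, while the replace-on-update logic lives in one place.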
Example: section info integrated into a real document
Let's look at how all of this looks at the XML level. Download this file: Media:author-rdf-sample.odt, and unpack it:
author-rdf-sample.odt -> {unpacking}
  META-INF      <DIR>
  Thumbnails    <DIR>
  author.rdf    <--- this is where Author stores its data
  content.xml   <--- this is where the document contents are stored
  manifest.rdf  <--- this is where aliases from content.xml to author.rdf are placed
  meta.xml
  mimetype
  settings.xml
  styles.xml
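An .odt file is a ZIP package, so its parts can be listed without unpacking it by hand. A minimal Python sketch using only the standard zipfile module; list_rdf_parts is a hypothetical helper name, and the example builds a tiny stand-in package in memory instead of opening the real sample file.

```python
import io
import zipfile
from typing import BinaryIO, List, Union


def list_rdf_parts(package: Union[str, BinaryIO]) -> List[str]:
    """Return the names of all .rdf parts (e.g. manifest.rdf, author.rdf) in an ODF package."""
    with zipfile.ZipFile(package) as zf:
        return sorted(name for name in zf.namelist() if name.endswith(".rdf"))


# Build a small stand-in package in memory; a real call would pass "author-rdf-sample.odt".
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("mimetype", "application/vnd.oasis.opendocument.text")
    zf.writestr("content.xml", "placeholder")
    zf.writestr("manifest.rdf", "placeholder")
    zf.writestr("author.rdf", "placeholder")
buffer.seek(0)

print(list_rdf_parts(buffer))  # ['author.rdf', 'manifest.rdf']
```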
If you open author-rdf-sample.odt, you will see a header and two sections with data assigned in the Outliner (badge, status, synopsis). This is how it looks in content.xml:
<?xml version="1.0" encoding="UTF-8"?>
...
<office:body>
  <office:text>
    ...
    <text:section text:name="New section 1" xml:id="id-9fa48e52-48e1-49cc-83ae-cf0a55c79759">
      <text:p text:style-name="P2">
        Section 1 text
      </text:p>
    </text:section>
    ...
  </office:text>
</office:body>
Here we see a section named "New section 1" with an xml:id; remember this id, as we will see it again later.
These are the contents of manifest.rdf:
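The sections and their ids can be found with any namespace-aware XML parser. A minimal Python sketch using xml.etree.ElementTree; the sample is a cut-down content.xml with the root element and namespace declarations added (the excerpt above omits them), and section_ids is a hypothetical helper name. Note that xml:id belongs to the predefined XML namespace, http://www.w3.org/XML/1998/namespace.

```python
import xml.etree.ElementTree as ET
from typing import Dict

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
XML_NS = "http://www.w3.org/XML/1998/namespace"

CONTENT_XML = """<office:document-content
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
  <office:body>
    <office:text>
      <text:section text:name="New section 1" xml:id="id-9fa48e52-48e1-49cc-83ae-cf0a55c79759">
        <text:p text:style-name="P2">Section 1 text</text:p>
      </text:section>
    </office:text>
  </office:body>
</office:document-content>"""


def section_ids(content_xml: str) -> Dict[str, str]:
    """Map each <text:section> xml:id to its text:name."""
    root = ET.fromstring(content_xml)
    result = {}
    for section in root.iter(f"{{{TEXT_NS}}}section"):
        xml_id = section.get(f"{{{XML_NS}}}id")
        if xml_id is not None:
            result[xml_id] = section.get(f"{{{TEXT_NS}}}name", "")
    return result


print(section_ids(CONTENT_XML))
# {'id-9fa48e52-48e1-49cc-83ae-cf0a55c79759': 'New section 1'}
```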
Remember: replace the code below once the implementation follows the newest specs. The example below is outdated and does not conform to the OpenDocument v1.2 specification. See this for details.
<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  ...
  <rdf:Description rdf:about="%7Bc18429f8-ce63-4af8-a989-0fff163579f9%7D">
    <ns2:idref xmlns:ns2="http://docs.oasis-open.org/ns/office/1.2/meta/pkg#">
      id-9fa48e52-48e1-49cc-83ae-cf0a55c79759
    </ns2:idref>
  </rdf:Description>
  ...
  <ns27:MetaDataFile xmlns:ns27="http://docs.oasis-open.org/ns/office/1.2/meta/odf#">
    <ns28:path xmlns:ns28="http://docs.oasis-open.org/ns/office/1.2/meta/pkg#" rdf:datatype="http://www.w3.org/2001/XMLSchema#string">
      author.rdf
    </ns28:path>
  </ns27:MetaDataFile>
  ...
</rdf:RDF>
The first block says that the metadata subject identified by rdf:about is associated with the given idref (the xml:id from content.xml). The second block says that the package contains an additional metadata file, author.rdf, which contains:
<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  ...
  <ns6:Section xmlns:ns6="http://www.calligra.org/author/" rdf:about="%7Bc18429f8-ce63-4af8-a989-0fff163579f9%7D">
    <ns7:badge xmlns:ns7="http://www.calligra.org/author/Section#" rdf:datatype="http://www.w3.org/2001/XMLSchema#string">
      Sec1Badge
    </ns7:badge>
    <ns8:magicid xmlns:ns8="http://www.calligra.org/author/Section#" rdf:datatype="http://www.w3.org/2001/XMLSchema#string">
      {c18429f8-ce63-4af8-a989-0fff163579f9}
    </ns8:magicid>
    <ns9:status xmlns:ns9="http://www.calligra.org/author/Section#" rdf:datatype="http://www.w3.org/2001/XMLSchema#string">
      1
    </ns9:status>
    <ns10:synop xmlns:ns10="http://www.calligra.org/author/Section#" rdf:datatype="http://www.w3.org/2001/XMLSchema#string">
      Sec1Synop
    </ns10:synop>
  </ns6:Section>
  ...
</rdf:RDF>
Here we see a Section object that describes our section (follow rdf:about -> idref -> xml:id). It holds the badge, status and synopsis data.
Let's see how all of this looks as RDF triples (simplified version):
SUBJECT | PREDICATE | OBJECT |
---|---|---|
%7Bc18429f8-ce63-4af8-a989-0fff163579f9%7D | http://www.calligra.org/author/Section#badge | Sec1Badge |
%7Bc18429f8-ce63-4af8-a989-0fff163579f9%7D | http://www.calligra.org/author/Section#status | 1 |
%7Bc18429f8-ce63-4af8-a989-0fff163579f9%7D | http://www.calligra.org/author/Section#synop | Sec1Synop |
%7Bc18429f8-ce63-4af8-a989-0fff163579f9%7D | http://docs.oasis-open.org/ns/office/1.2/meta/pkg#idref | id-9fa48e52-48e1-49cc-83ae-cf0a55c79759 |
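The table can also be produced mechanically, and joining it with the idref triple from manifest.rdf follows the chain rdf:about -> idref -> xml:id. A minimal Python sketch, assuming only the simple RDF/XML shapes used on this page (node elements carrying rdf:about, with literal-valued, namespaced child properties); it is not a general RDF/XML parser, which would also have to handle rdf:resource, nested nodes, blank nodes and more. The samples are cut-down copies of the files above (datatype attributes and the magicid property omitted), and triples() and metadata_for_xml_id() are hypothetical helper names.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PKG_IDREF = "http://docs.oasis-open.org/ns/office/1.2/meta/pkg#idref"

AUTHOR_RDF = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <ns6:Section xmlns:ns6="http://www.calligra.org/author/"
      rdf:about="%7Bc18429f8-ce63-4af8-a989-0fff163579f9%7D">
    <ns7:badge xmlns:ns7="http://www.calligra.org/author/Section#">Sec1Badge</ns7:badge>
    <ns9:status xmlns:ns9="http://www.calligra.org/author/Section#">1</ns9:status>
    <ns10:synop xmlns:ns10="http://www.calligra.org/author/Section#">Sec1Synop</ns10:synop>
  </ns6:Section>
</rdf:RDF>"""

MANIFEST_RDF = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:about="%7Bc18429f8-ce63-4af8-a989-0fff163579f9%7D">
    <ns2:idref xmlns:ns2="http://docs.oasis-open.org/ns/office/1.2/meta/pkg#">id-9fa48e52-48e1-49cc-83ae-cf0a55c79759</ns2:idref>
  </rdf:Description>
</rdf:RDF>"""


def expand(tag: str) -> str:
    """Turn ElementTree's '{namespace}local' form into a full URI (namespaced tags only)."""
    namespace, _, local = tag[1:].partition("}")
    return namespace + local


def triples(rdf_xml: str) -> List[Tuple[str, str, str]]:
    """Extract (subject, predicate, literal) triples, plus rdf:type for typed nodes."""
    result = []
    for node in ET.fromstring(rdf_xml):
        subject = node.get(f"{{{RDF_NS}}}about")
        if subject is None:
            continue
        if node.tag != f"{{{RDF_NS}}}Description":
            # A typed node element such as ns6:Section implies an rdf:type triple.
            result.append((subject, RDF_NS + "type", expand(node.tag)))
        for prop in node:
            result.append((subject, expand(prop.tag), (prop.text or "").strip()))
    return result


def metadata_for_xml_id(xml_id: str) -> Dict[str, str]:
    """Collect author.rdf properties of the subject whose pkg#idref equals xml_id."""
    linked = {s for s, p, o in triples(MANIFEST_RDF) if p == PKG_IDREF and o == xml_id}
    return {p: o for s, p, o in triples(AUTHOR_RDF) if s in linked}


for row in triples(AUTHOR_RDF) + triples(MANIFEST_RDF):
    print(" | ".join(row))
```

Besides the rows of the table, the extractor also emits an rdf:type triple for the typed ns6:Section node, which the simplified table leaves out.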
The RDF context (mentioned above) is the root element of the author.rdf file:
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"> ... </rdf:RDF>
Materials about Calligra and RDF
- http://www.slideshare.net/jza/an-rdf-metadata-model-for-opendocument-format-12
- OpenDocument Format for Office Applications (OpenDocument) Version 1.2
  - Part 1: OpenDocument Schema: see section 4
  - Part 3: Packages: see sections 3.6 and 6