Materials to read about RDF in general

Replace this with a link to the Calligra + RDF documentation once it is ready, and move the links below into it.


Why do we use RDF to store Author's data? During the design phase of Outliner, one goal was to save its data alongside a regular OpenDocument file. We did not plan to introduce a new file format; moreover, an Author document had to remain editable in ordinary word processors such as OpenOffice Writer. We could add extra "Author-data-only" files to the package, but within the OpenDocument standard this can only be treated as attaching a file to the document, which distorts the meaning of "attached file". Another disadvantage is that when the document is modified outside Author, such a file becomes outdated. RDF therefore looks like an ideal mechanism for storing various data alongside a document; furthermore, RDF allows in-text links to this data, and the data itself can be kept in a separate file of the package.

OpenDocument allows us to mark specific elements in the document contents as related to some RDF subject. For example, if we want to record that the actor Vasya takes part in some scene, we store this information in RDF with the following sequence of actions:

Prefixes in predicates and objects are omitted below for simplicity.

  1. Create a Vasya subject of type Actor:
    RDF: (subject - predicate - object)
         {ActorId} - type - Actor
         {ActorId} - name - Vasya
  2. Create a scene subject:
    RDF: {SceneId} - type - Scene
    {ActorId} and {SceneId} are URIs, as RDF prescribes, or some other unique identification strings.
  3. Store the "Vasya is participating in this scene" information:
    RDF: {SceneId} - hasActor - {ActorId}
    At this point all information on the Author data level is complete; next we proceed to linking the scene to the text.
  4. Assign a unique xml:id to the <text:section> element corresponding to the scene:
        <text:section xml:id="{xmlId}"> {scene contents here} </text:section>
    1. Ensure that manifest.rdf contains the following set of triples:
          RDF: {PackageId} - type - Package
               {PackageId} - hasPart - {ContentFileId}
               {ContentFileId} - type - ContentFile
               {ContentFileId} - path - "content.xml"
      The Package and hasPart names are described in the OpenDocument v1.2 specs, part 3, chapter 6.
    2. Then add the following set of triples to record that {SceneId} refers to the specific {xmlId}:
          RDF: {ContentFileId} - hasPart - {SceneId}
               {SceneId} - type - Element
               {SceneId} - idref - {xmlId}
      The Element class and the idref property are described in the OpenDocument v1.2 specs, part 3, chapter 6.
      Steps 4 and 5 are implemented with the help of the KoTextInlineRdf class.
  5. Profit! Now, with simple SPARQL queries to Soprano, we can obtain all the needed information as RDF triples. A convenient function for updating information is provided by the KoRdfBasicSemanticItem class (see KoRdfBasicSemanticItem::updateTriple()).
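
The steps above can be sketched in Python, with plain tuples standing in for RDF triples. This only illustrates the data model; it is not Calligra or Soprano code. The identifiers (actor-1, scene-1, id-scene-1 and so on) and the helpers objects() and actors_in_section() are hypothetical.

```python
# Triples from steps 1-4, prefixes omitted as in the text.
# All identifiers are made up for illustration.
TRIPLES = {
    ("actor-1", "type", "Actor"),
    ("actor-1", "name", "Vasya"),
    ("scene-1", "type", "Scene"),
    ("scene-1", "hasActor", "actor-1"),
    ("package", "type", "Package"),
    ("package", "hasPart", "content-file"),
    ("content-file", "type", "ContentFile"),
    ("content-file", "path", "content.xml"),
    ("content-file", "hasPart", "scene-1"),
    ("scene-1", "type", "Element"),
    ("scene-1", "idref", "id-scene-1"),
}


def objects(triples, subject, predicate):
    """Return all objects of triples matching (subject, predicate, ?)."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


def actors_in_section(triples, xml_id):
    """Follow xml:id -> scene subject -> hasActor -> name."""
    scenes = sorted(s for s, p, o in triples if p == "idref" and o == xml_id)
    names = []
    for scene in scenes:
        for actor in objects(triples, scene, "hasActor"):
            names.extend(objects(triples, actor, "name"))
    return names
```

Calling actors_in_section(TRIPLES, "id-scene-1") walks from the xml:id of the <text:section> element back to the scene subject and then to the names of its actors, which is roughly the lookup a SPARQL query would perform against the real store.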

As the example shows, we can store data of arbitrary type and link to it from any content element that supports the xml:id attribute.

From here on, and throughout the code, I use the word "section" instead of "scene", because "section" is more neutral and indicates that the stored data is linked to a <text:section> element.

Implementation foreword

First, I (deniskup) changed some behavior of the default RDF implementation in Calligra Words. Previously there was KoRdfSemanticItem, which included handling of the "in-document text representation". Since we don't need that for our metadata (section metadata, for example), I moved all base functions of KoRdfSemanticItem into KoRdfBasicSemanticItem. So:

Old KoRdfSemanticItem ==
    KoRdfBasicSemanticItem (base functions)
  + new KoRdfSemanticItem (additional handling that we don't need for every metadata)

"In-document text representation" allows text in the document to be modified based on the data of the object the text links to. For example, you can mention an actor Vasya in the text, then change his name to John in some "Actor information editor", and the "in-document text representation" functionality will change Vasya to John everywhere he is mentioned in the text.

Author specific classes for RDF handling

CAuMetaDataManager was introduced to add an author.rdf file to the package (.odt) and to create RDF contexts for writing the necessary RDF information to this file. It also registers Author RDF elements within the system and provides some helper functions.

To make it easy to create new semantic items for Author, I created the CAuSemanticItemBase class. Most elements share a common code base for updating values of different types in RDF and for generating queries to Soprano; all of this was extracted into the base class. When subclassing, you only need to specify a list of integer and string properties, and the base class handles updates and the rest automatically. I think element factories can also be built on a common base class in the same way.
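
The subclassing idea can be illustrated with a small Python sketch. It mirrors only the pattern described above (a subclass declares its integer and string properties, the base class does the rest); the class and method names are hypothetical and do not reproduce the actual C++ interface of CAuSemanticItemBase. The property names are borrowed from the sample author.rdf shown later on this page.

```python
class SemanticItemBase:
    """Hypothetical stand-in for the idea behind CAuSemanticItemBase."""

    int_props = ()     # subclasses list their integer properties here
    string_props = ()  # subclasses list their string properties here

    def __init__(self, subject):
        self.subject = subject
        self.values = {}  # predicate -> object, for this subject only

    def set(self, name, value):
        """Coerce the value to the declared type and store it."""
        if name in self.int_props:
            value = int(value)
        elif name in self.string_props:
            value = str(value)
        else:
            raise KeyError(f"unknown property: {name}")
        self.values[name] = value

    def as_triples(self):
        """Return (subject, predicate, object) tuples, sorted by predicate."""
        return [(self.subject, p, o) for p, o in sorted(self.values.items())]


class Section(SemanticItemBase):
    # Only the property lists are declared; updates live in the base class.
    int_props = ("status",)
    string_props = ("badge", "synop")
```

A new item type then needs nothing more than its two property lists, which is the main appeal of the design.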

Sample of integrating section info into a real document

Let's look at an example of how all of this looks at the XML level. Download this file: Media:author-rdf-sample.odt. Let's unpack it:

author-rdf-sample.odt -> {unpacking}
    Thumbnails <DIR>
    author.rdf <--- this is where Author stores its data
    content.xml <--- this is where document contents are stored
    manifest.rdf <--- this is where aliases from content.xml to author.rdf are placed
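
Since an .odt package is a ZIP archive, the unpacking step can be reproduced with Python's standard zipfile module. This is a minimal sketch; the helper names list_package() and read_member() are hypothetical.

```python
import zipfile


def list_package(path_or_file):
    """Return the names of all entries in an ODF package (a ZIP archive)."""
    with zipfile.ZipFile(path_or_file) as package:
        return package.namelist()


def read_member(path_or_file, name):
    """Return the raw bytes of one entry, for example 'author.rdf'."""
    with zipfile.ZipFile(path_or_file) as package:
        return package.read(name)
```

For the sample file, list_package("author-rdf-sample.odt") should include the entries shown in the listing above, and read_member("author-rdf-sample.odt", "author.rdf") returns the Author data as bytes.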

If you open author-rdf-sample.odt, you will see a header and two sections with data assigned in Outliner (badge, status, synopsis). This is how it looks in content.xml:

<?xml version="1.0" encoding="UTF-8"?>
...
<office:body> <office:text>
    ...
    <text:section
        text:name="New section 1"
        xml:id="id-9fa48e52-48e1-49cc-83ae-cf0a55c79759">
        <text:p text:style-name="P2">
            Section 1 text
        </text:p>
    </text:section>
    ...
</office:text> </office:body>

Here we see a section named "New section 1" with the specified id. Remember this id; we will see it again later.
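
As an illustration, the section ids can be pulled out of content.xml with Python's standard xml.etree.ElementTree module. The namespace URIs are those defined by the OpenDocument and XML specifications; the helper section_ids() and the reduced SAMPLE document are assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET

# Namespace URIs from the OpenDocument and XML specifications.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Reduced stand-in for content.xml, kept just large enough to parse.
SAMPLE = """<office:document-content
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
  <office:body><office:text>
    <text:section text:name="New section 1"
        xml:id="id-9fa48e52-48e1-49cc-83ae-cf0a55c79759">
      <text:p>Section 1 text</text:p>
    </text:section>
  </office:text></office:body>
</office:document-content>"""


def section_ids(content_xml):
    """Map each section's text:name to its xml:id; sections without an id are skipped."""
    root = ET.fromstring(content_xml)
    result = {}
    for section in root.iter("{%s}section" % TEXT_NS):
        xml_id = section.get(XML_ID)
        if xml_id is not None:
            result[section.get("{%s}name" % TEXT_NS)] = xml_id
    return result
```

The returned xml:id values are the strings that the idref entries in manifest.rdf point to.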

This is manifest.rdf contents:

Replace the code below once the implementation corresponds to the newest specs. The example below is outdated and does not conform to the OpenDocument v1.2 specs. See this for details.

<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns:rdf="">
    ...
    <rdf:Description rdf:about="%7Bc18429f8-ce63-4af8-a989-0fff163579f9%7D">
        <ns2:idref xmlns:ns2="">
            id-9fa48e52-48e1-49cc-83ae-cf0a55c79759
        </ns2:idref>
    </rdf:Description>
    ...
    <ns27:MetaDataFile xmlns:ns27="">
        <ns28:path
            xmlns:ns28=""
            rdf:datatype="">
            author.rdf
        </ns28:path>
    </ns27:MetaDataFile>
    ...
</rdf:RDF>

The first block says that the metadata with the id from rdf:about is associated with the specified idref (recall the xml:id). The second block says that the package contains an additional metadata file, author.rdf, which contains:

<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns:rdf="">
    ...
    <ns6:Section
        xmlns:ns6=""
        rdf:about="%7Bc18429f8-ce63-4af8-a989-0fff163579f9%7D">
        <ns7:badge xmlns:ns7=""
            rdf:datatype="">
            Sec1Badge
        </ns7:badge>
        <ns8:magicid xmlns:ns8=""
            rdf:datatype="">
            {c18429f8-ce63-4af8-a989-0fff163579f9}
        </ns8:magicid>
        <ns9:status xmlns:ns9=""
            rdf:datatype="">
            1
        </ns9:status>
        <ns10:synop xmlns:ns10=""
            rdf:datatype="">
            Sec1Synop
        </ns10:synop>
    </ns6:Section>
    ...
</rdf:RDF>

Here we see a Section object that describes our section (follow rdf:about -> idref -> xml:id). It holds the badge, status and synopsis information.

Let's sketch how all of this looks as RDF triples (simplified version):

%7Bc18429f8-ce63-4af8-a989-0fff163579f9%7D Sec1Badge
%7Bc18429f8-ce63-4af8-a989-0fff163579f9%7D 1
%7Bc18429f8-ce63-4af8-a989-0fff163579f9%7D Sec1Synop
%7Bc18429f8-ce63-4af8-a989-0fff163579f9%7D id-9fa48e52-48e1-49cc-83ae-cf0a55c79759
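
The chain rdf:about -> idref -> xml:id amounts to a join on the subject. Below is a minimal Python sketch of that join, using the values from the listings above; the predicate names are shortened to the element names without namespaces, and the helper metadata_by_xml_id() is hypothetical.

```python
SUBJECT = "%7Bc18429f8-ce63-4af8-a989-0fff163579f9%7D"

# From manifest.rdf: subject -> xml:id of the section element.
MANIFEST = [(SUBJECT, "idref", "id-9fa48e52-48e1-49cc-83ae-cf0a55c79759")]

# From author.rdf: section metadata for the same subject.
AUTHOR = [
    (SUBJECT, "badge", "Sec1Badge"),
    (SUBJECT, "status", "1"),
    (SUBJECT, "synop", "Sec1Synop"),
]


def metadata_by_xml_id(manifest, author):
    """Join both triple lists on the subject and key the result by xml:id."""
    result = {}
    for subject, predicate, xml_id in manifest:
        if predicate != "idref":
            continue
        result[xml_id] = {p: o for s, p, o in author if s == subject}
    return result
```

With this, the metadata of a <text:section> element can be looked up directly by the xml:id found in content.xml.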

The RDF context (mentioned above) is the root element of the author.rdf file:

<rdf:RDF xmlns:rdf="">
    ...
</rdf:RDF>

Materials about Calligra and RDF

This page was last edited on 20 March 2015, at 20:26. Content is available under Creative Commons License SA 4.0 unless otherwise noted.