Difference between revisions of "KDE PIM/Akonadi/Architecture"

(Search)
(Reword Payload types; add Foreign Payload description)
 
(3 intermediate revisions by 2 users not shown)
Line 1: Line 1:
 
= Akonadi Concepts and Architecture =
 
= Akonadi Concepts and Architecture =
 
 
This document describes and explains the core elements within Akonadi (like ''Items'', ''Collections'', etc.) as well as the architecture of the entire solution (clients, agents, server, etc.) and how they interact with each other. The reason this is all explained in a single document is so that it's easier to see how all the dots connect.
 
This document describes and explains the core elements within Akonadi (like ''Items'', ''Collections'', etc.) as well as the architecture of the entire solution (clients, agents, server, etc.) and how they interact with each other. The reason this is all explained in a single document is so that it's easier to see how all the dots connect.
  
 
Eventually, this should be moved or copied into Akonadi docs.
 
Eventually, this should be moved or copied into Akonadi docs.
  
 +
Most of the following is meant to be a rough specification. '''Implementation details''' of a concept, which users of that concept should not rely upon, are clearly marked.
  
  
== Basic Entities ==
 
  
The term ''Entity'' is often used as a common terms for all the elements described below.
+
== Entities ==
 +
The term ''Entity'' is often used as a common term for all the elements described below.
  
 
=== Attributes ===
 
=== Attributes ===
 +
''Attributes'' are additional metadata that can be attached to other ''Entities'' (except for other ''Attributes''). An ''Attribute'' has a type and a value. ''Client applications'' and ''Agents'' can define their own ''Attributes'', but there are also some pre-defined ''Attributes''.
  
''Attributes'' are additional metadata that can be attached to other ''Entities'' (except for other ''Attributes''). An ''Atribute'' has a type and a value. ''Client applications'' and ''Agents'' can define their own ''Attributes'' but there are also some pre-defined ''Attributes'', like the "EntityDisplay" ''Attribute'' which allows customizing how an ''Entity'' is presented to user in clients (by setting custom display name, icon, background color etc.).
+
'''Example:''' The pre-defined "EntityDisplay" ''Attribute'' allows customizing how an ''Entity'' is presented to the user in a client (by setting custom display name, icon, background color etc.).
  
 
=== Items ===
 
=== Items ===
 +
An ''Item'' is an abstract representation of data. ''Items'' have metadata (ID, size, MIME type, etc.), ''Payload'' parts (the actual data) and attributes. Each ''Item'' has exactly one parent ''Collection''.
  
''Item'' is an abstract representation of data. ''Items'' have metadata (ID, size, mimetype, etc.), payload parts (the actual data, e.g. email envelope, email head and email body) and attributes. An ''Item'' can represent an email, a contact, a calendar event etc. One ''Item'' has exactly one parent ''Collection''.
+
'''Example:''' An ''Item'' can represent an email. Such an email ''Item'' may have an envelope, a head and one or more body ''Payload'' parts. An ''Item'' can also represent a contact, a calendar event etc.
  
 
=== Collections ===
 
=== Collections ===
 +
A ''Collection'', as the name suggests, is a collection of ''Items''. A ''Collection'' can also have child ''Collections'', thus creating a ''Collection tree''. ''Collections'' can also have attributes. Each ''Collection'' is owned by a ''Resource'' (see below). Finally, a list of MIME types is associated with each ''Collection''; every ''Item'' in a collection must be of one of the associated MIME types.
  
''Collection'', as the name suggests, is a collection of ''Items''. A ''Collection'' can also have child
+
'''Example:''' Email folders are collections of ''Items'' of "email" MIME type. Calendars are collections of ''Items'' of "Todo" or "Event" MIME type.
''Collections'', thus creating a ''Collection tree''. ''Collections'' can also have attributes.
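
The rule that every ''Item'' in a ''Collection'' must be of one of the MIME types associated with that ''Collection'' can be illustrated with a minimal Python sketch. The names used here (Item, Collection, add_item) are hypothetical and are not the Akonadi API, and the MIME type strings are only illustrative values.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """Hypothetical stand-in for an Akonadi Item (not the real API)."""
    item_id: int
    mime_type: str


@dataclass
class Collection:
    """Hypothetical stand-in for an Akonadi Collection."""
    name: str
    mime_types: set
    items: list = field(default_factory=list)
    children: list = field(default_factory=list)  # child Collections form the tree

    def add_item(self, item):
        # Every Item must be of one of the Collection's associated MIME types.
        if item.mime_type not in self.mime_types:
            raise ValueError(f"{item.mime_type} is not allowed in {self.name}")
        self.items.append(item)


# Illustrative values: an email folder that only accepts email Items.
inbox = Collection("Inbox", {"message/rfc822"})
inbox.add_item(Item(1, "message/rfc822"))
```

Trying to add an ''Item'' with a MIME type that is not in the set raises an error in this sketch; how the real ''Server'' reports such a violation is not covered here.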
 
  
 
=== Virtual Collections ===
 
=== Virtual Collections ===
 +
A ''Virtual Collection'' is a ''Collection'' that cannot own ''Items'' or have non-virtual subcollections. Instead of being owned by a ''Virtual Collection'', ''Items'' are ''linked'' into it. One ''Item'' can be linked into multiple ''Virtual Collections''.
  
''Virtual Collection'' is represented as a regular ''Collection'', but it has a special property that it cannot own any ''Items'' nor it can have any subcollections unless they are virtual as well. Instead of being a parent of ''Items'', ''Items'' are ''linked'' into ''Virtual Collections''. One ''Item'' can be linked into multiple ''Virtual Collections''.  
+
'''Example:''' ''Virtual Collections'' are typically used to hold search results, i.e., a ''Virtual Collection'' represents a search query and all ''Items'' linked to it are those that match the query.
  
''Virtual Collections'' are typically used to hold search results, that is a ''Virtual Collection'' represents a search query and all ``Items`` linked to it are those that match the query.
+
'''Implementation detail:''' ''Virtual Collections'' are represented as regular ''Collections''.
  
 
=== Tags ===
 
=== Tags ===
''Tag'' describes a common abstract relation between multiple ''Items''. For example, a "Work" tag can be assigned to many emails, tasks and events (or rather ''Items'' representing those) that are somehow related to user's work. A single ''Item'' can have multiple ''Tags'' and a single ''Tag'' can be assigned to multiple ''Items''.
+
A ''Tag'' is a unary relation on ''Items'', and can thus be seen as an ''Item'' property. A single ''Item'' can have multiple ''Tags'' and a single ''Tag'' can be assigned to multiple ''Items''.
 +
 
 +
'''Example:''' A "Work" tag can be assigned to many emails, tasks and events (or rather ''Items'' representing those) that are somehow related to the user's work.
  
 
=== Relation ===
 
=== Relation ===
 +
A ''Relation'' is a binary relation on ''Items'', i.e., it describes a specific relation between '''exactly two''' ''Items''. A single ''Item'' can be in multiple ''Relations'', even in multiple ''Relations'' of the same type.
  
''Relation'' describes a specific relation between exactly two ''Items'' - for example we can have an "INVITATION" ''Relation'' between an ''Item'' that represents an email with meeting invitation and an ''Item'' that represents a calendar event that was created from the invitation email. Single ''Item'' can be in multiple ''Relations'', even in multiple ''Relations'' of the same type, but there are always exactly two ''Items'' in each ''Relation''.
+
'''Example:''' We can have an "INVITATION" ''Relation'' between an ''Item'' that represents an email with a meeting invitation and an ''Item'' that represents a calendar event that was created from this invitation email. If the event has multiple participants and an invitation email was generated for each participant, each of those emails would be in an "INVITATION" ''Relation'' with the event.
  
  
Line 42: Line 48:
  
 
=== Server ===
 
=== Server ===
''Server'' is the server process that other components talk to via the Akonadi ''Protocol''. It manages the cached ''Entities'' and persists them in a database. Database is considered an implementation detail of the ''Server'', no-one else knows about it or interacts with it.
+
''Server'' refers to the server process that other components talk to via the Akonadi ''Protocol''. It manages the cached ''Entities'' and persists them.
 +
 
 +
'''Implementation detail:''' The ''Server'' uses an SQL database to persist the cached ''Entities''.
  
 
=== Agents ===
 
=== Agents ===
''Agents'' are single-purpose processes that get notified when an ''Entity'' is created, modified or removed from the ''Server''. Example can be the MailFilterAgent which is notified whenever a new ''Item'' is created and if the ''Item'' holds an email, it will apply a local mail filters to it and store the change back in Akonadi.  
+
''Agents'' are single-purpose processes that get notified when an ''Entity'' is created, modified or removed from the ''Server''.
 +
 
 +
'''Example:''' A MailFilterAgent which is notified whenever a new ''Item'' is created; if the ''Item'' holds an email, it will apply local mail filters to it and store the change back in Akonadi.  
  
 
=== Resources ===
 
=== Resources ===
''Resources'' are special cases of ''Agents'' that synchronize data between ''Akonadi Server'' and a remote server - for example the ''IMAP resource'' synchronizes data between ''Akonadi Server'' and a chosen IMAP server. To have multiple IMAP accounts, a multiple instances of the ''IMAP resource'' are created. When talking about ''Resources'' and ''Agents'' we can talk either about ''Agent'' (or ''Resource'') ''Type'' or ''Agent'' (or ''Resource'') ''Instance''. ''Type'' is the implementation of the ''Resource'' and ''Instance'' is a running instance of the ''Type''. ''Types' are unique (e.g. there can only be a single Resource called ''IMAPResource'', but there could be multiple ''Instances'' of the ''Type'', i.e. multiple ''IMAPResource'' ''Resources'' running providing connection to different IMAP servers or accounts.
+
''Resources'' are special cases of ''Agents'' that synchronize data between the ''Server'' and a remote server.
  
=== Clients ===
+
'''Example:''' An "IMAP resource" synchronizes data between the ''Server'' and some IMAP server.
''Clients'' are user-facing application like ''KMail'' or ''KOrganizer'' that presents data from Akonadi to users and allows them to interact with the data.
+
 
 +
=== Agent Types and Agent Instances ===
 +
An ''Agent Type'' is a named ''Agent'' implementation. We cannot have two ''Agent Types'' with the same name. An ''Agent Instance'' is a running instance of some ''Agent Type''. Each ''Agent Type'' can have multiple ''Agent Instances''.
  
 +
The same terminology applies to ''Resources'': There are ''Resource Types'' with a unique name, and each can have multiple ''Resource Instances''.
  
== DB Tables ==
+
'''Example:''' To manage emails on multiple IMAP servers, we can create multiple ''Resource Instances'' of type ''IMAP resource''. The name of that type might be "IMAPResource", and there cannot be another ''Resource Type'' with the same name.
  
This is a brief description of tables in the database that the ''Server'' stores all the data in and how they relate to the ''Entities'' and components described above.
+
=== Clients ===
 +
''Clients'' are user-facing applications that present data from Akonadi to users and allow them to interact with the data.
  
=== SchemaVersion ===
+
'''Example:''' ''KMail'' is a ''Client'' that presents email ''Items'' to the user, lets them create new email (as ''Items'') or folders (as ''Collections''), etc. ''KOrganizer'' works with calendar data instead.
A standalone table that holds information about the current version of the schema. Nothing to get excited about.
 
  
=== ResourceTable ===
 
Holds the list of active ''Agent'' and ''Resource'' ''Instances''.
 
  
=== PimItemTable ===
 
Holds metadata about Items - ID, parent ''Collection'', size etc. This is a very big table - one row per every email, contact, event etc.
 
  
=== PartTable ===
+
== Some more concepts ==
''PartTable'' holds the actual ''payload parts'' and ''attributes'' for ''Items''. This is the largest table
 
in Akonadi as it contains on average 3 rows per each row in ''PimItemTable''.
 
  
=== PartTypeTable ===
+
=== ID ===
Contains names of parts and attributes from ''PartTable'' (like PLD:ENVELOPE, PLD:HEAD, ATR:noselect, etc.) - this is a very small table (around 10 rows normally) and its purpose is purely to de-duplicate the often-repeated strings from the already-big ''PartTable''.
+
''Item ID'', ''Collection ID'', and ''Tag ID'' are database primary keys, but are exposed to clients to uniquely identify each ''Entity''.
 +
 
 +
=== Remote ID ===
 +
A ''Remote ID'' is a string-based identifier that is used by the backend (IMAP server, CalDAV server etc.) to identify the ''Entity''. It is only exposed to ''Resources'', since they are the only components that understand what the ''Remote ID'' means.
  
=== MimeTypeTable ===
+
'''Example:''' The ''Remote ID'' can be the UID of an email ''Item'' or the mailbox name for a ''Collection'' on an IMAP ''Resource'', or the name of an email file on a Maildir ''Resource''.
''MimeTypeTable'' holds list of mime types. This is a very small table and like ''PartTypeTable'' is used simply to de-duplicate repeated strings from the ''PimItemTable'' and to allow a many-to-many relation between ''Collections'' and mimetypes.
 
  
=== FlagTable ===
+
=== GID ===
''FlagTable'' holds Item flags, like "seen", "spam", "hasattachment" etc. The
+
''GID'' is a string-based identifier extracted from the payload and is exposed to clients.
table only holds simple strings and is fairly small (we have around 20 flags).
 
  
=== PimItemFlagRelation ===
+
'''Example:''' The Message-ID header of an email or the UID of an iCal event are typical ''GID''s of their corresponding ''Items''.
A single Item can have 0-N flags and this table describes the relation. This is a fairly big table as it usually has more than one flag per each ''PimItem'' row.
 
  
=== CollectionTable ===
+
=== Payload Type ===
The ''CollectionTable'' holds Collections - their ID, parent ''Collection'', cache policy
+
Each ''Payload'' part of an ''Item'' has one of three ''Payload'' types:
etc. This is normally a small-ish table - one row per mail folder, calendar,
 
addressbook  etc. Each ''Collection'' is owned by a Resource.
 
  
=== CollectionMimeTypeRelation ===
+
'''Internal Payloads''' are stored directly inside Akonadi's database. These are used if the ''Payload'' is sufficiently small (4kB by default).
A single ''Collection'' can have multiple mimetypes (those are actually the mimetypes of ''Items'' that are permitted within this ''Collection''); this table describes the relation between ''CollectionTable'' and ''MimeTypeTable''.
 
  
=== CollectionAttributeTable ===
+
'''External Payloads''' are used for larger payloads. The actual payload is stored in a separate cache file (inside file_db_data), and only the name of that file is stored in Akonadi's database.
This table holds additional attributes for ''Collections''. One ''Collection'' can have multiple ''Attributes'', but an attribute belongs to exactly one ''Collection''.
 
  
=== CollectionPimItemRelation ===
+
Finally, '''Foreign Payloads''' can be used for ''Resources'' where the backend is not remote, but presents local files to Akonadi instead. The database holds the absolute filepath of the local file that holds an ''Item's'' part. Note, however, that ''Foreign Payloads'' are not used by anyone as of now (2018-06).
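
The choice between the three ''Payload'' types can be sketched as follows. This is only an illustration under stated assumptions: the function name choose_payload_type and the enum are hypothetical, the 4kB default is taken to mean 4096 bytes, and the size comparison is assumed to be strict because the text does not say whether the threshold itself counts as "sufficiently small".

```python
import os
from enum import Enum

# Assumption: "4kB by default" is interpreted as 4096 bytes.
DEFAULT_THRESHOLD = 4 * 1024


class PayloadType(Enum):
    INTERNAL = "internal"  # payload bytes stored in the database
    EXTERNAL = "external"  # payload in a cache file, file name in the database
    FOREIGN = "foreign"    # absolute path of a local file in the database


def choose_payload_type(size, local_path=None, threshold=DEFAULT_THRESHOLD):
    """Pick a payload type for a part of `size` bytes (hypothetical helper).

    Assumptions: a part backed by a local file is treated as foreign, and a
    part counts as small when its size is strictly below the threshold.
    """
    if local_path is not None:
        if not os.path.isabs(local_path):
            raise ValueError("a foreign payload needs an absolute file path")
        return PayloadType.FOREIGN
    if size < threshold:
        return PayloadType.INTERNAL
    return PayloadType.EXTERNAL
```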
This table describes relation between ''Items'' and ''Virtual Collections''. This does not describe parent-child relationship, that's in ''PimItemTable.collectionId''. The size of this table varies depending on how much you use the "Search" feature in KMail.
 
  
=== TagTable ===
+
=== Cache ===
''TagTable'' holds ''Tags''. It is usually a small table with one row per ''Tag''; people generally don't have more than a few dozen ''Tags'' (most people don't use this feature at all).
+
Akonadi is a cache, not a storage. New ''Items'' are downloaded from the backend services (IMAP server, CalDAV server, maildir, ...) by ''Resources'' and uploaded to the ''Server'' regularly. Any changes made to ''Entities'' by ''Clients'' (marking an email as read, creating a new event, deleting a contact etc.) are sent to the respective ''Resource'' that owns the ''Item'' in question, and the ''Resource'' replays the change to the remote service. If the remote service is not available (say the user is offline, but they mark a bunch of emails as read or move them to some other folder), the changes are recorded by the ''Resource'' and are replayed once the network is available.
  
=== TagTypeTable ===
 
This table holds tag types - this is purely to de-duplicate common strings from ''TagTable''.
 
  
=== TagAttributeTable ===
 
A table equivalent to ''CollectionAttributeTable'', but for ''Tags''.
 
  
=== TagRemoteIdResourceRelation ===
+
= Architecture Overview =
A single ''Tag'' can exist in various backends - for example an IMAP account can have a tag called "KDE" that user uses to tag all emails related to KDE with. A calendar account can also have a "KDE" tag that user can use to tag KDE-related events with. To user we want to represent these two tags as a single ''Tag'', so that they can see everything tagged with "KDE" ''Tag'' regardless of whether it's an email or an event. However each backend identifies the ''Tag'' differently - the IMAP resource will identify the ''Tag'' as "$KDE" while the CalDAV resource will identify the ''Tag'' with some random  UUID like "{abcde-ef012-3456}". This table holds a RemoteID for each ''Tag'' as seen by each ''Resource'' that has the ''Tag''.
 
  
=== PimItemTagRelation ===
+
'''TODO:''' This mix of architecture specification and implementation details might not be ideal.
A single ''Item'' can have multiple ''Tags'' and this table describes the relation.
 
  
=== RelationTable ===
+
== Protocol ==
Holds ''Relations'' between two ''Items''
+
All components communicate with each other via the ''Protocol''. The ''Protocol'' is a custom binary protocol with commands and responses. Each ''Client'' opens one or more connections (called ''Session'', ''Command Session'' or ''Command Bus'') to the ''Server'' and can send commands to the server requesting or modifying data.
  
=== RelationTypeTable ===
+
Each ''Client'' can create multiple ''Sessions''. This is useful because Sessions don't support command pipelining, meaning that the next command in the queue is not sent to the server until a response to the previous command has arrived, which can cause undesirable waits for the user.
An equivalent to ''TagTypeTable'', but for ''Relations''.
 
  
 +
Note that the ''Protocol'' is an implementation detail. It is not exposed to the ''Clients'', who only interact with the ''Server'' via the Client API, which internally issues and handles the communication via the ''Protocol''.
  
 +
'''Example:''' In KMail, the message list and the message viewer each have their own ''Session''. This way, when the user opens a huge folder, they can click on the first email immediately and the message viewer can retrieve it through its own ''Session'' without having to wait for the message list to receive all emails from Akonadi first.
  
== Some more concepts ==
 
  
=== ID ===
+
== Akonadi Control ==
''Item ID'', ''Collection ID'', ''Tag ID'' is a database primary key but is exposed to clients to uniquely identify each ''Entity''.
+
''akonadi_control'' is a small, but very important part of Akonadi. When you type "akonadictl start", or when you start an Akonadi-enabled application like Kontact, they will start the ''akonadi_control'' process. Akonadi Control is responsible for starting the Akonadi server and all configured Resources. It also automatically restarts them when any of them crashes. The second important role of Akonadi Control is that it provides a DBus interface to communicate with the Akonadi Resources.
  
=== Remote ID ===
+
== Akonadi Server ==
''RemoteID'' is a string-based identifier that is used by the backend (IMAP server, CalDAV server etc.) to identify the ''Entity''. On IMAP server this can be an IMAP UID for an ''Item'', mailbox name for a ''Collection'', for maildir this can be a filename of the email etc. This is only exposed to ''Resources'', since those are the only ones to actually understand what the RID means.
+
Akonadi Server is the implementation of the ''Server'' concept described above. In principle, the Akonadi Server is very simple: it receives commands from clients, handles them by reading or writing to the database, sends back a response and generates a ''Change Notification'' if needed.
  
=== GID ===
+
Each connection is handled in a separate thread on the server in a Connection object. This allows the implementation of the command handlers to be blocking and also allows keeping some context for each connection. Whenever a new command is received on a connection, the Connection inspects which type of command it is and creates a respective Handler (e.g. StoreHandler, AppendHandler, MoveHandler, etc.).
''GID'' is a string-based identifier extracted from the payload (Message-ID header in emails, UID in iCal events etc.) and is exposed to clients.
 
  
=== Payload Type ===
+
For read ("Fetch") commands, the respective Handler will construct an SQL query to retrieve the requested Entities from the database. It will then serialize them into the Protocol and send them back to the client. In some cases, the Akonadi Server first has to request that the Item payloads are retrieved from the owning Resource - this is because Collections can have an expiration policy, meaning that after some timeout, Akonadi Server will delete the payload of the Items in that Collection from the database. When a client requests a payload of an Item that is missing the payload, the Akonadi Server will ask the Resource that owns the Item (via a DBus call) to retrieve the payload and upload it to Akonadi using the standard Job mechanism. Once done, the Resource notifies the Server via another DBus call that it has finished, and the Akonadi Server continues with the retrieval as usual.
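
The retrieval of an expired payload described above can be modelled with a short Python sketch. All class and method names are hypothetical; the real ''Server'' talks to the ''Resource'' over DBus and the ''Resource'' uploads the payload through the Job mechanism, which this sketch replaces with a plain method call.

```python
class Resource:
    """Hypothetical Resource that can re-download a payload from its backend."""

    def __init__(self, backend):
        self.backend = backend  # maps Remote ID -> payload bytes
        self.retrievals = 0     # counts how often the Server had to ask

    def retrieve_payload(self, remote_id):
        self.retrievals += 1
        return self.backend[remote_id]


class Server:
    """Hypothetical cache: Item ID -> (Remote ID, payload or None)."""

    def __init__(self, resource):
        self.resource = resource
        self.cache = {}

    def add_item(self, item_id, remote_id, payload):
        self.cache[item_id] = (remote_id, payload)

    def expire_payload(self, item_id):
        # Expiration policy: drop the payload but keep the Item itself.
        remote_id, _ = self.cache[item_id]
        self.cache[item_id] = (remote_id, None)

    def fetch(self, item_id):
        remote_id, payload = self.cache[item_id]
        if payload is None:
            # Payload is missing: ask the owning Resource, then cache it again.
            payload = self.resource.retrieve_payload(remote_id)
            self.cache[item_id] = (remote_id, payload)
        return payload
```

In this model a second fetch after the expiration is served from the cache again, so the ''Resource'' is only asked once per expired payload.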
As described above the actual ''Item'' data (e.g. body
 
of an email) as stored in ''PartTable'' as a BLOB in the ''data'' column. This is called the ''Internal Payload''. To avoid storing massive BLOBs in the database, we store payloads larger than certain threshold (4kB by default) as files on the filesystem and the ''PartTable'' only refers to the filename on the filesystem. Those are called ''External Payloads''. There are also ''Foreign Payloads'' but right now they are not actually used by anyone.
 
  
=== Cache ===
+
For each write command, once the data are written to the database, the Server will generate a change notification describing what was changed and how.
Akonadi is a cache, not a storage. New ''Items'' are downloaded from the backend services (IMAP server, CalDAV server, maildir, ...) by ''Resources'' and uploaded into Akonadi regularly. Any changes made to ''Entities'' by clients (marking an email as read, creating a new event, deleting a contact etc.) are sent to the respective ''Resource'' that owns the ''Item'' in question, and the ''Resource'' replays the change to the remote service. If the remote service is not available (say the user is offline, but they mark a bunch of emails as read or move them to some other folder), the changes are recorded by the ''Resource'' and are replayed once the network is available.
 
  
 +
The Server also provides search functionality in part, which is described in detail below.
  
  
== How the whole thing works together ==
+
== Search ==
 +
There is a special agent called Akonadi Indexing Agent which listens to changes in ''Items'' and indexes the ''Items'' into Xapian databases. There are separate databases for emails, contacts, events, notes, and contacts parsed from emails (like senders, etc.).
  
=== Protocol ===
+
When a client wants to perform a search, one option is to query the Xapian database directly in-process through a search query, which returns a list of ''Item'' IDs matching the query. The client can then retrieve the respective ''Items'' from Akonadi.
All components communicate with each other via ''The Protocol''. The protocol is a custom binary protocol with commands and responses. Each client opens a connection (called Session) to the Akonadi Server and can send commands to the server requesting or modifying data. The Session is also called Command Session or Command Bus.
 
  
Each client can open multiple Sessions with the server - this is useful because Sessions don't support command pipelining, meaning that next command in the queue is not sent to the server until a response to the previous command has arrived, which can cause undesirable waits for the user. For example in KMail, the message list and the message viewer have each their own Session. This way when the user opens a huge folder, they can click on the first email immediately and the message viewer can retrieve it through its own Session without having to wait for the message list to receive all emails from Akonadi first.
+
A second option is a so-called persistent search. Persistent search is represented as a ''Virtual Collection'' belonging to the virtual Search ''Resource''. The ''Virtual Collection'' holds a search query, which is re-executed whenever an Item changes and all Items matching the query are linked to the ''Virtual Collection''. This allows keeping a persistent filter for ''Items'' even across multiple different ''Collections''.
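
A persistent search can be pictured with the following Python sketch. The names are hypothetical, the query is modelled as a plain predicate function, and as a simplification the sketch re-evaluates the query only for the ''Item'' that changed instead of re-executing it over all ''Items''.

```python
class VirtualCollection:
    """Hypothetical persistent search: a query plus the IDs of linked Items."""

    def __init__(self, query):
        self.query = query   # predicate: item dict -> bool
        self.linked = set()  # IDs of Items currently linked into the collection

    def on_item_changed(self, item):
        # Re-evaluate the query for the changed Item and link or unlink it.
        if self.query(item):
            self.linked.add(item["id"])
        else:
            self.linked.discard(item["id"])


# An "unread" filter that spans several ordinary Collections.
unread = VirtualCollection(lambda item: not item["seen"])
unread.on_item_changed({"id": 1, "collection": "Inbox", "seen": False})
unread.on_item_changed({"id": 2, "collection": "Work", "seen": False})
unread.on_item_changed({"id": 1, "collection": "Inbox", "seen": True})
```

The ''Items'' keep their parent ''Collections'' ("Inbox", "Work"); only the set of links held by the ''Virtual Collection'' changes.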
  
Note that the Protocol is an implementation detail and is not in any way exposed to the clients. Clients only interact with the Jobs API, which internally issues and handles the communication via the protocol.
+
The search infrastructure is currently undergoing a major overhaul codenamed "Make Indexing Great Again". See https://phabricator.kde.org/T7014 for details.
  
=== Change Notifications ===
 
As described above, the protocol is a mechanism for the clients to communicate with the Akonadi Server. The second communication mechanism in Akonadi are the Change Notifications, which allow the Server to notify clients about changes.
 
  
Clients can express their wish to receive notifications by creating a new instance of Monitor (or ChangeRecorder, more on that below). The Monitor will establish a connection to the server (we call this subscription, as the client subscribes to receive notifications) and will upload to it which kind of notifications it's interested in. The scope can be changed at any time through the Monitor's API, the Monitor will always upload the new scope to the server.
 
  
Whenever another client modifies an Entity on the server (create/modify/move/remove/link/...) the server will generate a notification message that describes which Entities have changed, and how. It will then compare the notification message to the scopes of the subscribers to see if the subscriber is interested in this particular notification or not. If the notification message matches the subscriber's filter (we call it that the subscriber "accepts" the notification), the notification is sent over to the subscriber.
+
= Client API =
 +
From an application's point of view, the ''Server'' and the ''Protocol'' are just implementation details that the application is not aware of and does not interact with directly in any way. The only means for ''Clients'' to interact with Akonadi is through the Client API.
  
On the client side, the notification is received by the Monitor and put into a pipeline. The Monitor will inspect which Entities the notification concerns and will retrieve them all from Akonadi using the regular Job API. Once all Entities are received, the Monitor will emit the respective signal based on the type of the notification.
+
''Clients'' communicate with the server either using the ''Jobs API'', the ''Notification API'', or a full-fledged ''Entity Tree Model''.
  
Under the hood, Change Notifications are using the Protocol as well, but each Monitor opens its own Notification connection (or Notification Bus) to the server in parallel to the Command Session.
+
== Jobs API ==
 +
''Jobs'' are the core elements of the Client API. A ''Job'' is an asynchronous task that can retrieve data from Akonadi or modify them. Once finished, the job emits the ''result()'' signal after which the ''Client'' can handle the result of the ''Job''.
  
=== Akonadi Control ===
+
A ''Job'' is associated with a ''Session'' on creation. If the caller does not specify a ''Session'', the default ''Session'' is used. Each ''Session'' has a ''Job'' queue, where new ''Jobs'' are automatically enqueued. The ''Jobs'' of each ''Session'' are never processed in parallel, but sequentially: A ''Job'' is only processed when every earlier ''Job'' is finished.
''akonadi_control'' is a small, but very important part of Akonadi. When you type "akonadictl start", or when you start an Akonadi-enabled application like Kontact, they will start the ''akonadi_control'' process. Akonadi Control is responsible for starting the Akonadi server and all configured Resources. It also automatically restarts them when any of them crashes. The second important role of Akonadi Control is that it provides a DBus interface to communicate with the Akonadi Resources.
 
  
=== Akonadi Server ===
+
'''Example:''' An ''ItemFetchJob'' retrieves ''Items'' from Akonadi; a ''CollectionModifyJob'' modifies ''Collections''.
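
The sequential ''Job'' queue of a ''Session'', and the reason why a second ''Session'' helps, can be sketched in Python. The Session class, the job names and the round-robin scheduler are hypothetical; the scheduler merely stands in for two connections that make progress independently of each other.

```python
from collections import deque


class Session:
    """Hypothetical model of a Session: one FIFO Job queue, no pipelining."""

    def __init__(self, name):
        self.name = name
        self.queue = deque()

    def enqueue(self, job_name):
        self.queue.append(job_name)

    def step(self):
        # Process at most one Job; later Jobs wait until this one is finished.
        if not self.queue:
            return None
        return (self.name, self.queue.popleft())


def run_round_robin(sessions):
    """Give every Session one step per round and record finished Jobs."""
    finished = []
    while any(s.queue for s in sessions):
        for s in sessions:
            done = s.step()
            if done is not None:
                finished.append(done)
    return finished


message_list = Session("list")
for n in (1, 2, 3):
    message_list.enqueue(f"fetch-headers-{n}")
viewer = Session("viewer")
viewer.enqueue("fetch-body")

order = run_round_robin([message_list, viewer])
```

Because the viewer has its own ''Session'', its single ''Job'' does not queue up behind the three ''Jobs'' of the message list.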
In principle the Akonadi Server is very simple: it receives commands from clients, handles them by reading or writing to the database, sends back a response and if needed generates a Change Notification.
 
  
Each connection is handled in a separate thread on the server in a Connection object. This allows the implementation of the command handlers to be blocking and also allows keeping some context for each connection. Whenever a new command is received on a connection, the Connection inspects which type of command it is and creates a respective Handler (e.g. StoreHandler, AppendHandler, MoveHandler, etc.).
+
== Notification API ==
 +
If a ''Client'' is interested in changes to ''Entities'' (create/modify/move/remove/link/...), it may subscribe to ''Change Notifications'' by creating a ''Monitor''. A ''Monitor'' has signals for each type of change that can occur, so ''Clients'' can connect only to those they are interested in. At any time, the ''Client'' may set the ''Monitor's'' scope, which specifies the kind of notification the ''Client'' is interested in, like the kind of affected ''Entities'', only a specific ''Entity'', the type of change, etc. For each ''Entity'' change that matches the scope, the ''Monitor'' issues a ''Change Notification'' that describes the change. The ''Change Notification'' contains a description of the change as well as the changed ''Item'', the latter being called ''Notification Payload''.
  
For read ("Fetch") commands, the respective Handler will construct an SQL query to retrieve the requested Entities from the database. It will then serialize them into the Protocol and send them back to the client. In some cases, the Akonadi Server first has to request that the Item payloads are retrieved from the owning Resource - this is because Collections can have an expiration policy, meaning that after some timeout, Akonadi Server will delete the payload of the Items in that Collection from the database. When a client requests a payload of an Item that is missing the payload, the Akonadi Server will ask the Resource that owns the Item (via a DBus call) to retrieve the payload and upload it to Akonadi using the standard Job mechanism. Once done, the Resource notifies the Server via another DBus call that it has finished, and the Akonadi Server continues with the retrieval as usual.
+
A ''Change Recorder'' is a special ''Monitor'' that writes each reported change to a journal file. The user of the ''Change Recorder'' is responsible for calling ''changeProcessed()'' once it has handled a change; the ''Change Recorder'' then removes the notification from the journal and dispatches the next notification in the queue. ''Clients'' only very rarely need a ''Change Recorder''. Instead, ''Change Recorders'' are used by some ''Resources'' to record local changes that have not yet been propagated to a remote server, e.g., if an ''Item'' of a ''Resource'' is modified while that ''Resource'' is offline. In this case, all notifications are stored in the ''Change Recorder's'' journal, and once the internet connection is available again, the ''Resource'' asks the ''Change Recorder'' to replay all notifications from the journal so that it can upload the changes to the backend.
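
The journal behaviour of a ''Change Recorder'' can be sketched in Python. This is a hypothetical model and not the Akonadi class: the JSON journal format, the record() method and the method names other than the idea of marking a change as processed are assumptions made for the illustration.

```python
import json
import os
import tempfile


class ChangeRecorder:
    """Hypothetical journal-backed recorder (not the Akonadi class)."""

    def __init__(self, journal_path):
        self.journal_path = journal_path
        self.pending = []
        # Changes recorded before a restart are loaded back from the journal.
        if os.path.exists(journal_path):
            with open(journal_path) as f:
                self.pending = json.load(f)

    def _save(self):
        with open(self.journal_path, "w") as f:
            json.dump(self.pending, f)

    def record(self, change):
        self.pending.append(change)
        self._save()

    def next_change(self):
        return self.pending[0] if self.pending else None

    def change_processed(self):
        # Drop the handled notification so the next one can be dispatched.
        self.pending.pop(0)
        self._save()


journal = os.path.join(tempfile.mkdtemp(), "journal.json")
recorder = ChangeRecorder(journal)
recorder.record({"op": "modify", "item": 1})  # recorded while offline
recorder.record({"op": "remove", "item": 2})

restarted = ChangeRecorder(journal)  # e.g. after the Resource was restarted
```

Replaying then amounts to handling next_change() and calling change_processed() until the journal is empty.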
  
For each write command, once the data are written to the database, the Server will generate a change notification describing what was changed and how.
+
'''Implementation detail:''' ''Monitors'' establish a connection, called subscription, to the ''Server'' upon creation, and the ''Server'' keeps track of the currently existing subscriptions and the scopes of the corresponding ''Monitors''. Whenever an ''Entity'' is modified on the ''Server'' by a ''Client'' or ''Agent'', the ''Server'' generates a notification message that describes which ''Entities'' have changed, and how. It will then compare the notification message to the scopes of the subscribers to see if the subscriber is interested in this particular notification. If the notification message matches a subscriber's scope, the notification message as well as the changed ''Entity'' (''Notification Payload'') is sent over to the subscriber.
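The scope matching described in this implementation detail might look roughly like the following Python sketch. It is only an illustration, not Akonadi code: the `Notification` and `Scope` classes, their fields, and the rule that an empty set means "no restriction" are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Notification:
    entity_kind: str   # e.g. "item" or "collection" (assumed values)
    operation: str     # e.g. "add", "modify", "remove" (assumed values)
    entity_id: int


@dataclass
class Scope:
    # In this sketch an empty set means "no restriction on this criterion".
    entity_kinds: set = field(default_factory=set)
    operations: set = field(default_factory=set)
    entity_ids: set = field(default_factory=set)

    def matches(self, n: Notification) -> bool:
        return ((not self.entity_kinds or n.entity_kind in self.entity_kinds)
                and (not self.operations or n.operation in self.operations)
                and (not self.entity_ids or n.entity_id in self.entity_ids))


def dispatch(notification, subscriptions):
    """Return the names of all subscribers whose scope matches the notification."""
    return [name for name, scope in subscriptions.items()
            if scope.matches(notification)]
```

For example, a subscriber scoped to Items and another scoped to one specific Entity ID would both receive a notification about a modification of that Item, while a subscriber scoped to Collections would not.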
  
The Server also provides part of the search functionality, which is described in detail below.
+
On the client side, the notification message is received by the ''Monitor'' and put into a pipeline. The ''Monitor'' then issues a ''Change Notification'' containing a description of the change as well as the changed ''Entities'' themselves. This is done by emitting an appropriate signal that depends on the type of the notification.
  
=== Client API ===
+
'''TODO:''' Does a Monitor establish two distinct connections, one Notification Bus and a Command Bus? If so, which is used for what? I guess that the Command Bus is used to subscribe/unsubscribe and to set the scope (communication from Monitor to Server), and the Notification Bus is used for the notification messages (communication from Server to Monitor)?
From an application's point of view, the Akonadi Server and the Protocol are just implementation details that it is not aware of and does not interact with directly in any way. The only means for clients to interact with Akonadi is through the client API.
 
  
==== Jobs ====
+
== Entity Tree Model (ETM) ==
Jobs are the core elements of the client API. A Job is a long-running asynchronous task that can retrieve data from Akonadi or modify them - e.g. ItemFetchJob retrieves Items from Akonadi, CollectionModifyJob modifies Collections etc. Once finished, the job emits the result() signal after which the client can handle the result of the Job.
+
An ''Entity Tree Model'' is a QAbstractItemModel which holds the entire tree of ''Collections'' and ''Items'' and keeps it up to date. It is possible to filter the content of an ETM in many ways to only include Entities of a certain type or MIME type. An ETM is usually used in applications in combination with various proxy models.
  
Jobs are dispatched automatically from the event loop and they are queued on a Session. If the caller does not specify any Session, the default Session is used. Each Session can only handle one Job at a time.
+
It is not possible to modify ''Entities'' by changing them in the ETM. Instead, the ''Client'' has to use a ''Job'' to modify an ''Entity''.
  
==== Monitor and ChangeRecorder ====
+
'''Example:''' In KMail, the folder list and message list are both sharing the same ETM under the hood, but use different proxy models to display only a specific part of the tree in each view.
The Monitor is used to listen for changes in Akonadi. It has signals for each type of change that can occur, so clients can connect only to those they are interested in. Monitors can also be customized to only listen to changes of a certain type or changes concerning a specific type of Entities or even only specific Entities.
 
  
ChangeRecorder is a subclass of Monitor. It does exactly the same job as Monitor, but in addition it saves each change notification it receives to a journal file. The user is responsible for calling changeProcessed() once it has handled a change, upon which the ChangeRecorder will remove the notification from the journal and will dispatch the next notification in the queue. ChangeRecorder is only very rarely needed by clients and is used almost exclusively by Agents and Resources, where it can record incoming changes in case the Resource cannot store them to the backend, for example, because it needs to be online to do that. In this case, all notifications will be stored in the ChangeRecorder's journal and once the internet connection is available again, the Resource will request the ChangeRecorder to replay all notifications from the journal so that it can upload the changes to the backend.
+
'''Implementation detail:''' An ETM automatically keeps itself up-to-date by using a ''Monitor''. Therefore, it will automatically (and asynchronously) reflect all changes made via the Jobs API by appropriately processing ''Change Notifications''.
  
==== EntityTreeModel (ETM) ====
 
EntityTreeModel is a QAbstractItemModel which holds the entire tree of Collections and Items and keeps it up to date. It is possible to filter the content of ETM in many ways to only include Entities of a certain type or mime type.  ETM is usually used in applications in combination with various proxy models. For example in KMail, the folder list and message list are both sharing the same ETM under the hood, but use various proxy models to display only a specific part of the tree in each view.
 
  
ETM automatically keeps itself up-to-date by using a Monitor. It is not possible to modify entities by changing them in the Model. Instead, the client should use a Job to modify the Entity, and the model will be updated as it receives the notification from the Monitor.
 
  
=== Resources ===
+
= Resources =
As explained above, Akonadi Resources take care of synchronizing changes between Akonadi Server and the actual storage (IMAP server, CalDAV server, local iCal file, etc.) which I'll call backend for the purposes of this document.
+
As explained above, ''Resources'' take care of synchronizing changes between Akonadi and the ''Resource's'' backend (IMAP server, CalDAV server, local iCal file, etc.).
  
==== Scheduler ====
+
'''TODO:''' This section is interesting for ''Resource'' authors, but then it lacks a ''Resource API'' section. Part of that is currently described in ''Change Replay'' (the fact that all ''Resources'' implement ''AgentBase::Observer'').
Each Akonadi Resource has a ResourceScheduler, which holds a queue of tasks that the Resource should perform. In fact, the ResourceScheduler has several queues with various priorities. For example, a queue for tasks that write changes from Akonadi to the backend has the highest priority, while the queue to download new changes from the backend has a lower priority. Whenever the Resource receives a new task (more on tasks below) it puts it into the ResourceScheduler, which then dispatches it based on the priority by calling the respective method in the Resource implementation. Once the Resource is done with handling the task, it tells the ResourceScheduler that it's done and that it can schedule the next task.
 
  
==== Tasks ====
+
== Tasks ==
 
In general, there are two types of tasks: ChangeReplay and Sync. ChangeReplay tasks have the highest priority and they represent a change in the Akonadi data that needs to be written to the backend. This can be a new flag being added to an email, or a new calendar event being created or a contact being removed. Sync tasks (FetchItems, SyncCollectionTree, SyncCollection etc.) are tasks that are asking the Resource to download any new changes from the backend and put them into Akonadi. Sync tasks do not write anything to the backend.
 
  
==== Change Replay ====
+
'''TODO:''' It should be possible to describe both ''Tasks'' without mentioning the ''Scheduler''. Also, make clear that Replay is Akonadi-to-backend propagation, and Sync is backend-to-Akonadi propagation.
Change Replay tasks are created whenever the ''ChangeRecorder'' in the Resource is notified by Akonadi Server about a change. ''Change Notifications'' are described in detail below. ChangeRecorder will store the change notification into a file and will pass it to the ''ResourceScheduler''. In case the Resource cannot handle the change, maybe because the Resource is offline and it needs network access to connect to the backend, the change remains stored in the ''ChangeRecorder'''s file until the Resource goes online again. When that happens the ''ChangeRecorder'' is asked to replay all the changes from the file by passing them to the ''ResourceScheduler''.
+
 
 +
=== Change Replay ===
 +
''Change Replay'' tasks are created whenever the ''Change Recorder'' of a ''Resource'' is notified about a change. The ''Change Recorder'' will store the ''Change Notification'' in a journal and will pass it to the ''Scheduler''. In case the ''Resource'' cannot handle the change immediately, maybe because the ''Resource'' is offline and it needs network access to connect to the backend, the change remains stored in the ''Change Recorder's'' journal until the ''Resource'' goes online again. When that happens, the ''Change Recorder'' is asked to replay all the changes from the journal by passing them to the ''Scheduler''.
  
Each Akonadi Resource implements an ''AgentBase::Observer'' interface. This interface has methods like ''itemAdded(item, parentCol)'', ''itemsRemoved(items)'', ''itemsFlagsChanged(items, addedFlags, removedFlags)'' etc. that must be implemented by the Resource. In those methods, the Resource implementation takes the changed data and writes it to the backend using the backend protocol/format (IMAP, CalDAV, ICal etc.). As the ''ResourceScheduler'' is replaying the tasks, depending on the type of the change it calls the respective method from the Observer interface, waits for the Resource to confirm that the change has been successfully written to the backend and then schedules the next task and so on until it runs out of tasks in the queue or until another task arrives.
+
Each ''Resource'' implements the ''AgentBase::Observer'' interface. This interface has methods like ''itemAdded(item, parentCol)'', ''itemsRemoved(items)'', ''itemsFlagsChanged(items, addedFlags, removedFlags)'' etc. that must be implemented by the ''Resource''. In those methods, the ''Resource'' implementation takes the changed data and writes it to the backend using the backend protocol/format (IMAP, CalDAV, ICal etc.). As the ''Scheduler'' is replaying the tasks, depending on the type of the change it calls the respective method from the Observer interface, waits for the ''Resource'' to confirm that the change has been successfully written to the backend, and then schedules the next task and so on until it runs out of tasks in the queue or until another task arrives.
  
==== Sync ====
+
=== Synchronization ===
Resource Synchronization means retrieving data from the backend and storing them in Akonadi, so it's a uni-directional synchronization. Synchronization can be requested via the Resource's DBus interface - either via a SynchronizeCollectionTree task, a SynchronizeCollection task (which synchronizes all Items within a specified Collection) or a SynchronizeItem task, which retrieves a specified Item. There is also the SyncAll task, which schedules a SynchronizeCollectionTree task followed by a SynchronizeCollection task for each Collection. There are more tasks, of course, to sync attributes, tags etc., but they all work on the same principle.
+
Synchronization means retrieving data from the backend and storing them in Akonadi, so it's a one-way synchronization. Synchronization can be requested via a ''Resource's'' DBus interface - either via a SynchronizeCollectionTree task, a SynchronizeCollection task (which synchronizes all Items within a specified Collection) or a SynchronizeItem task, which retrieves a specified Item. There is also the SyncAll task, which schedules a SynchronizeCollectionTree task followed by a SynchronizeCollection task for each Collection. There are more tasks, of course, to sync attributes, tags etc., but they all work on the same principle.
  
 
The Resource does not have to synchronize the entire Item, for example for emails we often only synchronize the envelope, which is enough to display the email in the message list in KMail, and the actual body is retrieved on demand once the user opens the email in KMail and KMail requests the payload body from Akonadi.
 
  
Item Synchronization happens by a so-called merging process, in which the Server checks whether an Item with the same identification already exists in the Akonadi database. If so, it overwrites it with the newly received Item. Otherwise a new Item entry is created in the database. The merging happens using RID (RemoteID) or optionally GID in cases where the RID is unstable. It's also possible to combine both.
+
''Item'' Synchronization happens by the so-called merging process, when the ''Server'' tries to see if an ''Item'' with the same identification already exists in the Akonadi database. If so, it overwrites it with the newly received ''Item''. Otherwise a new ''Item'' entry is created in the database. The merging happens using RID (RemoteID) or optionally GID in cases where RIDs are unstable. It's also possible to combine both.
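The merging process can be sketched in a few lines of Python. This is only a model under stated assumptions: Items are plain dictionaries, the database is a list, and "combining both" is interpreted here as requiring both RID and GID to match, which the text above does not specify.

```python
def merge_item(store, incoming, use_rid=True, use_gid=False):
    """Overwrite the stored item with the same identification, else create one.

    `store` is a list of dicts standing in for the database (an assumption).
    Returns "updated" or "created".
    """
    def same(existing):
        if use_rid and existing["rid"] != incoming["rid"]:
            return False
        if use_gid and existing["gid"] != incoming["gid"]:
            return False
        # With neither criterion enabled, nothing can match.
        return use_rid or use_gid

    for i, existing in enumerate(store):
        if same(existing):
            # Keep the database ID, replace everything else.
            store[i] = {**incoming, "id": existing["id"]}
            return "updated"

    new_id = max((e["id"] for e in store), default=0) + 1
    store.append({**incoming, "id": new_id})
    return "created"
```

Merging by GID alone is what allows an Item to keep its database ID even when the backend has assigned it a new RID.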
 +
 
 +
== Scheduler ==
 +
Each ''Resource'' has a ''Scheduler'', which holds a queue of tasks that the ''Resource'' should perform. In fact, the ''Scheduler'' has several queues with different priorities. For example, the queue for tasks that write changes from Akonadi to the backend has the highest priority, while the queue for tasks that download new changes from the backend has a lower priority. Whenever the ''Resource'' receives a new task (see ''Tasks'' above), it puts it into the ''Scheduler'', which then dispatches it based on its priority by calling the respective method in the ''Resource'' implementation. Once the ''Resource'' is done with handling the task, it tells the ''Scheduler'' that it's done and that it can schedule the next task.
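A minimal Python sketch of such a priority-based scheduler follows. It is not the actual ResourceScheduler: the class and method names, the two priority constants and the use of a single heap instead of several queues are assumptions made for brevity.

```python
import heapq
from itertools import count

# Lower number = higher priority (an assumption of this sketch).
CHANGE_REPLAY, SYNC = 0, 1


class Scheduler:
    def __init__(self):
        self._heap = []
        self._order = count()  # tie-breaker: FIFO order within one priority
        self._busy = False

    def schedule(self, priority, task):
        heapq.heappush(self._heap, (priority, next(self._order), task))

    def dispatch_next(self):
        """Hand out the next task, or None while a task is running or none is queued."""
        if self._busy or not self._heap:
            return None
        self._busy = True
        return heapq.heappop(self._heap)[2]

    def task_done(self):
        """Called by the resource once it has finished the current task."""
        self._busy = False
```

In this model a change replay task that arrives after a sync task is still dispatched first, and nothing new is dispatched until `task_done()` has been called.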
 +
 
 +
== Online/Offline ==
 +
Resources can be in an online or offline state. This is not related to network connectivity status; even a local-only resource like the Maildir resource can be in offline state. The state indicates whether the ''Resource'' is able to store changes in its backend. For remote ''Resources'' (like IMAP) the online/offline status often matches the online/offline status of the network connectivity, but it's also possible to manually switch a ''Resource'' to offline.
 +
 
 +
''Resources'' which are in the offline state reject all Sync requests and store ''Change Notifications'' in their ''Change Recorder's'' journal. Once the ''Resource'' switches back to online, it will first replay all pending changes from its ''Change Recorder''; only then will it start processing Sync requests again.
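The ordering rule (pending changes first, Sync requests afterwards) can be shown with a small Python model. The class `ResourceState` and its methods are hypothetical; the `log` list merely records what would be sent to the backend.

```python
class ResourceState:
    """Hypothetical model of a resource's online/offline behaviour."""

    def __init__(self):
        self.online = True
        self.journal = []  # change notifications waiting for replay
        self.log = []      # operations actually performed, in order

    def on_change(self, change):
        if self.online:
            self.log.append(("replay", change))
        else:
            self.journal.append(change)  # keep it until we are online again

    def on_sync_request(self, what):
        if not self.online:
            return False  # Sync requests are rejected while offline
        self.log.append(("sync", what))
        return True

    def set_online(self, online):
        self.online = online
        if online:
            # Replay everything recorded while offline before anything else.
            while self.journal:
                self.log.append(("replay", self.journal.pop(0)))
```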
 +
 
 +
== Configuration ==
 +
Most ''Resources'' need a configuration. The configuration dialog can be invoked through a DBus call to the ''Resource'' and runs within the ''Resource's'' process.
 +
 
 +
'''Example:''' Configuration options can be the server to connect to, credentials, sync frequency, etc.
 +
 
 +
 
 +
 
 +
= DB Tables =
 +
 
 +
This is a brief description of tables in the database that the ''Server'' stores all the data in and how they relate to the ''Entities'' and components described above.
 +
 
 +
== SchemaVersion ==
 +
A standalone table that holds information about the current version of the schema.
 +
 
 +
== ResourceTable ==
 +
Holds a list of active ''Agent'' and ''Resource'' ''Instances''.
 +
 
 +
'''TODO:''' What is an ''active Agent''? Is it an existing one, or one that is online? Can ''Agents'' be inactive?
 +
 
 +
== PimItemTable ==
 +
Holds metadata about ''Items'' - ID, parent ''Collection'', size etc. This is a very big table - one row per every email, contact, event etc.
 +
 
 +
== PartTable ==
 +
''PartTable'' holds the actual ''Payload'' parts and ''Attributes'' for ''Items''. This is the largest table in Akonadi as it contains on average 3 rows per each row in ''PimItemTable''.
 +
 
 +
== PartTypeTable ==
 +
Contains names of parts and attributes from ''PartTable'' (like PLD:ENVELOPE, PLD:HEAD, ATR:noselect, etc.) - this is a very small table (around 10 rows normally) and its purpose is purely to de-duplicate the often-repeated strings from the already-big ''PartTable''.
 +
 
 +
== MimeTypeTable ==
 +
''MimeTypeTable'' holds a list of MIME types. This is a very small table and like ''PartTypeTable'' is used simply to de-duplicate repeated strings from the ''PimItemTable'' and to allow a many-to-many relation between ''Collections'' and MIME types.
 +
 
 +
== FlagTable ==
 +
''FlagTable'' holds ''Item'' flags, like "seen", "spam", "hasattachment" etc. The table only holds simple strings and is fairly small (we have around 20 flags).
 +
 
 +
'''TODO:''' Flags are not described in the ''Basic Entities'' section above.
 +
 
 +
== PimItemFlagRelation ==
 +
A single ''Item'' can have 0-N flags and this table describes the relation. This is a fairly big table as it usually has more than one flag per each ''PimItem'' row.
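How ''PimItemTable'', ''FlagTable'' and ''PimItemFlagRelation'' fit together can be illustrated with an in-memory SQLite database from Python. The table names and ''collectionId'' come from this document, and the flag names are the examples given above; all other column names (''id'', ''size'', ''name'', ''pimItemId'', ''flagId'') and the sample rows are assumptions, and the real schema may differ.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE PimItemTable (id INTEGER PRIMARY KEY, collectionId INTEGER, size INTEGER);
CREATE TABLE FlagTable (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
-- One row per (item, flag) pair: an item can have 0-N flags.
CREATE TABLE PimItemFlagRelation (
    pimItemId INTEGER REFERENCES PimItemTable(id),
    flagId    INTEGER REFERENCES FlagTable(id),
    PRIMARY KEY (pimItemId, flagId)
);
INSERT INTO PimItemTable VALUES (1, 10, 2048), (2, 10, 512);
INSERT INTO FlagTable VALUES (1, 'seen'), (2, 'hasattachment');
INSERT INTO PimItemFlagRelation VALUES (1, 1), (1, 2), (2, 1);
""")


def flags_of(item_id):
    """Return the flag names of one item, sorted alphabetically."""
    rows = db.execute(
        "SELECT f.name FROM FlagTable f "
        "JOIN PimItemFlagRelation r ON r.flagId = f.id "
        "WHERE r.pimItemId = ? ORDER BY f.name",
        (item_id,),
    )
    return [name for (name,) in rows]
```

Storing each flag string once in ''FlagTable'' and only integer pairs in the relation table is what keeps the large relation table compact.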
 +
 
 +
== CollectionTable ==
 +
The ''CollectionTable'' holds metadata about ''Collections'' - their ID, parent ''Collection'', cache policy etc. This is normally a small-ish table - one row per email folder, calendar, addressbook etc.
 +
 
 +
== CollectionMimeTypeRelation ==
 +
This many-to-many table describes the relation between ''Collections'' in the ''CollectionTable'' and MIME types in the ''MimeTypeTable'', i.e., which ''Collection'' can hold ''Items'' of which MIME type.
 +
 
 +
== CollectionAttributeTable ==
 +
This table holds additional ''Attributes'' for ''Collections''. One ''Collection'' can have multiple ''Attributes'', but an attribute belongs to exactly one ''Collection''.
 +
 
 +
== CollectionPimItemRelation ==
 +
This table describes the relation between ''Items'' and ''Virtual Collections''. It does not describe the parent-child relationship, which is stored in ''PimItemTable.collectionId''. The size of this table varies depending on how much you use the "Search" feature in KMail.
 +
 
 +
== TagTable ==
 +
''TagTable'' holds ''Tags''. This is usually a small table with one row per ''Tag'', and people generally don't have more than a few dozen ''Tags'' (most people don't use this feature at all).
 +
 
 +
== TagTypeTable ==
 +
This table holds ''Tag'' types - this is purely to de-duplicate common strings from ''TagTable''.
  
==== Online/Offline ====
+
== TagAttributeTable ==
Resources can be in an online or offline state. This is not related to network connectivity status, even a local-only resource like the Maildir resource can be in an offline state. The state indicates whether the Resource is able to store changes in its backend. For remote resources (like IMAP) the online/offline status often matches the online/offline status of the network connectivity, but it's also possible to manually switch a Resource to offline.
+
A table equivalent to ''CollectionAttributeTable'', but for ''Tags'' instead of ''Collections''.
  
Resources which are in the Offline state reject all Sync requests and store change notifications in the ChangeRecorder journal. Once the Resource switches back to online, it will first replay all pending changes from ChangeRecorder and then it will start processing Sync requests again.
+
== TagRemoteIdResourceRelation ==
 +
For each ''Tag'', this table holds the remote ID for each ''Resource'' that this ''Tag'' is used in. This allows for different representations of the same ''Tag'' in different ''Resources''.
  
==== Configuration ====
+
'''Example:''' The user wants to tag both emails and events with a "KDE" ''Tag'', but tags in IMAP are "$"-prefixed ("$KDE"), and CalDAV associates a tag's UUID (e.g. "{abcde-ef012-3456}") with events. In this case, the ''TagRemoteIdResourceRelation'' table has a ("KDE", "someIMAPResource", "$KDE") triple and a ("KDE", "someCalDAVResource", "{abcde-ef012-3456}") triple.
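The triples from this example can be modelled as a lookup keyed by (Tag, Resource). The Python names below (`tag_remote_ids`, `remote_id_for`) are hypothetical and only illustrate how one ''Tag'' maps to a different remote ID in each ''Resource''.

```python
# (tag, resource) -> remote ID, mirroring the triples of the example above.
tag_remote_ids = {
    ("KDE", "someIMAPResource"): "$KDE",
    ("KDE", "someCalDAVResource"): "{abcde-ef012-3456}",
}


def remote_id_for(tag, resource):
    """Return the remote ID of a tag in a resource, or None if it is not used there."""
    return tag_remote_ids.get((tag, resource))
```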
Most Resources need a configuration - server to which to connect, credentials, sync frequency etc. The configuration dialog can be invoked through a DBus call to the Resource and runs within the Resource process.
 
  
=== Search ===
+
'''TODO:''' This description has a strong specification taste to it. Most of it should probably be moved to the ''Basic Entities'' or ''Basic components'' section.
There is a special agent called Akonadi Indexing Agent which listens to changes in Items and indexes the Items into a Xapian database. There is a separate database for each of emails, contacts, events, notes, and contacts parsed from emails (like senders, etc.).
 
  
When a client wants to perform a search, it can query the Xapian database directly in-process through a search query, which returns a list of Item IDs matching the query. The client can then retrieve the respective Items from Akonadi.
+
== PimItemTagRelation ==
 +
A single ''Item'' can have multiple ''Tags'' and this table describes the relation.
  
A second option is a so-called persistent search. Persistent search is represented as a Virtual Collection belonging to the virtual Search Resource. The virtual Collection holds a search query, which is re-executed whenever an Item changes and all Items matching the query are linked to the virtual Collection. This allows keeping a persistent filter for Items even across multiple different Collections.
+
== RelationTable ==
 +
Holds ''Relations'' between two ''Items''.
  
Search infrastructure is currently undergoing major overhaul codenamed "Make Indexing Great Again", see
+
== RelationTypeTable ==
https://phabricator.kde.org/T7014
+
An equivalent to ''TagTypeTable'', but for ''Relations''.

Latest revision as of 09:51, 18 June 2018

Akonadi Concepts and Architecture

This document describes and explains the core elements within Akonadi (like Items, Collections, etc.) as well as the architecture of the entire solution (clients, agents, server, etc.) and how they interact with each other. The reason this is all explained in a single document is so that it's easier to see how all the dots connect.

Eventually, this should be moved or copied into Akonadi docs.

Most of the following is meant to be a rough specification. Implementation details of a concept that their users should not rely upon are clearly marked.


Entities

The term Entity is often used as a common term for all the elements described below.

Attributes

Attributes are additional metadata that can be attached to other Entities (except for other Attributes). An Attribute has a type and a value. Client applications and Agents can define their own Attributes, but there are also some pre-defined Attributes.

Example: The pre-defined "EntityDisplay" Attribute allows customizing how an Entity is presented to the user in a client (by setting custom display name, icon, background color etc.).

Items

An Item is an abstract representation of data. Items have metadata (ID, size, MIME type, etc.), Payload parts (the actual data) and attributes. Each Item has exactly one parent Collection.

Example: An Item can represent an email. Such an email Item may have envelope, head and one or more body Payload parts. An Item can also represent a contact, a calendar event etc.

Collections

A Collection, as the name suggests, is a collection of Items. A Collection can also have child Collections, thus creating a Collection tree. Collections can also have attributes. Each Collection is owned by a Resource (see below). Finally, a list of MIME types is associated with each Collection; every Item in a collection must be of one of the associated MIME types.

Example: Email folders are collections of Items of "email" MIME type. Calendars are collections of Items of "Todo" or "Event" MIME type.

Virtual Collections

A Virtual Collection is a Collection that cannot own Items or have non-virtual subcollections. Instead of being a parent of Items, Items are linked into Virtual Collections. One Item can be linked into multiple Virtual Collections.

Example: Virtual Collections are typically used to hold search results, i.e., a Virtual Collection represents a search query and all Items linked to it are those that match the query.

Implementation detail: Virtual Collections are represented as regular Collections.

Tags

A "Tag" is a unary relation on "Items", and can thus be seen as an "Item" property. A single Item can have multiple Tags and a single Tag can be assigned to multiple Items.

Example: a "Work" tag can be assigned to many emails, tasks and events (or rather Items representing those) that are somehow related to the user's work.

Relation

A Relation is a binary relation on "Items", i.e., it describes a specific relation between exactly two Items. A single Item can be in multiple Relations, even in multiple Relations of the same type.

Example: We can have an "INVITATION" Relation between an Item that represents an email with meeting invitation and an Item that represents a calendar event that was created from this invitation email. If the event has multiple participants and an invitation email was generated for each participant, those emails would all be in "INVITATION" Relation to the event.


Basic components

Server

Server refers to the server process that other components talk to via the Akonadi Protocol. It manages the cached Entities and persists them.

Implementation detail: The Server uses a SQL database to persist the cached Entities.

Agents

Agents are single-purpose processes that get notified when an Entity is created on, modified on or removed from the Server.

Example: A MailFilterAgent which is notified whenever a new Item is created; if the Item holds an email, it will apply local mail filters to it and store the change back in Akonadi.

Resources

Resources are special cases of Agents that synchronize data between the Server and a backend, which is typically a remote server.

Example: An "IMAP resource" synchronizes data between the Server and some IMAP server.

Agent Types and Agent Instances

An Agent Type is a named Agent implementation. We cannot have two Agent Types with the same name. An Agent Instance is a running instance of some Agent Type. Each Agent Type can have multiple Agent Instances.

The same terminology applies to Resources: There are Resource Types with a unique name, and each can have multiple Resource Instances.

Example: To manage emails on multiple IMAP servers, we can create multiple Resource Instances of type IMAP resource. The name of that type might be "IMAPResource", and there cannot be another Resource Type with the same name.

Clients

Clients are user-facing applications that present data from Akonadi to users and allow them to interact with the data.

Example: KMail is a Client that presents email Items to the user, lets them create new email (as Items) or folders (as Collections), etc. KOrganizer works with calendar data instead.


Some more concepts

ID

Item ID, Collection ID, and Tag ID are database primary keys, but are exposed to clients to uniquely identify each Entity.

Remote ID

RemoteID is a string-based identifier that is used by the backend (IMAP server, CalDAV server etc.) to identify the Entity. This is only exposed to Resources, since those are the only ones to actually understand what the Remote ID means.

Example: The Remote ID can be the UID of an email Item or the mailbox name for a Collection on an IMAP Resource, or the name of an email file on a Maildir Resource.

GID

GID is a string-based identifier extracted from the payload and is exposed to clients.

Example: The Message-ID header of an email or the UID of an iCal event are typical GIDs of their corresponding Items.

Payload Type

Each Payload part of an Item has one of three Payload types:

Internal Payloads are stored directly inside Akonadi's database. These are used if the Payload is smaller than a size threshold (4kB by default).

External Payloads are used for larger payloads. The actual payload is stored in a separate cache file (inside file_db_data), and only the name of that file is stored in Akonadi's database.

Finally, Foreign Payloads can be used for Resources where the backend is not remote, but presents local files to Akonadi instead. The database holds the absolute filepath of the local file that holds an Item's part. Note, however, that Foreign Payloads are not used by anyone as of now (2018-06).
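The choice between the three Payload types can be summarized in a short Python sketch. The function name `payload_type`, the interpretation of "4kB" as 4096 bytes, and the strict "less than" comparison at the threshold are assumptions; the text above only states the default size.

```python
# "4kB by default"; the exact byte count and boundary handling are assumed.
INTERNAL_LIMIT = 4 * 1024


def payload_type(size, is_local_file=False, limit=INTERNAL_LIMIT):
    """Classify a payload part as "foreign", "internal" or "external"."""
    if is_local_file:
        # The database would hold only the absolute path of the backend's file.
        return "foreign"
    if size < limit:
        # Small parts live directly in the database.
        return "internal"
    # Larger parts go to a cache file; the database keeps the file name.
    return "external"
```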

Cache

Akonadi is a cache, not a storage. New Items are downloaded from the backend services (IMAP server, CalDAV server, maildir, ...) by Resources and uploaded to the Server regularly. Any changes made to Entities by Clients (marking an email as read, creating a new event, deleting a contact etc.) are sent to the respective Resource that owns the Item in question, and the Resource replays the change to the remote service. If the remote service is not available (let's say the user is offline, but they mark a bunch of emails as read or move them to some other folder), the changes are recorded by the Resource and are replayed once the network is available.


Architecture Overview

TODO: This mix of architecture specification and implementation details might not be ideal.

Protocol

All components communicate with each other via the Protocol. The Protocol is a custom binary protocol with commands and responses. Each Client opens one or more connections (called Session, Command Session or Command Bus) to the Server and can send commands to the server requesting or modifying data.

Each Client can create multiple Sessions. This is useful because Sessions don't support command pipelining, meaning that the next command in the queue is not sent to the server until a response to the previous command has arrived, which can cause undesirable waits for the user.

Note that the Protocol is an implementation detail. It is not exposed to the Clients, who only interact with the Server via the Client API, which internally issues and handles the communication via the Protocol.

Example: In KMail, the message list and the message viewer each have their own Session. This way, when the user opens a huge folder, they can click on the first email immediately and the message viewer can retrieve it through its own Session without having to wait for the message list to receive all emails from Akonadi first.
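Why separate Sessions help can be seen in a small Python model of a Session without command pipelining. The `Session` class and its attributes are hypothetical and do not correspond to the real client API; the command strings are placeholders.

```python
from collections import deque


class Session:
    """Hypothetical model: at most one command is in flight per session."""

    def __init__(self, name):
        self.name = name
        self.pending = deque()
        self.in_flight = None

    def enqueue(self, command):
        self.pending.append(command)
        self._send_next()

    def _send_next(self):
        # No pipelining: only send when no response is outstanding.
        if self.in_flight is None and self.pending:
            self.in_flight = self.pending.popleft()

    def response_received(self):
        """Mark the current command as answered and send the next one."""
        done, self.in_flight = self.in_flight, None
        self._send_next()
        return done
```

With two such Sessions, a command queued on the viewer's Session is sent immediately even while the list's Session is still waiting for the response to a large fetch.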


Akonadi Control

akonadi_control is a small, but very important part of Akonadi. When you type "akonadictl start", or when you start an Akonadi-enabled application like Kontact, they will start the akonadi_control process. Akonadi Control is responsible for starting the Akonadi server and all configured Resources. It also automatically restarts them when any of them crashes. The second important role of Akonadi Control is that it provides a DBus interface to communicate with the Akonadi Resources.

Akonadi Server

Akonadi Server is the implementation of the Server concept described above. In principle, the Akonadi Server is very simple: it receives commands from clients, handles them by reading or writing to the database, sends back a response and generates a Change Notification if needed.

Each connection is handled in a separate thread on the server in a Connection object. This allows the implementation of the command handlers to be blocking and also allows some context to be kept for each connection. Whenever a new command is received on a connection, the Connection inspects which type of command it is and creates the respective Handler (e.g. StoreHandler, AppendHandler, MoveHandler, etc.).

For read ("Fetch") commands, the respective Handler will construct an SQL query to retrieve the requested Entities from the database. It will then serialize them into the Protocol and send them back to the client. In some cases, the Akonadi Server must first request the Item payloads from the owning Resource. This is because Collections can have an expiration policy, meaning that after some timeout, the Akonadi Server will delete the payload of the Items in that Collection from the database. When a client requests the payload of an Item whose payload is missing, the Akonadi Server asks the Resource that owns the Item (via a DBus call) to retrieve the payload and upload it to Akonadi using the standard Job mechanism. Once done, the Resource notifies the Server, again via a DBus call, that it has finished, and the Akonadi Server continues with the retrieval as usual.
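The retrieval of an expired payload behaves much like a cache miss. The sketch below is a simplified model under stated assumptions: the class and method names are hypothetical, and the DBus round trip and the Job-based upload are collapsed into a single method call.

```python
# Simplified model of on-demand payload retrieval (not Akonadi code).

class Resource:
    """Hypothetical stand-in for the Resource that owns the Items."""

    def __init__(self, backend):
        self.backend = backend      # remote ID -> payload held by the backend
        self.requests = 0

    def retrieve_payload(self, remote_id):
        self.requests += 1
        return self.backend[remote_id]

class Server:
    def __init__(self, resource):
        self.resource = resource
        self.remote_ids = {}        # item ID -> remote ID (always kept)
        self.payloads = {}          # item ID -> payload (may be expired)

    def add_item(self, item_id, remote_id, payload):
        self.remote_ids[item_id] = remote_id
        self.payloads[item_id] = payload

    def expire(self, item_id):
        # Expiration policy: drop the payload, keep the Item metadata.
        self.payloads.pop(item_id, None)

    def fetch(self, item_id):
        if item_id not in self.payloads:
            # Payload is missing: ask the owning Resource to retrieve it first.
            remote_id = self.remote_ids[item_id]
            self.payloads[item_id] = self.resource.retrieve_payload(remote_id)
        return self.payloads[item_id]
```

In this model the Resource is contacted only for the first fetch after an expiration; later fetches are served from the database again.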

For each write command, once the data are written to the database, the Server will generate a Change Notification describing what was changed and how.

The Server also provides search functionality in part, which is described in detail below.


Search

There is a special agent called Akonadi Indexing Agent which listens to changes in Items and indexes the Items into a Xapian database. There is a separate database for emails, contacts, events, notes, and contacts parsed from emails (like senders, etc.).

When a client wants to perform a search, it can query the Xapian database directly in-process through a search query, which returns a list of IDs of the Items matching the query. The client can then retrieve the respective Items from Akonadi.

A second option is a so-called persistent search. Persistent search is represented as a Virtual Collection belonging to the virtual Search Resource. The Virtual Collection holds a search query, which is re-executed whenever an Item changes and all Items matching the query are linked to the Virtual Collection. This allows keeping a persistent filter for Items even across multiple different Collections.
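The linking behaviour of a persistent search can be sketched as follows. This is a simplified model, not Akonadi code: the VirtualCollection class and its item_changed() method are hypothetical, and the search query is reduced to a Python predicate that is re-evaluated for the changed Item only.

```python
# Simplified model of a persistent search (not Akonadi code).

class VirtualCollection:
    def __init__(self, query):
        self.query = query      # predicate: item dict -> bool (assumption)
        self.linked = set()     # IDs of Items currently linked

    def item_changed(self, item_id, item):
        # Re-run the query for the changed Item and link or unlink it.
        if self.query(item):
            self.linked.add(item_id)
        else:
            self.linked.discard(item_id)
```

The Items themselves stay in their real parent Collections; the Virtual Collection only records links to them.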

Search infrastructure is currently undergoing major overhaul codenamed "Make Indexing Great Again". See https://phabricator.kde.org/T7014 for details.


Client API

From an application's point of view, the Server and the Protocol are just implementation details: applications are not aware of them and don't interact with them directly in any way. The only means for Clients to interact with Akonadi is through the Client API.

Clients communicate with the server either using the Jobs API, the Notification API, or a full-fledged Entity Tree Model.

Jobs API

Jobs are the core elements of the Client API. A Job is an asynchronous task that can retrieve data from Akonadi or modify them. Once finished, the job emits the result() signal after which the Client can handle the result of the Job.

A Job is associated with a Session on creation. If the caller does not specify a Session, the default Session is used. Each Session has a Job queue, where new Jobs are automatically enqueued. The Jobs of each Session are never processed in parallel, but sequentially: A Job is only processed when every earlier Job is finished.

Example: An ItemFetchJob retrieves Items from Akonadi; a CollectionModifyJob modifies Collections.
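The effect of the per-Session queue can be illustrated with a small model. This is not the Client API: the Session class below, its enqueue() and tick() methods, and the notion of a Job taking a number of "ticks" are assumptions made purely for illustration.

```python
# Simplified model of sequential Job processing per Session (not Akonadi code).
from collections import deque

class Session:
    def __init__(self):
        self.pending = deque()  # [job name, remaining ticks], oldest first
        self.finished = []      # names of finished Jobs, in completion order

    def enqueue(self, name, ticks):
        self.pending.append([name, ticks])

    def tick(self):
        """Advance only the oldest Job; later Jobs wait their turn."""
        if not self.pending:
            return
        job = self.pending[0]
        job[1] -= 1
        if job[1] <= 0:
            self.finished.append(self.pending.popleft()[0])
```

If a long-running folder fetch and a single-message fetch share one Session, the message fetch cannot finish before the folder fetch does. Putting the message fetch into a second Session lets it complete independently, which is the reason for the separate Sessions in the KMail example.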

Notification API

If a Client is interested in changes to Entities (create/modify/move/remove/link/...), it may subscribe to Change Notifications by creating a Monitor. A Monitor has signals for each type of change that can occur, so Clients can connect only to those they are interested in. At any time, the Client may set the Monitor's scope, which specifies the kind of notification the Client is interested in, like the kind of affected Entities, only a specific Entity, the type of change, etc. For each Entity change that matches the scope, the Monitor issues a Change Notification that describes the change. The Change Notification contains a description of the change as well as the changed Item, the latter being called Notification Payload.

A Change Recorder is a special Monitor that writes each reported change to a journal file. The user is responsible for calling changeProcessed() whenever it has handled a change, upon which the Change Recorder will remove the notification from the journal and dispatch the next notification in the queue. A Change Recorder is only very rarely needed by Clients. Instead, Change Recorders are used by some Resources to record local changes that have not yet been propagated to a remote server, e.g., if an Item of a Resource is modified while that Resource is offline. In this case, all notifications will be stored in the Change Recorder's journal and once the internet connection is available again, the Resource will request the Change Recorder to replay all notifications from the journal so that it can upload the changes to the backend.
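The journal semantics can be modelled in a few lines. This is a sketch under stated assumptions, not the real class: the journal is kept in memory rather than in a file, and the method names record(), replay_next() and change_processed() are chosen for this model.

```python
# Simplified in-memory model of a Change Recorder journal (not Akonadi code).
from collections import deque

class ChangeRecorder:
    def __init__(self):
        self.journal = deque()  # oldest change first

    def record(self, change):
        self.journal.append(change)

    def replay_next(self):
        """Return the oldest journaled change without removing it, or None."""
        return self.journal[0] if self.journal else None

    def change_processed(self):
        """Drop the oldest change once the consumer confirms it was handled."""
        if self.journal:
            self.journal.popleft()
```

The point of the two-step protocol is that a change stays in the journal until it is explicitly confirmed, so a consumer that stops before confirming sees the same change again on the next replay.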

Implementation detail: Monitors establish a connection, called subscription, to the Server upon creation, and the Server keeps track of the currently existing subscriptions and the scopes of the corresponding Monitors. Whenever an Entity is modified on the Server by a Client or Agent, the Server generates a notification message that describes which Entities have changed, and how. It will then compare the notification message to the scopes of the subscribers to see if the subscriber is interested in this particular notification. If the notification message matches a subscriber's scope, the notification message as well as the changed Entity (Notification Payload) is sent over to the subscriber.

On the client side, the notification message is received by the Monitor and put into a pipeline. The Monitor then issues a Change Notification containing a description of the change as well as the changed Entities themselves. This is done by emitting an appropriate signal that depends on the type of the notification.
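The server-side matching of notifications against subscriber scopes can be sketched like this. It is a simplified model: the Scope fields, the dictionary layout of a notification and the rule that an empty set means "no restriction" are assumptions of this sketch, not a description of the actual Protocol.

```python
# Simplified model of scope matching for Change Notifications (not Akonadi code).
from dataclasses import dataclass, field

@dataclass
class Scope:
    # An empty set means "no restriction" for that criterion (assumption).
    collections: set = field(default_factory=set)
    mime_types: set = field(default_factory=set)
    operations: set = field(default_factory=set)

    def matches(self, notification):
        checks = (
            (self.collections, notification["collection"]),
            (self.mime_types, notification["mime_type"]),
            (self.operations, notification["operation"]),
        )
        return all(not wanted or value in wanted for wanted, value in checks)

class NotificationServer:
    def __init__(self):
        self.subscriptions = {}     # subscriber name -> Scope

    def subscribe(self, name, scope):
        self.subscriptions[name] = scope

    def deliver(self, notification):
        """Return the sorted names of subscribers whose scope matches."""
        return sorted(
            name for name, scope in self.subscriptions.items()
            if scope.matches(notification)
        )
```

A subscriber with an empty Scope receives every notification, while a subscriber that restricts the MIME type or the operation receives only the matching subset.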

TODO: Does a Monitor establish two distinct connections, one Notification Bus and a Command Bus? If so, which is used for what? I guess that the Command Bus is used to subscribe/unsubscribe and to set the scope (communication from Monitor to Server), and the Notification Bus is used for the notification messages (communication from Server to Monitor)?

Entity Tree Model (ETM)

An Entity Tree Model is a QAbstractItemModel which holds the entire tree of Collections and Items and keeps it up to date. It is possible to filter the content of an ETM in many ways to only include Entities of a certain type or MIME type. An ETM is usually used in applications in combination with various proxy models.

It is not possible to modify Entities by changing them in the ETM. Instead, the Client has to use a Job to modify an Entity.

Example: In KMail, the folder list and message list are both sharing the same ETM under the hood, but use different proxy models to display only a specific part of the tree in each view.

Implementation detail: An ETM automatically keeps itself up-to-date by using a Monitor. Therefore, it will automatically (and asynchronously) reflect all changes made via the Jobs API by appropriately processing Change Notifications.


Resources

As explained above, Resources take care of synchronizing changes between Akonadi and the Resource's backend (IMAP server, CalDAV server, local iCal file, etc.).

TODO: This section is interesting for Resource authors, but then it lacks a Resource API section. Part of that is currently described in Change Replay (the fact that all Resources implement AgentBase::Observer).

Tasks

In general, there are two types of tasks: ChangeReplay and Sync. ChangeReplay tasks have the highest priority and they represent a change in the Akonadi data that needs to be written to the backend. This can be a new flag being added to an email, a new calendar event being created, or a contact being removed. Sync tasks (FetchItems, SyncCollectionTree, SyncCollection etc.) are tasks that ask the Resource to download any new changes from the backend and put them into Akonadi. Sync tasks do not write anything to the backend.

TODO: It should be possible to describe both Tasks without mentioning the Scheduler. Also, make clear that Replay is Akonadi-to-backend propagation, and Sync is backend-to-Akonadi propagation.

Change Replay

Change Replay tasks are created whenever the Change Recorder of a Resource is notified about a change. The Change Recorder will store the Change Notification in a journal and will pass it to the Scheduler. In case the Resource cannot handle the change immediately, maybe because the Resource is offline and it needs network access to connect to the backend, the change remains stored in the Change Recorder's journal until the Resource goes online again. When that happens, the Change Recorder is asked to replay all the changes from the journal by passing them to the Scheduler.

Each Resource implements the AgentBase::Observer interface. This interface has methods like itemAdded(item, parentCol), itemsRemoved(items), itemsFlagsChanged(items, addedFlags, removedFlags) etc. that must be implemented by the Resource. In those methods, the Resource implementation takes the changed data and writes it to the backend using the backend protocol/format (IMAP, CalDAV, iCal etc.). As the Scheduler is replaying the tasks, depending on the type of the change it calls the respective method from the Observer interface, waits for the Resource to confirm that the change has been successfully written to the backend, and then schedules the next task and so on until it runs out of tasks in the queue or until another task arrives.

Synchronization

Synchronization means retrieving data from the backend and storing them in Akonadi, so it's a one-way synchronization. Synchronization can be requested via a Resource's DBus interface, either via a SynchronizeCollectionTree task, a SynchronizeCollection task (which synchronizes all Items within a specified Collection), or a SynchronizeItem task, which retrieves a specified Item. There is also a SyncAll task, which schedules a SynchronizeCollectionTree task followed by a SynchronizeCollection task for each Collection. There are of course more tasks, to sync attributes, tags etc., but they all work on the same principle.

The Resource does not have to synchronize the entire Item. For example, for emails we often only synchronize the envelope, which is enough to display the email in the message list in KMail; the actual body is retrieved on demand once the user opens the email and KMail requests the payload body from Akonadi.

Item Synchronization happens by the so-called merging process, in which the Server checks whether an Item with the same identification already exists in the Akonadi database. If so, it overwrites it with the newly received Item. Otherwise a new Item entry is created in the database. The merging happens using RID (RemoteID) or optionally GID in cases where RIDs are unstable. It's also possible to combine both.
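The merging rule amounts to an "update if it exists, otherwise create" step keyed on the chosen identifiers. The sketch below is a simplified model: the merge_item() function, the row layout and the interpretation of "combine both" as "RID and GID must both match" are assumptions of this sketch, not a statement about the Server's implementation.

```python
# Simplified model of Item merging by RID and/or GID (not Akonadi code).

def merge_item(db, incoming, use_rid=True, use_gid=False):
    """Overwrite a matching row in db, or append a new one.

    db is a list of dicts with "rid" and "gid" keys (hypothetical layout).
    Returns "updated" or "created".
    """
    if not (use_rid or use_gid):
        raise ValueError("at least one of RID or GID must be used for merging")
    for row in db:
        if use_rid and row["rid"] != incoming["rid"]:
            continue
        if use_gid and row["gid"] != incoming["gid"]:
            continue
        row.update(incoming)    # same identification: overwrite in place
        return "updated"
    db.append(dict(incoming))   # no match: create a new Item entry
    return "created"
```

Merging by GID alone is what helps when RIDs are unstable: an Item whose RID changed on the backend is still recognised as the same Item and updated rather than duplicated.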

Scheduler

Each Resource has a Scheduler, which holds a queue of tasks that the Resource should perform. In fact, the Scheduler has several queues with different priorities. For example, a queue for tasks that write changes from Akonadi to the backend has the highest priority, while the queue to download new changes from the backend has a lower priority. Whenever the Resource receives a new task (see Tasks above) it puts it into the Scheduler, which then dispatches it based on the priority by calling the respective method in the Resource implementation. Once the Resource is done with handling the task, it tells the Scheduler that it's done and that it can schedule the next task.
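The scheduling behaviour can be sketched with a single priority heap. This is a simplified model rather than the real Scheduler: the two priority constants, the method names and the use of one heap in place of several queues are assumptions made for illustration.

```python
# Simplified model of a Resource's task Scheduler (not Akonadi code).
import heapq
from itertools import count

CHANGE_REPLAY = 0   # highest priority: write Akonadi changes to the backend
SYNC = 1            # lower priority: download changes from the backend

class Scheduler:
    def __init__(self):
        self._heap = []
        self._order = count()   # tie-breaker keeps FIFO order per priority
        self.busy = False

    def schedule(self, priority, task):
        heapq.heappush(self._heap, (priority, next(self._order), task))

    def dispatch(self):
        """Hand out the next task, or None if busy or nothing is queued."""
        if self.busy or not self._heap:
            return None
        self.busy = True
        return heapq.heappop(self._heap)[2]

    def task_done(self):
        # Called by the Resource once it has finished the current task.
        self.busy = False
```

A ChangeReplay task scheduled after a Sync task is still dispatched first, and no further task is handed out until the Resource reports the current one as done.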

Online/Offline

Resources can be in an online or offline state. This is not related to network connectivity status; even a local-only resource like the Maildir resource can be in offline state. The state indicates whether the Resource is able to store changes in its backend. For remote Resources (like IMAP) the online/offline status often matches the online/offline status of the network connectivity, but it's also possible to manually switch a Resource to offline.

Resources which are in the offline state reject all Sync requests and store Change Notifications in their Change Recorder's journal. Once the Resource switches back to online, it will first replay all pending changes from its Change Recorder; only then will it start processing Sync requests again.
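The online/offline rules can be summarised in a small state model. This is a sketch under stated assumptions: the Resource class below and its methods are hypothetical, the journal is a plain list, and "writing to the backend" is reduced to appending to a log.

```python
# Simplified model of a Resource's online/offline behaviour (not Akonadi code).

class Resource:
    def __init__(self):
        self.online = True
        self.journal = []       # pending Change Notifications, oldest first
        self.backend_log = []   # what reached the backend, in order

    def on_change(self, change):
        # Changes are always journaled; they are replayed only when online.
        self.journal.append(change)
        if self.online:
            self._replay()

    def request_sync(self, collection):
        if not self.online:
            return False        # offline Resources reject Sync requests
        self.backend_log.append(("sync", collection))
        return True

    def set_online(self, online):
        self.online = online
        if online:
            self._replay()      # pending changes go first, before any Sync

    def _replay(self):
        while self.journal:
            self.backend_log.append(("replay", self.journal.pop(0)))
```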

Configuration

Most Resources need a configuration. The configuration dialog can be invoked through a DBus call to the Resource and runs within the Resource's process.

Example: Configuration options can be the server to connect to, credentials, sync frequency, etc.


DB Tables

This is a brief description of tables in the database that the Server stores all the data in and how they relate to the Entities and components described above.

SchemaVersion

A standalone table that holds information about the current version of the schema.

ResourceTable

Holds a list of active Agent and Resource Instances.

TODO: What is an active Agent? Is it an existing one, or one that is online? Can Agents be inactive?

PimItemTable

Holds metadata about Items - ID, parent Collection, size etc. This is a very big table - one row per email, contact, event etc.

PartTable

PartTable holds the actual Payload parts and Attributes for Items. This is the largest table in Akonadi as it contains on average 3 rows per row in PimItemTable.

PartTypeTable

Contains names of parts and attributes from PartTable (like PLD:ENVELOPE, PLD:HEAD, ATR:noselect, etc.) - this is a very small table (around 10 rows normally) and its purpose is purely to de-duplicate the often-repeated strings from the already-big PartTable.
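The de-duplication can be illustrated with an in-memory SQLite database. The table names follow the text, but the column names (id, name, pimItemId, partTypeId, data) and the add_part() helper are assumptions made for this sketch and are not taken from the actual Akonadi schema.

```python
# Illustrative sketch of part-name de-duplication (hypothetical columns).
import sqlite3

def build_demo_db():
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE PartTypeTable (
            id   INTEGER PRIMARY KEY,
            name TEXT UNIQUE
        );
        CREATE TABLE PartTable (
            id         INTEGER PRIMARY KEY,
            pimItemId  INTEGER,
            partTypeId INTEGER REFERENCES PartTypeTable(id),
            data       BLOB
        );
    """)
    return db

def add_part(db, item_id, type_name, data):
    # Store the part name once; PartTable rows only carry its small integer ID.
    db.execute("INSERT OR IGNORE INTO PartTypeTable (name) VALUES (?)", (type_name,))
    (type_id,) = db.execute(
        "SELECT id FROM PartTypeTable WHERE name = ?", (type_name,)
    ).fetchone()
    db.execute(
        "INSERT INTO PartTable (pimItemId, partTypeId, data) VALUES (?, ?, ?)",
        (item_id, type_id, data),
    )
```

However many Items are stored, each distinct part name such as PLD:ENVELOPE occupies a single row in the small table, and the big table repeats only an integer.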

MimeTypeTable

MimeTypeTable holds a list of MIME types. This is a very small table and like PartTypeTable is used simply to de-duplicate repeated strings from the PimItemTable and to allow a many-to-many relation between Collections and MIME types.

FlagTable

FlagTable holds Item flags, like "seen", "spam", "hasattachment" etc. The table only holds simple strings and is fairly small (we have around 20 flags).

TODO: Flags are not described in the Basic Entities section above.

PimItemFlagRelation

A single Item can have 0-N flags and this table describes the relation. This is a fairly big table as there is usually more than one flag per PimItem row.

CollectionTable

The CollectionTable holds metadata about Collections - their ID, parent Collection, cache policy etc. This is normally a small-ish table - one row per email folder, calendar, addressbook etc.

CollectionMimeTypeRelation

This many-to-many table describes the relation between Collections in the CollectionTable and MIME types in the MimeTypeTable, i.e., which Collection can hold Items of which MIME type.

CollectionAttributeTable

This table holds additional Attributes for Collections. One Collection can have multiple Attributes, but an attribute belongs to exactly one Collection.

CollectionPimItemRelation

This table describes the relation between Items and Virtual Collections. This does not describe the parent-child relationship; that is stored in PimItemTable.collectionId. The size of this table varies depending on how much you use the "Search" feature in KMail.

TagTable

TagTable holds Tags. This is usually a small table with one row per Tag; people generally don't have more than a few dozen Tags (most people don't use this feature at all).

TagTypeTable

This table holds Tag types - this is purely to de-duplicate common strings from TagTable.

TagAttributeTable

A table equivalent to CollectionAttributeTable, but for Tags instead of Collections.

TagRemoteIdResourceRelation

For each Tag, this table holds the remote ID for each Resource that this Tag is used in. This allows for different representation of the same Tag in different Resources.

Example: The user wants to tag both emails and events with a "KDE" Tag, but tags in IMAP are "$"-prefixed ("$KDE"), and CalDAV associates a tag's UUID (e.g. "{abcde-ef012-3456}") with events. In this case, the TagRemoteIdResourceRelation table has a ("KDE", "someIMAPResource", "$KDE") triple and a ("KDE", "someCalDAVResource", "{abcde-ef012-3456}") triple.
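The triples from this example behave like a mapping keyed by the (Tag, Resource) pair. A minimal sketch, in which the dictionary layout and the remote_id_for() helper are hypothetical and the values are the ones from the example:

```python
# Minimal model of the TagRemoteIdResourceRelation example (not Akonadi code).

# (tag, resource) -> remote ID used for that Tag in that Resource
tag_remote_ids = {
    ("KDE", "someIMAPResource"): "$KDE",
    ("KDE", "someCalDAVResource"): "{abcde-ef012-3456}",
}

def remote_id_for(tag, resource):
    """Return the Resource-specific remote ID of a Tag, or None if unknown."""
    return tag_remote_ids.get((tag, resource))
```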

TODO: This description has a strong specification taste to it. Most of it should probably be moved to the Basic Entities or Basic components section.

PimItemTagRelation

A single Item can have multiple Tags and this table describes the relation.

RelationTable

Holds Relations between two Items.

RelationTypeTable

An equivalent to TagTypeTable, but for Relations.


This page was last edited on 18 June 2018, at 09:51. Content is available under Creative Commons License SA 4.0 unless otherwise noted.