KDE PIM/Meetings/Osnabrueck 4/PIM Storage Service Requirements - Revision history<br />
Mahoutsukai: add information from Osnabrueck 4 (2011-02-26T15:32:08Z)<br />
<p>add information from Osnabrueck 4</p>
<div>= Result of our requirements brainstorming session on Friday =<br />
<br />
* Flat Access<br />
* Categorization by Attributes<br />
* Job Priorities<br />
* Virtual Folders<br />
* Filtering<br />
* Access to parts of objects (mimetypes)<br />
* Change Notification<br />
* Shared Cache<br />
* Asynchronous Access<br />
* Out-of-process service plugins<br />
* Online/Offline state management<br />
* PIM Object Handle<br />
* No hard locks<br />
* Conflict Handling<br />
* Referencing objects (local or on server)<br />
* Capacities and Capabilities of Storage Backends<br />
* URI scheme to identify resources<br />
* Lazy Loading<br />
* Copy-on-write implementation for PIM objects (using snapshots)<br />
* Using Changesets<br />
* Syncing with groupware servers<br />
* Undo<br />
* Resources (storage units)<br />
* Non-global resource activation profiles<br />
<br />
<br />
= Till's mail about the requirements from the mail side =<br />
<br />
<pre><br />
From: Till Adam <adam@kde.org><br />
To: KDE PIM <kde-pim@kde.org>, kmail-devel@kde.org<br />
Date: Thu, 6 Oct 2005 09:18:58 +0200<br />
<br />
On Thursday 06 October 2005 08:15, Cornelius Schumacher wrote:<br />
> On Thursday 06 October 2005 05:06, Mark Bucciarelli wrote:<br />
> > With this approach, I imagine we would see gains of two orders of<br />
> > magnitude in memory usage for large (year-long) files. If korg only<br />
> > loads event headers for the current month, then startup would be a<br />
> > constant speed no matter how large the data set.<br />
><br />
> That's what I called "proxy objects" in my reply to the PIM daemon<br />
> proposal. The drawback of this would be that you have a delay when loading<br />
> the missing data. When you for example navigate through several months in<br />
> KOrganizer then you would see an empty month at first and the events would<br />
> pop up later. Not a very user-friendly solution. It would also mean that if<br />
> you open an editor there would be a delay until all the data is loaded, so<br />
> that you would start with an empty, disabled editor and the content of the<br />
> fields would be filled in later until you are finally able to use the<br />
> editor. Not pretty.<br />
<br />
I've been thinking about the mail side of things a bit and come to the <br />
conclusion that something like a proxy or facade object is definitely needed <br />
for mail. We currently have three sorts of pointers to messages, and then two <br />
flags per message that signal the state of their "completeness". This is not <br />
sufficient and the fact that pointers to messages go away and are replaced by <br />
something else, is a major problem and our number one source of crashes. Yet <br />
the reason for this design was the need to have something extremely <br />
lightweight to represent a message until more information is needed, because <br />
otherwise a folder with 10000 or 100000 mails would become completely <br />
unusable. Having the to-be-lazy-loaded information readily enough available <br />
that the user perceives no or only little delay when requesting it, is of <br />
course a challenge, and in the presence of across the network retrieval also <br />
has physical limitations, but online IMAP in KMail, which already works like <br />
that, to an extent, proves that it can be done. Caching could be a lot <br />
better, but more on that later.<br />
<br />
Before I go into details a couple of general comments:<br />
<br />
I think a design meeting is a great idea, I would welcome it.<br />
<br />
I agree that this design is crucial, should be very well thought out, and not <br />
rushed in any way. We need to get this right.<br />
<br />
I agree that we should look at EDS and also other solutions, they must have <br />
solved many of the same problems. If compatibility with EDS seems achievable, <br />
I would consider that a worthy goal, but not if it hurts our power or <br />
flexibility.<br />
<br />
Braindump of my musings on mail storage thus far, in no particular order:<br />
<br />
- mails are identified by a globally unique serial number (whether to expose <br />
that outside of KMail or use a URI scheme for that - possibly including the <br />
sernum - is a separate discussion)<br />
<br />
- there is a one to one mapping between the serial number and a ref-counted <br />
pointer to a Message object, which is initially an empty skeleton, containing <br />
no information beyond the serial number<br />
<br />
- internally, the mail store holds mappings of serialnumber, storage URL and <br />
cache URL <br />
<br />
- the Message API allows retrieval of those parts of the mail that are needed, <br />
such as Envelope (what is needed for display in the headers list), Headers, <br />
body parts, etc. If they are in the cache, they come from there, otherwise <br />
from the storage location (server)<br />
<br />
- access to all of these parts is asynchronous, with possibly synchronous <br />
convenience wrappers where access needs to be immediate for performance <br />
reasons and can reasonably be expected to be immediate, such as envelope <br />
requests<br />
<br />
- caching policies, which can apply to accounts, folders, even messages, <br />
govern how much information of a mail is locally present, and how much of the <br />
lazy loaded information that isn't, initially, is kept around. This allows <br />
scenarios such as "in this folder, don't download anything from the imap <br />
server beyond the envelope, but if I look at the mail, keep the bodies around <br />
in the cache", or "sync everything for this account, but not attachments, and <br />
not mails over 5 MB on mailcheck or mails in my SPAM folder"<br />
<br />
- messages (sernums) can have an arbitrary set of category flags associated <br />
with them, a la GMail labels, references to other PIM data, via URIs maybe<br />
<br />
- storage folder location can be used as one (but not the only) grouping <br />
criterion, possibly modelled as a category flag, internally<br />
<br />
- local mail (cache) storage is in maildir format, a local maildir account is <br />
simply one with cache URL == storage URL (implementation detail)<br />
<br />
- the current folderstorage subclasses become machines for mapping storage URL <br />
to cache URL and shifting data from one to the other on request<br />
<br />
- the internal mapping of sernum, storage URL, cache URL, category flags and <br />
performance critical envelope data (what used to be the index) is stored in a <br />
relational database, such as SQLite, which provides central, transactional, <br />
integrity guaranteed access to that information through the API <br />
(implementation detail)<br />
<br />
- I imagine access to all of this via a libemailstorage (or even libpimdata, <br />
or something) which dishes out handles to read-only (vast majority, for mail) <br />
and read-write instances of mails, handles locking, copy-on-write, etc. <br />
Whether that is implemented via a server process, which the lib talks to, or <br />
by concurrent access to the above mentioned database is a yet to be resolved <br />
implementation detail, and mostly orthogonal to the storage layer API, I <br />
believe<br />
<br />
Open questions:<br />
<br />
- how do accounts fit in? Should an account be a set of credentials for access <br />
to a set of storage URLs plus a set of attributes, such as cache policies, <br />
and managed by a pim-wide entity? How about connection tracking, is that <br />
orthogonal?<br />
<br />
- are all of the special features of certain server types (IMAP, Groupwise, <br />
HTTPMail, etc) integrateable into such a scheme? Things like quota, ACLs, etc<br />
<br />
- what should the query language look like? A special API, aware of mail <br />
semantics? should URI schemes (mail:/#12345/headers/from, <br />
mail:/#12345/body/attachment) be used, SQL, IMAP?<br />
<br />
- how to integrate this with Interview? Should folders be filtered (proxy) <br />
models on a global mailstore model? Sorting and threading as sorted (proxy) <br />
models on top of that? How much of that should be in the library, and how <br />
much in KMail? Does it make sense to be able to display a folder in any <br />
QAbstractItemView? <br />
<br />
- probably many more ...<br />
</pre></div>
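The storage ideas in the mail above (one handle per serial number, asynchronous part retrieval, cache before server, and a caching policy) can be illustrated with a rough sketch. This is not KMail or Akonadi code: the names `MailStore`, `Message`, `fetch`, `keep_in_cache` and `backend_fetches` are hypothetical, a plain dict stands in for the server, another dict stands in for the maildir cache, and the policy is reduced to "which part names stay in the cache".

```python
import asyncio


class Message:
    """Skeleton handle: knows only its serial number until a part is requested."""

    def __init__(self, store, sernum):
        self.sernum = sernum
        self._store = store

    async def part(self, name):
        # All part access is asynchronous and goes through the store.
        return await self._store.fetch(self.sernum, name)


class MailStore:
    """Toy store (hypothetical API) mapping serial numbers to message handles."""

    def __init__(self, backend, keep_in_cache=("envelope",)):
        self._backend = backend          # (sernum, part) -> data; stands in for the server
        self._cache = {}                 # (sernum, part) -> data; stands in for the local cache
        self._messages = {}              # sernum -> Message, exactly one handle per sernum
        self._keep = set(keep_in_cache)  # toy caching policy: part names kept locally
        self.backend_fetches = 0         # counts round trips to the "server"

    def message(self, sernum):
        """Return the single handle for a serial number, creating an empty skeleton on first use."""
        if sernum not in self._messages:
            self._messages[sernum] = Message(self, sernum)
        return self._messages[sernum]

    async def fetch(self, sernum, part):
        """Serve a part from the cache if present, otherwise from the storage location."""
        key = (sernum, part)
        if key in self._cache:
            return self._cache[key]
        await asyncio.sleep(0)           # placeholder for network latency
        data = self._backend[key]
        self.backend_fetches += 1
        if part in self._keep:           # policy decides what is kept around
            self._cache[key] = data
        return data


async def demo():
    server = {(1, "envelope"): "From: till", (1, "body"): "Hello"}
    store = MailStore(server)
    msg = store.message(1)
    assert msg is store.message(1)       # one-to-one mapping of sernum to handle
    await msg.part("envelope")
    await msg.part("envelope")           # second request is served from the cache
    await msg.part("body")
    await msg.part("body")               # bodies are not kept under this toy policy
    return store.backend_fetches
```

Running `asyncio.run(demo())` exercises the cache-then-storage path. A policy per account, folder or message, as the mail proposes, would replace the single `keep_in_cache` set; how that should look is left open here, as it is in the mail.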