PIM/MS Windows/SQLite Folder Indices

From KDE Community Wiki

There are issues with locking index files for KMail folders and mmap()/munmap() operations on Windows. Therefore, SQLite-based indices are in development. This page presents detailed development notes for this task.

Design and implementation started by jstaniek 11:35, 23 April 2008 (CEST)

Installation

Update (7 May 2008): commit 805075 merged changes related to the SQLite mode for KMail indices from /branches/work/kmail-nommap (r799390..804487) and /branches/work/kdepim-nommap/kmail (r804484..804960) back to trunk.

Thus, it is enough to use kdepim trunk now. Users of kdepim-nommap should execute emerge --unmerge kdepim-nommap and then emerge kdepim.

Old instructions follow. The branches/work/kdepim-nommap branch was created for SQLite mode and kept in sync with kdepim trunk; the only difference is the kmail/ subdirectory. To build it on Windows:

  1. if you have already installed kdepim trunk, type 'emerge --unmerge kdepim'
  2. update your emerge directory: 'cd {KDEROOT}', 'svn up emerge'
  3. type 'emerge --update kdepim-nommap'

The last command downloads the source code from the branch (with the altered kmail/ sources), then compiles and installs it. All resulting files have the same names as in regular kdepim trunk. The sqlite package is also installed, as a hard dependency of the kdepim-nommap package.

Introduction

  • we call the new implementation SQLite mode for short and the old implementation mmap mode
  • SQLite 3.5.4 is used, as provided by emerge sqlite module; we should not allow using much older versions of sqlite, e.g. 3.1 because of file format differences
  • we are using one .index.db "index file" per folder (on request, it could be implemented per account instead)
  • kmailprivate links to sqlite library for SQLite mode, and KMAIL_SQLITE_INDEX is defined to enable #ifdef'd code
  • kmfolderindex_sqlite.cpp is created and edited as a copy of kmfolderindex.cpp; kmfolderindex.cpp #includes kmfolderindex_sqlite.cpp for SQLite mode and then skips its own code completely
  • kmfolderindex_common.cpp is always included by kmfolderindex.cpp; implements KMFolderIndex::openInternal() and KMFolderIndex::createInternal()
  • kmfolderindex.h is a common header for both kmfolderindex*.cpp implementations

KMFolderIndex

api docs

  • 2008-04-23..25
    • mIndexId unused - removed as well as serialIndexId()
    • indexLocation(): added a .db suffix to indicate that the index is SQLite-based; the implementation moved to FolderStorage (which previously only declared it as an abstract method)
    • INDEX_VERSION is written and checked using 'PRAGMA user_version = <integer>' command [1]
    • we do not use temporary filenames, e.g. in writeIndex(): SQLite takes care of safe storage
    • updateIndex(): no changes, we're changing implementation of KMMsgBase::syncIndexString() and writeIndex() instead
    • added openDatabase( int mode ) for SQLite mode
    • readIndex() implemented for SQLite mode - uses SELECT command on messages table
    • readIndexHeader() uses "PRAGMA user_version" sqlite command
    • writeIndex() implemented for SQLite mode - uses INSERT command on messages table, within transaction
    • common code from {KMFolderMbox|KMFolderMaildir}::open( const char * ) moved to KMFolderIndex::openInternal()
    • common code from {KMFolderMbox|KMFolderMaildir}::create() moved to KMFolderIndex::createInternal()
  • 2008-05-01..02
    • implemented updateIndex() (see the table) and added extended version of writeMessages() with WriteMessagesMode modes
    • writeMessages(): depending on the mode (WriteMessagesMode), the SQL statement is either "INSERT INTO messages(data) VALUES(?)" or "INSERT OR REPLACE INTO messages(id, data) VALUES (?, ?)"; thus unique IDs are created when new, not-yet-stored messages are saved, for later use
    • readIndex() now calls "SELECT id, data FROM messages"
    • 'messages' table creation is now: "CREATE TABLE messages( id INTEGER PRIMARY KEY AUTOINCREMENT, data BLOB )" - added autoincremented 'id' PK column
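
The SQL statements quoted in the notes above can be tried out in isolation. The following is only a minimal sketch, not KMail's code: KMail talks to SQLite from C++, and Python's standard sqlite3 module is used here just to keep the example short and runnable. The function names and the INDEX_VERSION value are placeholders invented for illustration; only the SQL strings come from the notes.

```python
import sqlite3

INDEX_VERSION = 1  # placeholder value, NOT the real constant from KMail's sources

CREATE_MESSAGES = (
    "CREATE TABLE messages( id INTEGER PRIMARY KEY AUTOINCREMENT, data BLOB )"
)


def create_index(path=":memory:"):
    """Create an index database and stamp it with the index version."""
    db = sqlite3.connect(path)
    db.execute(CREATE_MESSAGES)
    # PRAGMA values cannot be bound as parameters, so the integer is formatted in.
    db.execute(f"PRAGMA user_version = {int(INDEX_VERSION)}")
    db.commit()
    return db


def read_index_header(db):
    """Return True if the stored version matches the expected one."""
    (version,) = db.execute("PRAGMA user_version").fetchone()
    return version == INDEX_VERSION


def write_new_messages(db, blobs):
    """Store new messages; SQLite assigns the ids, which are returned."""
    ids = []
    with db:  # one transaction for the whole batch
        for blob in blobs:
            cur = db.execute("INSERT INTO messages(data) VALUES(?)", (blob,))
            ids.append(cur.lastrowid)
    return ids


def update_existing_messages(db, rows):
    """Rewrite already stored messages; rows is a list of (id, blob) pairs."""
    with db:
        db.executemany(
            "INSERT OR REPLACE INTO messages(id, data) VALUES (?, ?)", rows
        )


def read_index(db):
    """Load all (id, data) pairs, as readIndex() is described to do."""
    return db.execute("SELECT id, data FROM messages").fetchall()
```

A short usage example: `ids = write_new_messages(db, [b"first", b"second"])` stores two blobs and returns the generated ids, which can later be passed to `update_existing_messages(db, [(ids[0], b"changed")])` to rewrite only that row.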

FolderStorage

  • the implementation of KMFolderIndex::indexLocation() moved to FolderStorage (which previously only declared it as an abstract method); FolderStorage::idsLocation() and FolderStorage::sortedLocation() were added to avoid path arithmetic such as mFolder->indexLocation() + ".sorted"
  • added QString location(const QString& suffix) - returns full path to .index, .ids or .sorted file (depending on the suffix)
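
Why dedicated location helpers matter once the index gains a .db suffix can be sketched as follows. This is an illustration only, written in Python rather than KMail's C++: the class name, the hidden-file naming scheme (`.<name>.index`) and the exact suffix strings are assumptions made for the example and are not taken from the KMail sources.

```python
from pathlib import PurePosixPath


class FolderStorageSketch:
    """Hypothetical stand-in for FolderStorage; the file layout is assumed."""

    def __init__(self, folder_dir, name, sqlite_mode=True):
        self.folder_dir = folder_dir
        self.name = name
        self.sqlite_mode = sqlite_mode

    def location(self, suffix):
        # Assumed layout: a hidden file next to the folder, <dir>/.<name><suffix>
        return str(PurePosixPath(self.folder_dir) / f".{self.name}{suffix}")

    def index_location(self):
        # In SQLite mode the index file carries an extra .db suffix.
        loc = self.location(".index")
        return loc + ".db" if self.sqlite_mode else loc

    def ids_location(self):
        return self.location(".index.ids")

    def sorted_location(self):
        return self.location(".index.sorted")
```

Under these assumptions, concatenating `index_location() + ".sorted"` would yield a path ending in `.index.db.sorted` in SQLite mode, which no longer matches the name of the .sorted file; asking the storage object for `sorted_location()` avoids that.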

KMMsgBase, KMMsgInfo

api docs

  • 2008-04-23..25
    • added char* mData for SQLite mode only (and a getter/setter)
    • getStringPart() and getLongPart() share code between modes now: only cosmetic changes applied
  • 2008-05-02
    • 'sqlite_int64 dbId' member added to KMMsgBase; KMMsgBase and KMMsgInfo ctors in SQLite mode now differ from mmap mode
    • KMMsgBase::syncIndexString() completely removed in SQLite mode because we want to keep all SQLite-specific operations in kmfolderindex_sqlite.cpp; KMFolderIndex::updateIndex() does this now.

KMFolderDir

  • reload(): skip *.index.db files

Other classes

  • 2008-05-02
    • MessageProperty: use ConstIterators for QMap structures to avoid double lookups

Status of porting to SQLite

TOPIC | PORTED | TESTED | NOTES
QString KMFolderIndex::indexLocation() | yes | | added .db suffix to indicate that the index is SQLite-based
int KMFolderIndex::updateIndex() | yes | | implemented using the extended version of writeMessages() with the UpdateExistingMessages mode; still calls writeIndex() on writeMessages() failure and still (properly) does nothing if mDirty == false
int KMFolderIndex::writeIndex( bool createEmptyIndex ) | yes | | creates the db; creates the messages table; inserts messages into the messages table, encoded as blobs using KMMsgBase::asIndexString(); a header is not needed, but INDEX_VERSION is saved using PRAGMA user_version; byte order info is not saved: every integer is written in network order or as a string
bool KMFolderIndex::readIndex() | yes | | "SELECT id, data FROM messages" is called
int KMFolderIndex::count(bool cache) | yes | | no changes
bool KMFolderIndex::readIndexHeader(int *gv) | | |
bool KMFolderIndex::updateIndexStreamPtr(bool) | removed | removed |
KMFolderIndex::IndexStatus KMFolderIndex::indexStatus() | yes | | no changes so far; we have problems with the trash folder being regenerated due to the mtime of its .db file
void KMFolderIndex::truncateIndex() | yes | yes | recreates the db
void KMFolderIndex::fillMessageDict() | yes | | no change, as it just inserts messages from KMFolderIndex::mMsgList into KMMsgDict
KMMsgInfo* KMFolderIndex::setIndexEntry( int idx, KMMessage *msg ) | yes | | no change, as it just creates a new KMMsgInfo object and inserts it into KMFolderIndex::mMsgList
bool KMFolderIndex::recreateIndex() | yes | | no changes, as it just calls createIndexFromContents() and readIndex()
off_t KMFolderIndex::mHeaderOffset | | | replace its public use (e.g. in KMFolderIndex::truncateIndex()) with an additional bool indexOpened()
FILE* KMFolderIndex::mIndexStream, uchar* mIndexStreamPtr, size_t mIndexStreamPtrLength, bool mIndexSwapByteOrder, int mIndexSizeOfLong | | | these members are unused in SQLite mode because they are related to file storage; moreover, byte order and size of long are handled by SQLite in a portable way
bool KMMsgBase::syncIndexString() const | removed | removed | completely removed in SQLite mode because we want to keep all SQLite-specific operations in kmfolderindex_sqlite.cpp; KMFolderIndex::updateIndex() does this now using SQL commands
QString KMMsgBase::getStringPart(MsgPartType t) const | yes | yes | data from the mData member is returned and the mIndexLength value is used; KMFolderIndex then uses SQL commands to save the data when needed
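
The control flow described for updateIndex() in the table (do nothing when not dirty, try to update only the changed rows, fall back to a full writeIndex() on failure) can be sketched as below. This is a hedged illustration in Python's standard sqlite3 module, not the C++ implementation: the class, its attributes, the fallback counter and the INDEX_VERSION value are all hypothetical, and the full rewrite is simplified to dropping and recreating the table.

```python
import sqlite3

INDEX_VERSION = 1  # placeholder value, not KMail's real constant
CREATE_MESSAGES = (
    "CREATE TABLE messages( id INTEGER PRIMARY KEY AUTOINCREMENT, data BLOB )"
)


class IndexSketch:
    """Hypothetical stand-in for KMFolderIndex in SQLite mode."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.messages = {}      # id -> blob, the in-memory message list
        self.dirty_ids = set()  # ids changed since the last successful write
        self.dirty = False      # plays the role of mDirty
        self.fallbacks = 0      # how often the full rewrite was needed
        self.write_index()

    def set_message(self, msg_id, blob):
        self.messages[msg_id] = blob
        self.dirty_ids.add(msg_id)
        self.dirty = True

    def write_index(self):
        """Full rewrite: recreate the table, store the version and all rows."""
        self.db.execute("DROP TABLE IF EXISTS messages")
        self.db.execute(CREATE_MESSAGES)
        self.db.execute(f"PRAGMA user_version = {int(INDEX_VERSION)}")
        with self.db:  # insert everything within one transaction
            self.db.executemany(
                "INSERT INTO messages(id, data) VALUES (?, ?)",
                list(self.messages.items()),
            )
        self.dirty_ids.clear()
        self.dirty = False
        return 0

    def write_messages_update(self):
        """Rewrite only the dirty rows, keyed by their stored ids."""
        rows = [(i, self.messages[i]) for i in sorted(self.dirty_ids)]
        with self.db:
            self.db.executemany(
                "INSERT OR REPLACE INTO messages(id, data) VALUES (?, ?)", rows
            )

    def update_index(self):
        if not self.dirty:
            return 0  # nothing changed, so nothing is written
        try:
            self.write_messages_update()
        except sqlite3.Error:
            self.fallbacks += 1
            return self.write_index()  # fall back to the full rewrite
        self.dirty_ids.clear()
        self.dirty = False
        return 0
```

The point of the split is that the common case touches only the changed rows, while the full rewrite remains available as a recovery path when the partial update cannot be applied.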

Important Commits

  • 802868 - partially working implementation (Wed Apr 30 22:20:39 2008 UTC)
  • 803451 - 'messages' table creation is now: "CREATE TABLE messages( id INTEGER PRIMARY KEY AUTOINCREMENT, data BLOB )" - added autoincremented 'id' PK column; updated all the code for this design; the value of ID column is now used for updating the dirty records (Fri May 2 22:12:35 2008 UTC)
  • 804186 - KMFolderIndex::updateIndex(): fixed result checking of writeMessages(); fetching emails from (d)imap seems to work smoothly now (Mon May 5 09:51:46 2008 UTC)
  • 805075 - changes related to the SQLite mode for KMail indices from /branches/work/kmail-nommap (r799390..804487) and /branches/work/kdepim-nommap/kmail (r804484..804960) merged into trunk

Open Questions

  • should we port .sorted and .ids files to sqlite too? (possibly to the same .db files)