Sonnet is a spell checking library for Qt-based applications, with automatic language detection.
Generating trigram data files
To generate a trigram data file for a new language, you first need a plain text corpus for that language. One easy source is a Wikipedia dump for the language; search online for instructions on extracting a plain text corpus from a Wikipedia dump.
Next, use the "gentrigrams" tool to generate a compatible trigram file from this text corpus. It is available here: http://quickgit.kde.org/?p=scratch%2Fsandsmark%2Fgentrigrams.git
Check it out, build it with "qmake && make", and then run it like so: "./gentrigrams ../path/to/corpus.txt languagecode". This reads the corpus.txt file and writes a file named "languagecode". Copy that file into data/trigrams in the sonnet repository. The sonnet build system automatically parses the files in that directory and creates a file that sonnet can load quickly and easily.
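
For readers unfamiliar with trigrams, the following is a minimal sketch of the general idea: counting overlapping three-character sequences in a corpus and keeping the most frequent ones. It is not the gentrigrams implementation and does not produce files in the format sonnet expects. The function names, the lowercasing and whitespace normalization, and the limit of 300 are all assumptions made for illustration only.

```python
from collections import Counter


def count_trigrams(text: str) -> Counter:
    """Count overlapping three-character sequences in text."""
    # Collapse runs of whitespace so line breaks in the corpus do not
    # produce distinct trigrams. Lowercasing is an illustrative choice,
    # not something the gentrigrams tool is known to do.
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + 3] for i in range(len(normalized) - 2))


def top_trigrams(text: str, limit: int = 300) -> list[tuple[str, int]]:
    """Return the most frequent trigrams, most common first.

    The default limit is arbitrary and chosen only for this sketch.
    """
    return count_trigrams(text).most_common(limit)


if __name__ == "__main__":
    sample = "the cat sat on the mat"
    for trigram, count in top_trigrams(sample, limit=5):
        print(repr(trigram), count)
```

A language detector can compare the most frequent trigrams of an unknown text against per-language profiles like this one, which is why a large and representative corpus matters when adding a new language.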