Sonnet is a spell checking library for Qt-based applications, with automatic language detection.

Generating trigram data files

To generate a trigram data file for a new language, you first need a text corpus for that language. One easy way to get one is to use Wikipedia dumps; try your favorite search engine for information on how to generate a plain text corpus from Wikipedia.

Then use the "gentrigrams" tool to generate compatible trigram files from this text corpus. It is available here: http://quickgit.kde.org/?p=scratch%2Fsandsmark%2Fgentrigrams.git

Check it out, build it with "qmake && make", and then run it as follows: "./gentrigrams ../path/to/corpus.txt languagecode". This reads the corpus.txt file and writes out a file named "languagecode".

Copy that file into data/trigrams in the sonnet repository. The sonnet build system automatically parses the files in that directory and creates a single file that is quick and easy for sonnet to load.

This page was last edited on 29 December 2013, at 22:20. Content is available under Creative Commons License SA 4.0 unless otherwise noted.
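
For readers unfamiliar with the term, a trigram is a sequence of three consecutive characters, and trigram frequencies are a common basis for language detection. The following is a minimal Python sketch of counting trigrams in a corpus. It is only an illustration of the idea: it is not the gentrigrams implementation, it does not produce the file format that sonnet expects, and the names count_trigrams and top_trigrams as well as the limit of 300 are assumptions made for this example.

```python
from collections import Counter


def count_trigrams(text: str) -> Counter:
    """Count overlapping three-character sequences in text (illustrative only)."""
    # Lowercase and collapse whitespace runs so that line breaks in the
    # corpus do not produce trigrams that differ only in spacing.
    normalized = " ".join(text.lower().split())
    # Slide a three-character window over the normalized text.
    return Counter(normalized[i:i + 3] for i in range(len(normalized) - 2))


def top_trigrams(text: str, limit: int = 300) -> list:
    """Return the most frequent trigrams; the default limit is an arbitrary assumption."""
    return count_trigrams(text).most_common(limit)


if __name__ == "__main__":
    sample = "the cat and the hat"
    for trigram, count in top_trigrams(sample, limit=5):
        print(repr(trigram), count)
```

A real corpus would be read from a file, for example with open("corpus.txt", encoding="utf-8").read(), and passed to top_trigrams. To produce data that sonnet can actually use, run the gentrigrams tool as described above.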