User:Unormal

From KDE Community Wiki

NLP - Natural Language Programming/Processing in KDE

Theory

NLP involves the following tasks:

  • Look up - look up a word in a dictionary (to find an antonym, synonym or definition)
  • Machine translation - translate a text or word from one language to another
  • Parsing - extract specific information from a word or text
  • Part-of-Speech-Tagging - find the corresponding part-of-speech (POS) tag for a word (there are different POS tag sets)
  • Segmentation/Tokenization - split a text into sentences or a sentence into words
  • Spell checking - check that a word or text is spelled correctly
  • Stemming - Extract the stem of a word
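A few of these tasks can be sketched with nothing but the Python standard library. The snippet below is only a toy illustration, not how any of the tools listed on this page work: the function names, the suffix rules and the tiny `WORDS` list are all assumptions made up for this example. Real stemmers (e.g. Snowball) and spell checkers (e.g. HunSpell) use far more elaborate rules and dictionaries.

```python
import re
from difflib import get_close_matches


def split_sentences(text):
    """Segmentation: split text after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Tokenization: return the lowercased word tokens of a sentence."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", sentence.lower())


def naive_stem(word):
    """Stemming: strip one common English suffix (toy rules, not Snowball)."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words stay untouched.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Hypothetical word list; a real spell checker loads a full dictionary.
WORDS = ["check", "dictionary", "parser", "spell", "tagger", "word"]


def suggest(word, n=3):
    """Spell checking: return [] for a known word, else up to n close matches."""
    if word in WORDS:
        return []
    return get_close_matches(word, WORDS, n=n)


if __name__ == "__main__":
    for sentence in split_sentences("KDE checks spelling. It tags words!"):
        tokens = tokenize(sentence)
        print(tokens, [naive_stem(t) for t in tokens])
    print(suggest("directionary"))
```

The regular expressions only handle plain ASCII English text; languages with other scripts or without whitespace between words need a dedicated tokenizer.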


Free linguistic software tools and frameworks

Tool | Supported Languages | Type | Version | Programming language | License | Notes
Apertium | many | machine translation platform | 3.1 | | GPL |
Aspell | many ;-) | spell checker | 0.61 | | LGPL | successor of Ispell
Enchant | many | spell checker | 1.6.0 | | | Spell checker for Abiword
FreeLing | Spanish, Catalan, Galician, Italian, English, Welsh, Portuguese, and Asturian | suite of language analyzers | 2.2 | | GPL |
BabelNet | English, Catalan, French, German, Italian and Spanish | A very large multilingual semantic network | 1.0 | Java | Creative Commons Attribution-Noncommercial-Share Alike 3.0 |
DGT-TM | Bulgarian, Czech, Danish, Dutch, English, Estonian, German, Greek, Finnish, French, Hungarian, Italian, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovene, Spanish and Swedish | A freely available large-scale translation memory in 22 languages | | Java | EUPL |
frog | Dutch | tagger and parser | 0.1 | C++, Python | GPLv3 |
hspell | Hebrew | spell checker (and morphological analyzer) | | | GPL |
hunmorph | | morphological analyzer | | | | More NLP tools at this page
hunpos | | tagger | 1.0 | OCaml | BSD |
HunSpell | many | spell checker and morphological analyzer | 1.2.12 | C, C++ | LGPL & MPL | Spell checker of OOo
Ispell | large number of European languages | spell checker | 3.3.02 | | unknown | Probably deprecated
LanguageTool | many | style and grammar proofreading software | 1.8 | Java | LGPL |
liblingua-tagger | English | tagger | 0.16 | Perl | Unknown | Liblingua in Perl's CPAN module provides more taggers and stemmers
LinkGrammar | English | syntactic parser | 4.1b | | GPL |
Link Grammar Parser | English and more | syntactic parser | 4.7.6 | Probably C | GPL |
Malaga | German, Italian, Spanish, Suomi (not all free!) | grammar development environment | 7.12 | | GPL |
mbt | | Memory-based tagger-generator and tagger | 3.2.2 | C++ | GPLv3 |
Morphisto | German | morphological analyzer | | | LGPL & CC |
MySpell | many | spell checker | | | | Former spell checker of OOo, now deprecated
nltk | | Natural Language ToolKit | | Python | |
OpenNLP | unknown | NLP software collection at Apache | | Java | Apache License |
Stanford Log-linear Part-Of-Speech Tagger | | tagger | | Java | GPL |
TiMBL | | Tilburg Memory Based Learner | 1.0.0 | C++ | GPLv3 |
Snowball | many | stemmer library | | C, Java, Python | BSD |
SVM | English, Catalan, Spanish | An Open Source generator of sequential taggers | 1.3.2 | Perl | LGPL |
TreeTagger | many | PoS tagger & lemmatizer | | | Tagger License |

Stanford list of NLP tools

Free text to speech tools

Tool | Supported Languages | Type | Version | Programming language | License | Notes
Festival | | | | | |
MBrola | | | | | |

Additional tools

  • Foma - a finite-state machine toolkit and library
  • SFST - Stuttgart Finite State Transducer - a toolbox for the implementation of morphological analysers and other tools which are based on finite state transducer technology

Semantics and Co

Further stuff

  • LIMA