User:Unormal: Difference between revisions

From KDE Community Wiki
= NLP - Natural Language Programming/Processing in KDE =


* [http://accessibility.kde.org/ Jovie/KTTS - KDE Text-To-Speech]
* [http://techbase.kde.org/Development/Architecture/KDE4/Sonnet Sonnet (Spell checking, etc.)]
* [http://www.simon-listens.org Simon Listens - Speech Recognition]
* [http://edu.kde.org KDEedu (Parley, KWordQuiz, KHangman, etc.)]
* [http://www.kontact.org KMail ("attachment" recognition)]
 
== Theory ==
 
NLP covers the following tasks:

* Look up - look a word up in a dictionary (to find an antonym, synonym or definition)
* Machine translation - translate a text or word from one language to another
* Parsing - extract specific information from a word or text
* Part-of-speech tagging - find the corresponding part-of-speech (POS) tag for a word (several POS tag sets exist)
* Segmentation/Tokenization - split a text into sentences or a sentence into words
* Spell checking - check that a word or text is spelled correctly
* Stemming - extract the stem of a word
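
Three of the tasks listed above (segmentation/tokenization, stemming and spell checking) can be illustrated with a minimal Python sketch that uses only the standard library. All function names and the suffix rules are assumptions made for this illustration. The stemmer is a toy suffix stripper, not the Snowball algorithm, and the spell checker is a plain similarity lookup, not what Aspell or HunSpell do internally.

```python
import re
from difflib import get_close_matches


def split_sentences(text):
    """Segmentation: split at '.', '!' or '?' when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Tokenization: return lowercase word tokens, dropping punctuation."""
    return re.findall(r"[a-z']+", sentence.lower())


def naive_stem(word):
    """Stemming: strip one common English suffix (toy rule set)."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words stay intact.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def suggest(word, dictionary):
    """Spell checking: return the word if known, else similar entries."""
    if word in dictionary:
        return [word]
    return get_close_matches(word, dictionary, n=3, cutoff=0.6)


if __name__ == "__main__":
    text = "KDE checks spelling. It also reads text aloud!"
    for sentence in split_sentences(text):
        tokens = tokenize(sentence)
        print(tokens, [naive_stem(t) for t in tokens])
    print(suggest("speling", ["spelling", "speaking", "checking"]))
```

Real tools replace each of these functions with language-specific rules or trained models, which is why the table below lists the supported languages for every tool.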
 


== Free linguistic software tools and frameworks ==


{| class="nlptoolstable" border="1" cellpadding="5" cellspacing="0" style="border: gray solid 1px; border-collapse: collapse; text-align: left; width: 100%;"
|- style="background: #ececec; white-space:nowrap;"
!Tool
!Supported Languages
!Type
!Version
!Programming language
!License
!Notes
|-
|[http://www.apertium.org/ Apertium]
|many
|machine translation platform
|3.1
|
|GPL
|
|-
|[http://aspell.net/ Aspell]
|many ;-)
|spell checker
|0.61
|
|LGPL
|successor of Ispell
|-
|[http://abisource.com/projects/enchant/ Enchant]
|many
|Spell checker
|1.6.0
|
|
|Spell checker for AbiWord
|-
|[http://nlp.lsi.upc.edu/freeling/ FreeLing]
|Spanish, Catalan, Galician, Italian, English, Welsh, Portuguese, and Asturian
|suite of language analyzers
|2.2
|
|GPL
|
|-
|[http://lcl.uniroma1.it/babelnet/ BabelNet]
|English, Catalan, French, German, Italian and Spanish
|A very large multilingual semantic network
|1.0
|Java
|Creative Commons Attribution-Noncommercial-Share Alike 3.0
|
|-
|[http://langtech.jrc.ec.europa.eu/DGT-TM.html DGT-TM]
|Bulgarian, Czech, Danish, Dutch, English, Estonian, German, Greek, Finnish, French, Hungarian, Italian, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovene, Spanish and Swedish.
|A freely available large-scale translation memory in 22 languages
|
|Java
|EUPL
|
|-
|[http://ilk.uvt.nl/tadpole frog]
|Dutch
|tagger and parser
|0.1
|C++, Python
|GPLv3
|
|-
|[http://hspell.ivrix.org.il/ hspell]
|Hebrew
|spell checker (and morphological analyzer)
|
|
|GPL
|
|-
|[http://mokk.bme.hu/resources/hunmorph hunmorph]
|
|morphological analyzer
|
|
|
|More NLP tools on this page
|-
|[http://code.google.com/p/hunpos/ hunpos]
|
|tagger
|1.0
|OCaml
|BSD
|
|-
|[http://hunspell.sourceforge.net HunSpell]
|many
|spell checker and morphological analyzer
|1.2.12
|C, C++
|LGPL & MPL
|Spell checker of OpenOffice.org (OOo)
|-
|[http://lasr.cs.ucla.edu/geoff/ispell.html Ispell]
|large number of European languages
|spell checker
|3.3.02
|
|unknown
|Probably deprecated
|-
|[http://www.languagetool.org/ LanguageTool]
|many
|style and grammar proofreading software
|1.8
|Java
|LGPL
|
|-
|[http://search.cpan.org/dist/Lingua-EN-Tagger/ liblingua-tagger]
|English
|tagger
|0.16
|Perl
|Unknown
|The Lingua modules on Perl's CPAN provide more taggers and stemmers
|-
|[http://www.link.cs.cmu.edu/link/ LinkGrammar]
|English
|syntactic parser
|4.1b
|
|GPL
|
|-
|[http://www.abisource.com/projects/link-grammar/ Link Grammar Parser]
|English and more
|syntactic parser
|4.7.6
|Probably C
|GPL
|
|-
|[http://home.arcor.de/bjoern-beutel/malaga/ Malaga]
|German, Italian, Spanish, Finnish (not all free!)
|grammar development environment
|7.12
|
|GPL
|
|-
|[http://ilk.uvt.nl/mbt mbt]
|
|Memory-based tagger-generator and tagger
|3.2.2
|C++
|GPLv3
|
|-
|[http://code.google.com/p/morphisto/ Morphisto]
|German
|morphological analyzer
|
|
|LGPL & CC
|
|-
|MySpell
|many
|spell checker
|
|
|
|Former spell checker of OOo, now deprecated
|-
|[http://www.nltk.org/ nltk]
|
|Natural Language ToolKit
|
|Python
|
|
|-
|[http://incubator.apache.org/opennlp/ OpenNLP]
|unknown
|NLP software collection at Apache
|
|Java
|Apache License
|
|-
|[http://nlp.stanford.edu/software/tagger.shtml Stanford Log-linear Part-Of-Speech Tagger]
|
|tagger
|
|Java
|GPL
|
|-
|[http://ilk.uvt.nl/timbl/ TiMBL]
|
|Tilburg Memory Based Learner
|1.0.0
|C++
|GPLv3
|
|-
|[http://snowball.tartarus.org/index.php Snowball]
|many
|stemmer library
|
|C, Java, Python
|BSD
|
|-
|[http://www.lsi.upc.edu/~nlp/SVMTool/ SVMTool]
|English, Catalan, Spanish
|An Open Source generator of sequential taggers
|1.3.2
|Perl
|LGPL
|
|-
|[http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/DecisionTreeTagger.html TreeTagger]
|many
|PoS tagger & lemmatizer
|
|
|Tagger License
|
|-
|}
 
[http://www-nlp.stanford.edu/links/statnlp.html Stanford list of NLP tools]
 
== Free text to speech tools ==
{| class="nlptoolstable" border="1" cellpadding="5" cellspacing="0" style="border: gray solid 1px; border-collapse: collapse; text-align: left; width: 100%;"
|- style="background: #ececec; white-space:nowrap;"
!Tool
!Supported Languages
!Type
!Version
!Programming language
!License
!Notes
|-
|[http://www.cstr.ed.ac.uk/projects/festival/ Festival]
|
|
|
|
|
|
|-
|[http://tcts.fpms.ac.be/synthesis/ MBrola]
|
|
|
|
|
|
|-
|}


=== Additional tools ===


* [http://foma.sourceforge.net/ Foma] - a finite-state machine toolkit and library
* [http://www.ims.uni-stuttgart.de/projekte/gramotron/SOFTWARE/SFST.html SFST - Stuttgart Finite State Transducer] - a toolbox for the implementation of morphological analysers and other tools which are based on finite state transducer technology
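
To show what a morphological analyser based on finite state transducer technology does, here is a small Python sketch. It does not use the Foma or SFST APIs. The lexicon, the tag names (`+N`, `+V`, `+Pl` and so on) and both function names are assumptions invented for this illustration. The transducer reads a surface form character by character and emits the stem followed by grammatical tags.

```python
from collections import defaultdict

EPS = ""  # epsilon: an arc that consumes no input symbol


def build_transducer(nouns, verbs):
    """Build a toy transducer; returns (arcs, final_states)."""
    arcs = defaultdict(list)  # state -> [(input, output, next_state)]

    def add_stem(stem, category):
        # Stems share a prefix tree that starts at the state "root".
        state = "root"
        for ch in stem:
            nxt = state + "/" + ch
            arc = (ch, ch, nxt)
            if arc not in arcs[state]:
                arcs[state].append(arc)
            state = nxt
        # After the stem, emit the category tag without reading input.
        arcs[state].append((EPS, "+" + category, category))

    for noun in nouns:
        add_stem(noun, "N")
    for verb in verbs:
        add_stem(verb, "V")

    # Inflection arcs: the suffix is read, a tag is written.
    arcs["N"] += [(EPS, "+Sg", "end"), ("s", "+Pl", "end")]
    arcs["V"] += [(EPS, "+Inf", "end"), ("s", "+3Sg", "end"), ("e", "", "V/e")]
    arcs["V/e"].append(("d", "+Past", "end"))
    return arcs, {"end"}


def analyse(word, arcs, finals):
    """Return all analyses of word, sorted; empty list if none."""
    results = []
    stack = [("root", 0, "")]  # (state, input position, output so far)
    while stack:
        state, pos, out = stack.pop()
        if pos == len(word) and state in finals:
            results.append(out)
        for symbol, output, nxt in arcs.get(state, []):
            if symbol == EPS:
                stack.append((nxt, pos, out + output))
            elif pos < len(word) and word[pos] == symbol:
                stack.append((nxt, pos + 1, out + output))
    return sorted(results)


if __name__ == "__main__":
    arcs, finals = build_transducer(["cat", "walk"], ["walk"])
    for form in ("cats", "walks", "walked", "dogs"):
        print(form, analyse(form, arcs, finals))
```

An ambiguous form such as "walks" yields one analysis per path through the transducer, here a plural noun and a third-person verb. Toolkits like Foma and SFST compile such networks from rule and lexicon files instead of building them by hand.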
 
== Semantics and Co ==
 
* [http://www.lexinfo.net/ LexInfo builds on the lemon model to represent lexical information attached to ontologies on the semantic web]
* [http://linguistics-ontology.org/ GOLD is an ontology for descriptive linguistics]


=== Further stuff ===
* LIMA

Latest revision as of 21:31, 12 July 2012
