User:Unormal

From KDE Community Wiki
Unormal

Latest revision as of 21:31, 12 July 2012

NLP - Natural Language Processing in KDE

  • KDEedu (Parley, KWordQuiz, KHangman, etc.)
  • KMail ("attachment" recognition)

Theory

Common NLP tasks include:

  • Lookup - look up a word in a dictionary (to find a synonym, an antonym or a definition)
  • Machine translation - translate a text or word from one language to another
  • Parsing - analyze the structure of a sentence or text and extract specific information from it
  • Part-of-speech tagging - assign the corresponding part-of-speech (POS) tag to each word (several different POS tag sets exist)
  • Segmentation/tokenization - split a text into sentences, or a sentence into words
  • Spell checking - check whether a word or text is spelled correctly
  • Stemming - extract the stem of a word
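To make a few of these tasks concrete, here is a minimal Python sketch that uses only the standard library. It covers tokenization, stemming, part-of-speech tagging and spell checking. The tiny lexicon, the suffix list and all function names are hypothetical toy assumptions. Real tools such as Snowball, HunSpell or TreeTagger rely on much larger resources and more sophisticated algorithms.

```python
import re

# Hypothetical toy resources (assumption): real tools ship far larger lexicons.
LEXICON = {"the": "DET", "cat": "NOUN", "cats": "NOUN", "sat": "VERB",
           "on": "ADP", "mat": "NOUN", "sleeps": "VERB"}
SUFFIXES = ("ing", "ed", "es", "s")  # tried in this order


def tokenize(text):
    """Segmentation/tokenization: split text into word and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def stem(word):
    """Stemming: strip the first matching suffix, keeping at least 3 characters.

    Deliberately naive stand-in for Porter/Snowball ("running" -> "runn").
    """
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def pos_tag(tokens):
    """POS tagging by lexicon lookup; unknown tokens get the tag 'X'."""
    return [(tok, LEXICON.get(tok.lower(), "X")) for tok in tokens]


def edit_distance(a, b):
    """Levenshtein distance, computed one row of the DP table at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (or match)
        prev = cur
    return prev[-1]


def suggest(word, max_distance=1):
    """Spell checking: return known words within max_distance edits, closest first."""
    word = word.lower()
    if word in LEXICON:
        return [word]
    candidates = sorted((edit_distance(word, w), w) for w in LEXICON)
    return [w for d, w in candidates if d <= max_distance]
```

For example, `pos_tag(tokenize("The cat sat."))` tags the final period as `X` because it is not in the lexicon. Similarly, `suggest("cst")` returns `['cat']`, the only lexicon entry one edit away.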


Free linguistic software tools and frameworks

Tool | Supported Languages | Type | Version | Programming language | License | Notes
Apertium | many | machine translation platform | 3.1 | | GPL |
Aspell | many ;-) | spell checker | 0.61 | | LGPL | successor of Ispell
Enchant | many | spell checker | 1.6.0 | | | spell checker for AbiWord
FreeLing | Spanish, Catalan, Galician, Italian, English, Welsh, Portuguese and Asturian | suite of language analyzers | 2.2 | | GPL |
BabelNet | English, Catalan, French, German, Italian and Spanish | a very large multilingual semantic network | 1.0 | Java | Creative Commons Attribution-NonCommercial-ShareAlike 3.0 |
DGT-TM | Bulgarian, Czech, Danish, Dutch, English, Estonian, German, Greek, Finnish, French, Hungarian, Italian, Latvian, Lithuanian, Maltese, Polish, Portuguese, Romanian, Slovak, Slovene, Spanish and Swedish | a freely available large-scale translation memory in 22 languages | | Java | EUPL |
frog | Dutch | tagger and parser | 0.1 | C++, Python | GPLv3 |
hspell | Hebrew | spell checker (and morphological analyzer) | | | GPL |
hunmorph | | morphological analyzer | | | | more NLP tools on this page
hunpos | | tagger | 1.0 | OCaml | BSD |
HunSpell | many | spell checker and morphological analyzer | 1.2.12 | C, C++ | LGPL & MPL | spell checker of OpenOffice.org
Ispell | large number of European languages | spell checker | 3.3.02 | | unknown | probably deprecated
LanguageTool | many | style and grammar proofreading software | 1.8 | Java | LGPL |
liblingua-tagger | English | tagger | 0.16 | Perl | unknown | Perl's CPAN Lingua modules provide more taggers and stemmers
LinkGrammar | English | syntactic parser | 4.1b | | GPL |
Link Grammar Parser | English and more | syntactic parser | 4.7.6 | probably C | GPL |
Malaga | German, Italian, Spanish, Finnish (not all free!) | grammar development environment | 7.12 | | GPL |
mbt | | memory-based tagger-generator and tagger | 3.2.2 | C++ | GPLv3 |
Morphisto | German | morphological analyzer | | | LGPL & CC |
MySpell | many | spell checker | | | | former spell checker of OpenOffice.org, now deprecated
nltk | | Natural Language ToolKit | | Python | |
OpenNLP | unknown | NLP software collection at Apache | | Java | Apache License |
Stanford Log-linear Part-Of-Speech Tagger | | tagger | | Java | GPL |
TiMBL | | Tilburg Memory Based Learner | 1.0.0 | C++ | GPLv3 |
Snowball | many | stemmer library | | C, Java, Python | BSD |
SVM | English, Catalan, Spanish | an open source generator of sequential taggers | 1.3.2 | Perl | LGPL |
TreeTagger | many | POS tagger and lemmatizer | | | Tagger License |

Stanford list of NLP tools

Free text-to-speech tools

Tool Supported Languages Type Version Programming language License Notes
Festival
MBrola

Additional tools

  • Foma - a finite-state machine toolkit and library
  • SFST - Stuttgart Finite State Transducer - a toolbox for the implementation of morphological analysers and other tools which are based on finite state transducer technology
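Both Foma and SFST compile descriptions of a language's morphology into finite-state transducers. These transducers map a surface form such as "cats" to an analysis such as "cat+N+Pl". The Python sketch below illustrates only that underlying idea and is not the API of either tool. The `Transducer` class, the `+N`, `+Sg` and `+Pl` tags and the noun list are hypothetical assumptions. Each arc reads one input character and emits an output string, and each final state emits an extra output when the word is accepted.

```python
from collections import defaultdict


class Transducer:
    """A minimal nondeterministic finite-state transducer (no epsilon arcs)."""

    def __init__(self):
        self.arcs = defaultdict(list)  # state -> [(input_char, output_str, next_state)]
        self.finals = {}               # final state -> output emitted on acceptance
        self.num_states = 1            # state 0 is the start state

    def new_state(self):
        self.num_states += 1
        return self.num_states - 1

    def add_arc(self, src, inp, out, dst):
        self.arcs[src].append((inp, out, dst))

    def analyze(self, word):
        """Return the outputs of all paths that consume `word` and end in a final state."""
        results = []
        stack = [(0, 0, "")]  # (state, position in word, output so far)
        while stack:
            state, pos, out = stack.pop()
            if pos == len(word) and state in self.finals:
                results.append(out + self.finals[state])
            if pos < len(word):
                for inp, arc_out, dst in self.arcs[state]:
                    if inp == word[pos]:
                        stack.append((dst, pos + 1, out + arc_out))
        return sorted(results)


def build_noun_analyzer(stems):
    """Hypothetical toy morphology: noun stems plus an optional plural '-s'."""
    fst = Transducer()
    stem_end = fst.new_state()  # shared state reached after any complete stem
    plural = fst.new_state()
    fst.finals[stem_end] = "+N+Sg"
    fst.finals[plural] = "+N+Pl"
    fst.add_arc(stem_end, "s", "", plural)  # read "s", emit nothing; tag comes from the final state
    for stem in stems:
        state = 0
        for i, ch in enumerate(stem):
            # Copy each stem character to the output; the last one leads to stem_end.
            nxt = stem_end if i == len(stem) - 1 else fst.new_state()
            fst.add_arc(state, ch, ch, nxt)
            state = nxt
    return fst
```

Here `build_noun_analyzer(["cat", "dog"]).analyze("cats")` returns `['cat+N+Pl']`, and a word outside the lexicon, such as "cow", yields an empty list. Real toolkits also support epsilon arcs, composition and minimization, which this sketch omits.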

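Semantics and Co

  • LexInfo builds on the lemon model to represent lexical information attached to ontologies on the Semantic Web
  • GOLD is an ontology for descriptive linguistics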

Further stuff

  • LIMA