KDEEdu/Language/KVocDocumentPlanningJuly2014Temporary: Difference between revisions

From KDE Community Wiki
(→‎KVocDocument Planning (July 2014) Temporary: some initial notes about Artikulate)
m (Removed comment about listing all used tags)
 
(5 intermediate revisions by 3 users not shown)
Line 1: Line 1:
{{Construction}}
{{Construction}}
= KVocDocument Planning (July 2014) Temporary =


== Individual Requirements ==
== Individual Requirements ==
Line 7: Line 6:


Each application is divided into File Format, API and Editor requirements.
Each application is divided into File Format, API and Editor requirements.
For current File Format requirements I listed the tags from the [http://edu.kde.org/kvtml/kvtml2.dtd http://edu.kde.org/kvtml/kvtml2.dtd] I knew to be used.
Editor requirements are to explore the possibility of a common editor widget.
Editor requirements are to explore the possibility of a common editor widget.


Line 13: Line 11:
The current requirements are to determine what portions of KVTML2, KEduVocDocument and its associated API are not in use.  Future requirements are each application's wishes for the future.
The current requirements are to determine what portions of KVTML2, KEduVocDocument and its associated API are not in use.  Future requirements are each application's wishes for the future.


 
=== Kanagram ===
 
=== KAnagram ===
==== File Format ====
==== File Format ====
===== Current =====
===== Current =====
Line 48: Line 44:


=== Parley ===
=== Parley ===
==== File Format ====
==== File Format Requirements ====
===== Current =====
===== Current Requirements  of KVTML2 =====
====== Used Tags ======
# Header Information (generator, title, author, comment)
Parley uses almost every tag that is parsed from the kvtml2 file so this section is not very useful
# Two or more languages
 
# information.generator
# information.title
# information.author
# information.comment
# identifiers - 2 or more
#
# identifier.name
# identifier.locale
# identifier.comment
# identifier.article
# identifier.article.definite
# identifier.article.indefinite
# identifier.personalpronouns
# identifier.personalpronouns.singular
# identifier.personalpronouns.dual
# identifier.personalpronouns.plural
# identifier.personalpronouns.tense
# firstperson
# secondperson
# thirdpersonmale
# thirdpersonfemale
# thirdpersonneutralcommon
#
# tenses
# tense
#
# lessons
# wordtypes
#
# specialwordtype hardcoded as  (noun|noun/male|noun/female|noun/neutral|verb|adjective|adverb)
#
# inpractice
#
# entry
# entries
#
# translation
# text
# comment
# pronunciation
# example
# paraphrase
#
# falsefriend
# antonym
# synonym
# multiplechoice
#
# image
# sound
#
# comparison
# absolute
# comparative
# superlative
#
# conjugation
# tense
# singular
# dual
# plural
# choice
#
# grade
# currentgrade
# count
# errorcount
# date
#
# containerentry is used indirectly via inheritance by lessons and wordtype, so it is not a requirement
 
 
====== Features ======
# Header Information
# Per language identifier information to setup locale, articles (definite and indefinite hardcoded) and pronouns (first, second and third person, single, dual and plural hardcoded)
# Per language identifier information to setup locale, articles (definite and indefinite hardcoded) and pronouns (first, second and third person, single, dual and plural hardcoded)
# A list of tenses
# A list of tenses
# Two nesting containers: word type and lessons
# Two nesting containers: word type and lessons
# A marker for special hardcoded wordtypes
# A marker for special hardcoded wordtypes, identifying parts of speech tied to methods/games
# Entries with up to 1 translation per language identifier
# Entries with up to 1 translation per language identifier
# Each translation can have an image, a sound and several types of text attachments
# Each translation can have an image, a sound and several types of text attachments
Line 137: Line 58:
# Each translation can have a grade consisting of (currentgrade, count, errorcount, date)
# Each translation can have a grade consisting of (currentgrade, count, errorcount, date)


:Currently Parley uses almost every feature and tag provided by KVTML2. Here is a complete list of the [[KDEEdu/Language/KVocDocumentPlanningJuly2014Temporary/ParleyCurrentTags|tags used by]] Parley.
===== Future Requirements Different from KVTML2 =====
====== Container ======
# container format with separate sections for
## one or more dictionaries of words
## zero or more collections of word sets and relationships between sets
## zero or more per user goals (which lessons are active, how words are chosen, etc.)
## one or more unit/lesson plans
## per user assessment/data/statistics
# per user/per tool goals and assessment are stored separately
# Is grammar and word set structure per user or global?
# Is the unit/lesson plan per user or global?
# ids for all objects are alphanumeric so that they can be
## human meaningful, guessable and hackable
## stable across files if a small number of words/lessons/grades are changed.
# namespaces for word, wordsets, units, relationships, and constructedRelationships are separate so their names can overlap.
====== Dictionary of Words ======
# use a recognized standard (like DICT or XDXF or both) to gain access to millions of words
# dictionary is per language
# primarylistseparator - The primary character to use to separate lists of words for this language. Defaults to ','.
# secondarylistseparator - The secondary character to use to separate lists of words. It can partition lists that include the primary character. Defaults to ';'.
# whitespacechar - White space character. Defaults to ' '. Can have multiples.
# ignoreextrawhitespace - Is extra white space ignored in this language
# nullinputchar - Character that represents blank input, in case the language uses '' as a meaningful input.
# font -
# locale
====== Grammar and Word Set Structure ======
# remove hardcoded grammar: falsefriend, antonym, synonym, conjugation, comparison, etc.
# rename wordtypes to wordsets to imply arbitrary sets of words, not types of speech
# relationships described by relationship(name, mapping?,  relatee, relations) are the arbitrary relationship (name) of mapping type (one to many, one to one, or many to many) of the relatee(s) to the relation(s). For example:
#* relationship(translation, dog, {Hund})
#* relationship(synonyms, ManyToMany {street, road, avenue, boulevard, way}, {street, road, avenue, boulevard, way})
#* relationship(translation, OneToOne {dog, cat}, {Hund, Katze})
#* relationship(wives of, Henry VIII, {Catherine of Aragon, Anne Boleyn, Jane Seymour, Anne of Cleves, Catherine Howard, Catherine Parr})
# constructed relationships described by constructedrelationships(name, type=basic, list of sets, list of results), construct on demand all of the relationships and sets from the product (choosing one item from each set) one to one mapped to the results.  For example:
#* constructedrelationships( conjugation, basic, [{Present, Past}, {Singular, Plural}, {First Person, Second Person, Third Person}, {to be, to have}], { put your conjugations here})
#* generates relationship(conjugation, (Past, to be), {all past tenses of to be})
#* relationship(conjugation, (Present, First Person), {all Present First Person conjugations of both to be and to have})
#* relationship(conjugation, (Present, Singular, First Person, to be), I am)
#* etc.
# constructed relationships described by constructedrelationships(name, type=regex, list of sets, pattern, replacement) construct relationships as above, but use a regular expression to generalize the result. For example, the following 5 definitions generate all regular conjugations of the English past, present and future tenses (assuming the 4 parts are concatenated with ";"):
#* constructedrelationships( conjugation, regex, [{Present}, {Singular, Plural}, {First Person, Second Person}, {to walk, to talk, all other regular English verbs}], ".*;.*;(.*);to (.*)",  "\1 \2")
#* constructedrelationships( conjugation, regex, [{Present}, {Singular}, {Third Person}, {to walk, to talk, all other regular English verbs}], ".*;.*;(.*);to (.*)",  "\1 \2s")
#* constructedrelationships( conjugation, regex, [{Present}, {Plural}, {Third Person}, {to walk, to talk, all other regular English verbs}], ".*;.*;(.*);to (.*)",  "\1 \2")
#* constructedrelationships( conjugation, regex, [{Past}, {Singular, Plural}, Person, {to walk, to talk, all other regular English verbs}], ".*;.*;(.*);to (.*)",  "\1 \2ed")
#* constructedrelationships( conjugation, regex, [{Future}, {Singular, Plural}, Person, {to walk, to talk, all other regular English verbs}], ".*;.*;(.*);to (.*)",  "\1 will \2")
# add cue, target to mark text or media as a cue, a target, or both. Targets are answers and cues are questions. If nothing is marked, assume everything is both a cue and a target.
# remove comment, pronunciation, example, paraphrase
## add informational to mark text or media as neither a target nor a cue
====== Unit Plans ======
# remove hard coded method/game/grammar tag specialwordtype
# usemethod defined by usemethod(target name, set of methods) restricts these targets to using only these methods.  Otherwise Parley guesses as follows:
#* If a target has a one-to-one mapping, or a one-to-many mapping with a small number of relations, it is suitable for flash cards and multiple choice
#* If a target is written, it is suitable for anagram and written practice
#* Any constructed relationship is suitable for the comparison/conjugation tool
====== Per Unit Plan ======
# activemethod/game(method, isactive) - filters the methods active with this lesson
# methodthreshold(method, recognition/production/spelling, low, high) - only use this method if the student's recognition/production/spelling score is above low and below high
# useconstruction(method, recognition/production/spelling, low, high) - only use constructions when the student's recognition/production/spelling score falls within the range. The idea is that the student is not asked to conjugate "I" with "to walk" until their recognition of both parts is above low percent, and is not asked to use "I walked" in a sentence with the preposition "to" and the noun phrase "the store" until all of the parts have reached low percent.
====== Student Goals ======
# activelesson(lesson id, isactive) activates/deactivates a lesson and all children
# activemethod/game(method, isactive)
====== Student Assessment ======
Store data and not statistics.  Generate the statistics (Leitner boxes) on demand.
# remove grade and replace with
## Add assessment(recognition, word id, issuccess, timestamp, incorrect word) to track correctly recognized words
## Add assessment(production, word id, issuccess, timestamp, incorrect word) to track correctly produced words
## Add assessment(spelling, word id, issuccess, timestamp, incorrect word) to track correctly spelled words


===== Future =====
# ids are alphanumeric so they can be a) human meaningful b) stable if words/lessons/grades are in different locations.
#
==== API ====
==== API ====
===== Current =====
===== Current =====
Line 171: Line 158:


== Editor ==
== Editor ==
== Further Reading ==
The following projects can be interesting for designing a container file format for language learning files:
* http://en.wikipedia.org/wiki/DICT
* http://en.wikipedia.org/wiki/XDXF
* http://en.wikipedia.org/wiki/Open_Packaging_Conventions

Latest revision as of 16:44, 10 July 2014

 
Under Construction
This is a new page, currently under construction!


Individual Requirements

The following section is for planning the requirements of a replacement for KVTML2. It is divided by application.

Each application is divided into File Format, API and Editor requirements. Editor requirements are to explore the possibility of a common editor widget.

Each of those is divided into current and future requirements. The current requirements are to determine what portions of KVTML2, KEduVocDocument and its associated API are not in use. Future requirements are each application's wishes for the future.

Kanagram

File Format

Current
Future

API

Current
Future

Editor

Current
Future

Artikulate

Artikulate currently uses its own XML based file format, but the mid-term plan is to switch to a common format. The specification for the currently used file format is here:

File Format Requirements

Requirements
  • association of the file with one language (the language whose pronunciation is to be trained)
  • string field for text in the training language
  • (optional) string field for text in learner's language/English + i18n integration
  • pronunciation symbols
  • one sound file per string
  • EITHER internal editing states OR a special editing file format (for example: a phrase is translated into the course's language, but a recording is missing)
  • association to blueprint/skeleton file
Key Differences to KVTML
  • the file format provides a skeleton specification: skeletons are blueprint-like files that can be used to create courses for different languages and later synchronize changes between them
  • learning statistics are not saved within the file format: there is a learner-library that encapsulates learner, learning goals, and the corresponding statistics data
  • downloaded course files are not meant to be changed/edited, but to be updated (in particular, system wide installation is provided)

Editor

Current
Future

Parley

File Format Requirements

Current Requirements of KVTML2
  1. Header Information (generator, title, author, comment)
  2. Two or more languages
  3. Per language identifier information to setup locale, articles (definite and indefinite hardcoded) and pronouns (first, second and third person, single, dual and plural hardcoded)
  4. A list of tenses
  5. Two nesting containers: word type and lessons
  6. A marker for special hardcoded wordtypes, identifying parts of speech tied to methods/games
  7. Entries with up to 1 translation per language identifier
  8. Each translation can have an image, a sound and several types of text attachments
  9. Each translation can have up to 5 special sets (synonym, antonym, false friends, multiple choice, comparison) attached
  10. Each translation can be a verb with an attached conjugation
  11. Each translation can have a grade consisting of (currentgrade, count, errorcount, date)
Currently Parley uses almost every feature and tag provided by KVTML2. Here is a complete list of the tags used by Parley.
Future Requirements Different from KVTML2
Container
  1. container format with separate sections for
    1. one or more dictionaries of words
    2. zero or more collections of word sets and relationships between sets
    3. zero or more per user goals (which lessons are active, how words are chosen, etc.)
    4. one or more unit/lesson plans
    5. per user assessment/data/statistics
  2. per user/per tool goals and assessment are stored separately
  3. Is grammar and word set structure per user or global?
  4. Is the unit/lesson plan per user or global?
  5. ids for all objects are alphanumeric so that they can be
    1. human meaningful, guessable and hackable
    2. stable across files if a small number of words/lessons/grades are changed.
  6. namespaces for word, wordsets, units, relationships, and constructedRelationships are separate so their names can overlap.
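A minimal sketch of the id and namespace requirements above, assuming a purely in-memory container. The class name `Container`, the namespace spellings and the exact id pattern are illustrative assumptions, not part of any existing KEduVocDocument API:

```python
import re

# Hypothetical namespace names taken from the requirement list above.
NAMESPACES = ("word", "wordset", "unit", "relationship", "constructedrelationship")

# Assumption: "alphanumeric" means ASCII letters and digits only.
ID_PATTERN = re.compile(r"[A-Za-z0-9]+")


class Container:
    """Keeps one id table per namespace so the same id can be reused across them."""

    def __init__(self):
        self._objects = {namespace: {} for namespace in NAMESPACES}

    def add(self, namespace, object_id, value):
        if namespace not in self._objects:
            raise KeyError("unknown namespace: " + namespace)
        if not ID_PATTERN.fullmatch(object_id):
            raise ValueError("id must be alphanumeric: " + repr(object_id))
        if object_id in self._objects[namespace]:
            raise ValueError("duplicate id in namespace " + namespace)
        self._objects[namespace][object_id] = value

    def get(self, namespace, object_id):
        return self._objects[namespace][object_id]


container = Container()
container.add("word", "dog", "Hund")
container.add("wordset", "dog", ["dog", "puppy"])  # same id, different namespace
```

Because ids are chosen by the author rather than derived from position, adding or removing a few words does not renumber the rest of the file.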
Dictionary of Words
  1. use a recognized standard (like DICT or XDXF or both) to gain access to millions of words
  2. dictionary is per language
  3. primarylistseparator - The primary character to use to separate lists of words for this language. Defaults to ','.
  4. secondarylistseparator - The secondary character to use to separate lists of words. It can partition lists that include the primary character. Defaults to ';'.
  5. whitespacechar - White space character. Defaults to ' '. Can have multiples.
  6. ignoreextrawhitespace - Is extra white space ignored in this language
  7. nullinputchar - Character that represents blank input, in case the language uses the empty string as a meaningful input.
  8. font -
  9. locale
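How the separator and white space settings could interact is sketched below. The field names follow the list above, but the class `DictionarySettings`, the function names and the rule "use the secondary separator whenever it occurs in the input" are assumptions made for illustration:

```python
import re
from dataclasses import dataclass


@dataclass
class DictionarySettings:
    """Hypothetical per-language settings mirroring the proposed fields."""
    primarylistseparator: str = ","
    secondarylistseparator: str = ";"
    whitespacechar: str = " "          # may hold several characters
    ignoreextrawhitespace: bool = True


def normalize(text, settings):
    """Collapse runs of white space and trim the ends, if the language allows it."""
    if not settings.ignoreextrawhitespace or not settings.whitespacechar:
        return text
    run = "[" + re.escape(settings.whitespacechar) + "]+"
    collapsed = re.sub(run, settings.whitespacechar[0], text)
    return collapsed.strip(settings.whitespacechar)


def split_list(text, settings):
    """Split on the secondary separator when present, otherwise on the primary one."""
    if settings.secondarylistseparator in text:
        separator = settings.secondarylistseparator
    else:
        separator = settings.primarylistseparator
    return [normalize(part, settings) for part in text.split(separator)]


settings = DictionarySettings()
words = split_list("street,  road , avenue", settings)
places = split_list("Paris, France; London,  England", settings)
```

Here `words` is split on ',' while `places` is split on ';', so entries that themselves contain a comma stay intact.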
Grammar and Word Set Structure
  1. remove hardcoded grammar: falsefriend, antonym, synonym, conjugation, comparison, etc.
  2. rename wordtypes to wordsets to imply arbitrary sets of words, not types of speech
  3. relationships described by relationship(name, mapping?, relatee, relations) are the arbitrary relationship (name) of mapping type (one to many, one to one, or many to many) of the relatee(s) to the relation(s). For example:
    • relationship(translation, dog, {Hund})
    • relationship(synonyms, ManyToMany {street, road, avenue, boulevard, way}, {street, road, avenue, boulevard, way})
    • relationship(translation, OneToOne {dog, cat}, {Hund, Katze})
    • relationship(wives of, Henry VIII, {Catherine of Aragon, Anne Boleyn, Jane Seymour, Anne of Cleves, Catherine Howard, Catherine Parr})
  4. constructed relationships described by constructedrelationships(name, type=basic, list of sets, list of results), construct on demand all of the relationships and sets from the product (choosing one item from each set) one to one mapped to the results. For example:
    • constructedrelationships( conjugation, basic, [{Present, Past}, {Singular, Plural}, {First Person, Second Person, Third Person}, {to be, to have}], { put your conjugations here})
    • generates relationship(conjugation, (Past, to be), {all past tenses of to be})
    • relationship(conjugation, (Present, First Person), {all Present First Person conjugations of both to be and to have})
    • relationship(conjugation, (Present, Singular, First Person, to be), I am)
    • etc.
  5. constructed relationships described by constructedrelationships(name, type=regex, list of sets, pattern, replacement) construct relationships as above, but use a regular expression to generalize the result. For example, the following 5 definitions generate all regular conjugations of the English past, present and future tenses (assuming the 4 parts are concatenated with ";"):
    • constructedrelationships( conjugation, regex, [{Present}, {Singular, Plural}, {First Person, Second Person}, {to walk, to talk, all other regular English verbs}], ".*;.*;(.*);to (.*)", "\1 \2")
    • constructedrelationships( conjugation, regex, [{Present}, {Singular}, {Third Person}, {to walk, to talk, all other regular English verbs}], ".*;.*;(.*);to (.*)", "\1 \2s")
    • constructedrelationships( conjugation, regex, [{Present}, {Plural}, {Third Person}, {to walk, to talk, all other regular English verbs}], ".*;.*;(.*);to (.*)", "\1 \2")
    • constructedrelationships( conjugation, regex, [{Past}, {Singular, Plural}, Person, {to walk, to talk, all other regular English verbs}], ".*;.*;(.*);to (.*)", "\1 \2ed")
    • constructedrelationships( conjugation, regex, [{Future}, {Singular, Plural}, Person, {to walk, to talk, all other regular English verbs}], ".*;.*;(.*);to (.*)", "\1 will \2")
  6. add cue, target to mark text or media as a cue, a target, or both. Targets are answers and cues are questions. If nothing is marked, assume everything is both a cue and a target.
  7. remove comment, pronunciation, example, paraphrase
    1. add informational to mark text or media as neither a target nor a cue
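The two kinds of constructed relationship described above can be sketched as follows. This is only one possible reading of the proposal: the function names are hypothetical, ordered lists stand in for the sets so that the product has a stable order, and the regex variant joins each combination with ";" before applying the pattern and replacement:

```python
import itertools
import re


def construct_basic(name, sets, results):
    """Map each combination (one item per set) one-to-one onto the given results."""
    keys = list(itertools.product(*sets))
    if len(keys) != len(results):
        raise ValueError("need exactly one result per combination")
    return {(name, key): result for key, result in zip(keys, results)}


def construct_regex(name, sets, pattern, replacement):
    """Derive each result by rewriting the ';'-joined combination with a regex."""
    relationships = {}
    for key in itertools.product(*sets):
        joined = ";".join(key)
        relationships[(name, key)] = re.sub(pattern, replacement, joined)
    return relationships


basic = construct_basic(
    "conjugation",
    [["Present", "Past"], ["to be"]],
    ["am", "was"],
)

third_person = construct_regex(
    "conjugation",
    [["Present"], ["Singular"], ["Third Person"], ["to walk", "to talk"]],
    r".*;.*;(.*);to (.*)",
    r"\1 \2s",
)
```

In this sketch the first capture group is the person label and the second is the verb stem, so a real implementation would still need to map labels such as "Third Person" onto pronouns.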
Unit Plans
  1. remove hard coded method/game/grammar tag specialwordtype
  2. usemethod defined by usemethod(target name, set of methods) restricts these targets to using only these methods. Otherwise Parley guesses as follows:
    • If a target has a one-to-one mapping, or a one-to-many mapping with a small number of relations, it is suitable for flash cards and multiple choice
    • If a target is written, it is suitable for anagram and written practice
    • Any constructed relationship is suitable for the comparison/conjugation tool
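The guessing rules above might look like this in code. Everything here is an assumption for illustration: the `Target` fields, the method names, and in particular the cut-off `SMALL_MANY`, since the notes do not say what counts as a "small number":

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

SMALL_MANY = 4  # assumed upper bound for a "small number of many"


@dataclass
class Target:
    """Hypothetical description of a practice target."""
    relations: int                              # how many relations the relatee maps to
    is_written: bool
    is_constructed: bool
    usemethod: Optional[FrozenSet[str]] = None  # explicit restriction, if any


def methods_for(target):
    """Return the explicit usemethod set, or guess from the target's shape."""
    if target.usemethod is not None:
        return set(target.usemethod)
    guessed = set()
    if 1 <= target.relations <= SMALL_MANY:
        guessed.update({"flashcards", "multiplechoice"})
    if target.is_written:
        guessed.update({"anagram", "written"})
    if target.is_constructed:
        guessed.add("comparison/conjugation")
    return guessed


plain_word = Target(relations=1, is_written=True, is_constructed=False)
restricted = Target(relations=1, is_written=True, is_constructed=False,
                    usemethod=frozenset({"written"}))
conjugation_table = Target(relations=24, is_written=False, is_constructed=True)
```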
Per Unit Plan
  1. activemethod/game(method, isactive) - filters the methods active with this lesson
  2. methodthreshold(method, recognition/production/spelling, low, high) - only use this method if the student's recognition/production/spelling score is above low and below high
  3. useconstruction(method, recognition/production/spelling, low, high) - only use constructions when the student's recognition/production/spelling score falls within the range. The idea is that the student is not asked to conjugate "I" with "to walk" until their recognition of both parts is above low percent, and is not asked to use "I walked" in a sentence with the preposition "to" and the noun phrase "the store" until all of the parts have reached low percent.
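A small sketch of how methodthreshold entries could filter the available methods, assuming scores are percentages per skill and that both bounds are exclusive ("above low and less than high"). The function names and the tuple layout are hypothetical:

```python
def method_allowed(score, low, high):
    """True when the score lies strictly between low and high."""
    return low < score < high


def active_methods(thresholds, scores):
    """Filter methods by threshold.

    thresholds: list of (method, skill, low, high) tuples.
    scores: mapping from skill name to the student's current percentage.
    A skill with no recorded score is treated as 0 (an assumption).
    """
    return [
        method
        for (method, skill, low, high) in thresholds
        if method_allowed(scores.get(skill, 0.0), low, high)
    ]


thresholds = [
    ("flashcards", "recognition", 0, 60),
    ("written", "spelling", 40, 100),
]
scores = {"recognition": 75, "spelling": 50}
allowed = active_methods(thresholds, scores)
```

With these numbers only "written" remains, because the recognition score has already passed the upper bound for flash cards.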
Student Goals
  1. activelesson(lesson id, isactive) activates/deactivates a lesson and all children
  2. activemethod/game(method, isactive)
Student Assessment

Store data and not statistics. Generate the statistics (Leitner boxes) on demand.

  1. remove grade and replace with
    1. Add assessment(recognition, word id, issuccess, timestamp, incorrect word) to track correctly recognized words
    2. Add assessment(production, word id, issuccess, timestamp, incorrect word) to track correctly produced words
    3. Add assessment(spelling, word id, issuccess, timestamp, incorrect word) to track correctly spelled words
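"Store data and not statistics" could work roughly as below: raw assessment events are kept, and Leitner boxes are recomputed by replaying them. The record fields follow the assessment(...) tuples above; the promotion rule (a success moves a word up one box, a failure sends it back to box 1, five boxes in total) is a common Leitner variant assumed here, not something the notes specify:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Assessment:
    """One stored event, mirroring assessment(kind, word id, issuccess, timestamp, incorrect word)."""
    kind: str                    # "recognition", "production" or "spelling"
    word_id: str
    issuccess: bool
    timestamp: float
    incorrect_word: Optional[str] = None


def leitner_boxes(events, kind, max_box=5):
    """Replay the events of one kind in time order and return word id -> box."""
    boxes = {}
    for event in sorted(events, key=lambda e: e.timestamp):
        if event.kind != kind:
            continue
        current = boxes.get(event.word_id, 1)  # unseen words start in box 1
        boxes[event.word_id] = min(current + 1, max_box) if event.issuccess else 1
    return boxes


events = [
    Assessment("recognition", "dog", True, 1.0),
    Assessment("recognition", "cat", True, 1.5),
    Assessment("recognition", "dog", True, 2.0),
    Assessment("recognition", "cat", False, 2.5, incorrect_word="Hund"),
    Assessment("spelling", "dog", False, 3.0, incorrect_word="dogg"),
]
recognition_boxes = leitner_boxes(events, "recognition")
```

Since the events themselves are kept, a different statistic or a different box rule can be computed later without changing the stored file.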

API

Current
Future

Editor

Current
  1. supports 2 or more languages
  2. supports 1 root lesson
  3. supports nested lessons
Future

Currently Unused

These features of the current format appear to be unused:

File Format

  1. information.category
  2. identifier.identifiertype - never parsed in kvocdoc
  3. identifier.sizehint - never parsed in kvocdoc
  4. entry.sizehint - never parsed in kvocdoc
  5. leitnerboxes - never used in Parley
  6. deactivated - never parsed in kvocdoc

API

Cumulative Requirements

File Format

API

Editor

Further Reading

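The following projects may be of interest when designing a container file format for language learning files:
  • http://en.wikipedia.org/wiki/DICT
  • http://en.wikipedia.org/wiki/XDXF
  • http://en.wikipedia.org/wiki/Open_Packaging_Conventions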