KDEEdu/Language/KVocDocumentPlanningJuly2014Temporary: Difference between revisions

Latest revision as of 16:44, 10 July 2014

Under Construction
This is a new page, currently under construction!

Individual Requirements

The following section is for planning the requirements of a replacement for KVTML2. It is divided by application.

Each application is divided into File Format, API and Editor requirements. Editor requirements are to explore the possibility of a common editor widget.

Each of those is divided into current and future requirements. The current requirements are to determine what portions of KVTML2, KEduVocDocument and its associated API are not in use. Future requirements are each application's wishes for the future.

Kanagram

File Format

Current

Future

API

Current

Future

Editor

Current

Future

Artikulate

Artikulate currently uses its own XML-based file format, but the mid-term plan is to switch to a common format. The specification for the currently used file format is here:

course-format specification

File Format Requirements

Requirements

association of file with one language (the language whose pronunciation is to be trained)
string field for text in training language
(optional) string field for text in learner's language/English + i18n integration
pronunciation symbols
one sound file per string
EITHER internal editing states OR special editing file format (like: a phrase is translated into the course's language, but a recording is missing)
association to blueprint/skeleton file

Key Differences to KVTML

the file format provides a skeleton specification: skeletons are blueprint-like files that can be used to create courses for different languages and later synchronize changes between them
learning statistics are not saved within the file format: there is a learner-library that encapsulates learner, learning goals, and the corresponding statistics data
downloaded course files are not meant to be changed/edited, but to be updated (in particular, system-wide installation is provided)

Editor

Current

Future

Parley

File Format Requirements

Current Requirements of KVTML2

Header Information (generator, title, author comment)
Two or more languages
Per-language identifier information to set up locale, articles (definite and indefinite hardcoded) and pronouns (first, second and third person, singular, dual and plural hardcoded)
A list of tenses
Two nesting containers: word type and lessons
A marker for special hardcoded wordtypes, identifying parts of speech tied to methods/games
Entries with up to 1 translation per language identifier
Each translation can have an image, a sound and several types of text attachments
Each translation can have up to 5 special sets (synonym, antonym, false friends, multiple choice, comparison) attached
Each translation can be a verb with an attached conjugation
Each translation can have a grade consisting of (currentgrade, count, errorcount, date)

Currently Parley uses almost every feature and tag provided by KVTML2. Here is a complete list of the tags used by Parley.

Future Requirements Different from KVTML2

Container

container format with separate sections for
1. one or more dictionaries of words
2. zero or more collections of word sets and relationships between sets
3. zero or more per user goals (which lessons are active, how words are chosen, etc.)
4. one or more unit/lesson plans
5. per user assessment/data/statistics
per user/per tool goals and assessment are stored separately
Is grammar and word set structure per user or global?
Is the unit/lesson plan per user or global?
ids for all objects are alphanumeric so that they can be
1. human meaningful, guessable and hackable
2. stable across files if a small number of words/lessons/grades are changed.
namespaces for word, wordsets, units, relationships, and constructedRelationships are separate so their names can overlap.
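The separate namespaces can be pictured with a small sketch. This is only an illustration under assumptions: the class `Container` and its `add`/`get` methods are hypothetical names, not part of KEduVocDocument, and the alphanumeric check uses Python's `str.isalnum` as a stand-in for whatever rule the format would finally adopt.

```python
# Hypothetical sketch: one id table per namespace, so that the same
# alphanumeric id may be reused for a word, a word set, a unit, etc.
NAMESPACES = ("word", "wordset", "unit", "relationship", "constructedRelationship")


class Container:
    def __init__(self):
        # One independent dictionary of objects per namespace.
        self._objects = {ns: {} for ns in NAMESPACES}

    def add(self, namespace, object_id, value):
        if namespace not in self._objects:
            raise KeyError(f"unknown namespace: {namespace}")
        if not object_id.isalnum():
            raise ValueError(f"id must be alphanumeric: {object_id!r}")
        if object_id in self._objects[namespace]:
            raise ValueError(f"duplicate id in {namespace}: {object_id!r}")
        self._objects[namespace][object_id] = value

    def get(self, namespace, object_id):
        return self._objects[namespace][object_id]


container = Container()
container.add("word", "colors", "colors")
container.add("wordset", "colors", ["red", "green", "blue"])
```

Because ids are looked up per namespace, the word "colors" and the word set "colors" do not collide, while a second word with the id "colors" would be rejected.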

Dictionary of Words

use a recognized standard (like DICT or XDXF or both) to gain access to millions of words
dictionary is per language
primarylistseparator - The primary character to use to separate lists of words for this language. Defaults to ','.
secondarylistseparator - The secondary character to use to separate lists of words. It can partition lists that include the primary character. Defaults to ';'.
whitespacechar - White space character. Defaults to ' '. Can have multiples.
ignoreextrawhitespace - Is extra white space ignored in this language
nullinputchar - Character that represents blank input, in case the language uses '' (empty input) as a meaningful input.
font -
locale
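As a rough illustration of how the two list separators and the whitespace settings could interact, the sketch below splits a string first on the secondary separator and then on the primary one. The function name `split_word_list` and its parameter names are hypothetical; only the default characters (',', ';' and ' ') come from the list above.

```python
def split_word_list(text, primary=",", secondary=";", whitespace=" ",
                    ignore_extra_whitespace=True):
    """Split text into groups (secondary separator) of words (primary separator)."""

    def clean(word):
        if not ignore_extra_whitespace:
            return word
        # Collapse runs of the whitespace character and trim both ends.
        parts = [p for p in word.split(whitespace) if p]
        return whitespace.join(parts)

    groups = []
    for group in text.split(secondary):
        words = [clean(w) for w in group.split(primary)]
        # Drop entries that are empty after cleaning.
        groups.append([w for w in words if w])
    return groups


groups = split_word_list("street,  road ; big   dog, cat")
# groups == [["street", "road"], ["big dog", "cat"]]
```

The secondary separator partitions the input into lists, each of which may itself contain the primary separator.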

Grammar and Word Set Structure

remove hardcoded grammar: falsefriend, antonym, synonym, conjugation, comparison, etc.
rename wordtypes to wordsets to imply arbitrary sets of words, not types of speech
relationships described by relationship(name, mapping?, relatee, relations) are the arbitrary relationship (name) of mapping type (one to many, one to one, or many to many) of the relatee(s) to the relation(s). For example:
- relationship(translation, dog, {Hund})
- relationship(synonyms, ManyToMany {street, road, avenue, boulevard, way}, {street, road, avenue, boulevard, way})
- relationship(translation, OneToOne {dog, cat}, {Hund, Katze})
- relationship(wives of, Henry VIII, {Catherine of Aragon, Anne Boleyn, Jane Seymour, Anne of Cleves, Catherine Howard, Catherine Parr})
constructed relationships described by constructedrelationships(name, type=basic, list of sets, list of results), construct on demand all of the relationships and sets from the product (choosing one item from each set) one to one mapped to the results. For example:
- constructedrelationships( conjugation, basic, [{Present, Past}, {Singular, Plural}, {First Person, Second Person, Third Person}, {to be, to have}], { put your conjugations here})
- generates relationship(conjugation, (Past, to be), {all past tenses of to be})
- relationship(conjugation, (Present, First Person), {all Present First Person conjugations of both to be and to have})
- relationship(conjugation, (Present, Singular, First Person, to be), {I am})
- etc.
constructed relationships described by constructedrelationships(name, type=regex, list of sets, pattern, replacement) construct relationships as above, but use a regular expression to generalize the result. For example, the following 5 regular expressions generate all regular conjugations of English past, present and future tenses (assuming the 4 parts are concatenated with ";"):
- constructedrelationships( conjugation, regex, [{Present}, {Singular, Plural}, {First Person, Second Person}, {to walk, to talk, all other regular English verbs}], ".*;.*;(.*);to (.*)", "\1 \2")
- constructedrelationships( conjugation, regex, [{Present}, {Singular}, {Third Person}, {to walk, to talk, all other regular English verbs}], ".*;.*;(.*);to (.*)", "\1 \2s")
- constructedrelationships( conjugation, regex, [{Present}, {Plural}, {Third Person}, {to walk, to talk, all other regular English verbs}], ".*;.*;(.*);to (.*)", "\1 \2")
- constructedrelationships( conjugation, regex, [{Past}, {Singular, Plural}, Person, {to walk, to talk, all other regular English verbs}], ".*;.*;(.*);to (.*)", "\1 \2ed")
- constructedrelationships( conjugation, regex, [{Future}, {Singular, Plural}, Person, {to walk, to talk, all other regular English verbs}], ".*;.*;(.*);to (.*)", "\1 will \2")
add cue, target to mark text or media as a cue, a target, or both. Targets are answers and cues are questions. If nothing is marked, assume everything is both a cue and a target.
remove comment, pronunciation, example, paraphrase
1. add informational to mark text or media as neither a target nor a cue
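The two kinds of constructed relationships described above can be sketched as follows. This is a minimal illustration, not a proposed API: the function names `construct_basic` and `construct_regex` are hypothetical, the combinations are enumerated with `itertools.product`, and the regex variant assumes the chosen items are joined with ";" before the pattern is applied, as stated in the text.

```python
import itertools
import re


def construct_basic(sets, results):
    """Map each combination (one item per set) one-to-one onto results."""
    combos = list(itertools.product(*sets))
    if len(combos) != len(results):
        raise ValueError("need exactly one result per combination")
    return dict(zip(combos, results))


def construct_regex(sets, pattern, replacement, joiner=";"):
    """Join each combination with joiner and rewrite it with the regex."""
    relations = {}
    for combo in itertools.product(*sets):
        key = joiner.join(combo)
        relations[combo] = re.sub(pattern, replacement, key)
    return relations


# Basic type: the results are listed explicitly, in product order.
to_be = construct_basic(
    [["Present"], ["Singular"],
     ["First Person", "Second Person", "Third Person"], ["to be"]],
    ["I am", "you are", "he is"],
)

# Regex type: the third person singular rule from the list above.
third_singular = construct_regex(
    [["Present"], ["Singular"], ["Third Person"], ["to walk", "to talk"]],
    r".*;.*;(.*);to (.*)",
    r"\1 \2s",
)
```

With the pattern exactly as written, `\1` captures the person label, so the generated text is "Third Person walks" rather than "he walks"; turning labels into pronouns would need an additional mapping step that the list above does not specify.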

Unit Plans

remove hard coded method/game/grammar tag specialwordtype
usemethod defined by usemethod(target name, set of methods) restricts these targets to using only these methods. Otherwise Parley guesses as follows:
- If a target has a one to one, or a one to small number of many, it is suitable for flash cards and multiple choice
- If a target is written it is suitable for anagram and written.
- Any constructed relationship is suitable for the comparison/conjugation tool
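The guessing rules above could look roughly like this. Everything here is an assumption made for illustration: the function `guess_methods`, its parameters, the method names as strings, and in particular `small_limit=4` as the cut-off for "a small number of many", which the requirements do not define.

```python
def guess_methods(relation_count, is_written, is_constructed,
                  use_methods=None, small_limit=4):
    """Return the set of methods suitable for a target.

    use_methods models usemethod(target name, set of methods): when given,
    it overrides the guess. small_limit is an assumed cut-off.
    """
    if use_methods is not None:
        return set(use_methods)

    methods = set()
    # One to one, or one to a small number of many.
    if 1 <= relation_count <= small_limit:
        methods |= {"flash cards", "multiple choice"}
    # Written targets can be typed or unscrambled.
    if is_written:
        methods |= {"anagram", "written"}
    # Constructed relationships feed the comparison/conjugation tool.
    if is_constructed:
        methods.add("comparison/conjugation")
    return methods


simple_word = guess_methods(relation_count=1, is_written=True, is_constructed=False)
conjugation = guess_methods(relation_count=12, is_written=False, is_constructed=True)
restricted = guess_methods(1, True, False, use_methods={"written"})
```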

Per Unit Plan

activemethod/game(method, isactive) - filters the methods active with this lesson
methodthreshold(method, recognition/production/spelling, low, high) - only use this method if the student's recognition/production/spelling score is above low and less than high
useconstruction(method, recognition/production/spelling, low, high) - only use constructions when the student's recognition/production/spelling falls in the range. The idea is that the student will not be asked to conjugate "I" with "to walk", until their recognition is above low percent and then would not be asked to use I walked in a sentence with the preposition "to" and the noun phrase "the store" until all the parts are at low percent.
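A minimal sketch of how activemethod and methodthreshold might combine when choosing methods for a lesson. The names `MethodThreshold` and `usable_methods` are hypothetical, scores are assumed to be percentages, and a missing score is assumed to count as 0.

```python
from dataclasses import dataclass


@dataclass
class MethodThreshold:
    method: str
    skill: str   # "recognition", "production" or "spelling"
    low: float
    high: float


def usable_methods(active, thresholds, scores):
    """Return the active methods whose thresholds are all satisfied."""
    usable = []
    for method, is_active in active.items():
        if not is_active:
            continue
        rules = [t for t in thresholds if t.method == method]
        # "above low and less than high" is read as a strict comparison.
        if all(t.low < scores.get(t.skill, 0.0) < t.high for t in rules):
            usable.append(method)
    return usable


active = {"flash cards": True, "written": True, "anagram": False}
thresholds = [MethodThreshold("written", "recognition", 50.0, 100.0)]

beginner = usable_methods(active, thresholds, {"recognition": 40.0})
advanced = usable_methods(active, thresholds, {"recognition": 75.0})
```

Here the written method only becomes available once the recognition score has passed 50 percent, while flash cards have no threshold and are always offered.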

Student Goals

activelesson(lesson id, isactive) activates/deactivates a lesson and all children
activemethod/game(method, isactive)

Student Assessment

Store data and not statistics. Generate the statistics (Leitner boxes) on demand.

remove grade and replace with
1. Add assessment(recognition, word id, issuccess, timestamp, incorrect word) to track correctly recognized words
2. Add assessment(production, word id, issuccess, timestamp, incorrect word) to track correctly produced words
3. Add assessment(spelling, word id, issuccess, timestamp, incorrect word) to track correctly spelled words
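To illustrate "store data, generate statistics on demand", the sketch below replays stored assessment records to compute a Leitner box per word. The record fields mirror assessment(kind, word id, issuccess, timestamp, incorrect word); the class and function names are hypothetical, and the box rule used (a success moves the word up one box, a failure sends it back to box 1, five boxes in total) is one common Leitner variant chosen as an assumption, not something the requirements prescribe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assessment:
    kind: str            # "recognition", "production" or "spelling"
    word_id: str
    is_success: bool
    timestamp: float
    incorrect_word: str = ""


def leitner_boxes(records, kind, box_count=5):
    """Replay the records of one kind in time order and return word_id -> box."""
    boxes = {}
    relevant = sorted((r for r in records if r.kind == kind),
                      key=lambda r: r.timestamp)
    for rec in relevant:
        current = boxes.get(rec.word_id, 1)   # unseen words start in box 1
        if rec.is_success:
            boxes[rec.word_id] = min(current + 1, box_count)
        else:
            boxes[rec.word_id] = 1
    return boxes


records = [
    Assessment("recognition", "dog", True, 1.0),
    Assessment("recognition", "dog", True, 2.0),
    Assessment("recognition", "cat", True, 1.0),
    Assessment("recognition", "cat", False, 2.0, incorrect_word="Hund"),
    Assessment("spelling", "dog", False, 3.0, incorrect_word="dgo"),
]
recognition_boxes = leitner_boxes(records, "recognition")
```

Because only the raw records are stored, a different box count or promotion rule can be applied later without changing the file.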

API

Current

Future

Editor

Current

supports 2 or more languages
supports 1 root lesson
supports nested lessons

Future

Currently Unused

These features of the current format appear to be unused

File Format

information.category
identifier.identifiertype - never parsed in kvocdoc
identifier.sizehint - never parsed in kvocdoc
entry.sizehint - never parsed in kvocdoc
leitnerboxes - never used in Parley
deactivated - never parsed in kvocdoc

@@ Line 6: / Line 6: @@
 Each application is divided into File Format, API and Editor requirements.
-For current File Format requirements I listed the tags from the [http://edu.kde.org/kvtml/kvtml2.dtd http://edu.kde.org/kvtml/kvtml2.dtd] I knew to be used.
 Editor requirements are to explore the possibility of a common editor widget.
@@ Line 12: / Line 11: @@
 The current requirements are to determine what portions of KVTML2, KEduVocDocument and its associated API are not in use.  Future requirements are each application's wishes for the future.
+=== Kanagram ===
-=== KAnagram ===
 ==== File Format ====
 ===== Current =====
@@ Line 47: / Line 44: @@
 === Parley ===
-==== File Format ====
+==== File Format Requirements ====
-===== Current =====
+===== Current Requirements  of KVTML2 =====
-====== Used Tags ======
+# Header Information (generator, title, author comment)
-Parley uses almost every tag that is parsed from the kvtml2 file so this section is not very useful
+# Two or more languages
-# information.generator
-# information.title
-# information.author
-# information.comment
-# identifiers - 2 or more
-#
-# indentifier.name
-# indentifier.locale
-# identifier.comment
-# indentifier.article
-# indentifier.article.definite
-# indentifier.article.indefinite
-# indentifier.personalpronouns
-# indentifier.personalpronouns.singular
-# indentifier.personalpronouns.dual
-# indentifier.personalpronouns.plural
-# indentifier.personalpronouns.tense
-# firstperson
-# secondperson
-# thirdpersonmale
-# thirdpersonfemale
-# thirdpersonneutralcommon
-#
-# tenses
-# tense
-#
-# lessons
-# wordtypes
-#
-# specialwordtype hardcoded as  (noun|noun/male|noun/female|noun/neutral|verb|adjective|adverb)
-#
-# inpractice
-#
-# entry
-# entries
-#
-# translation
-# text
-# comment
-# pronunciation
-# example
-# paraphrase
-#
-# falsefriend
-# antonym
-# synonym
-# multiplechoice
-#
-# image
-# sound
-#
-# comparison
-# absolute
-# comparative
-# superlative
-#
-# conjugation
-# tense
-# singular
-# dual
-# plural
-# choice
-#
-# grade
-# currentgrade
-# count
-# errorcount
-# date
-#
-# containerentry is used indirectly via inheritence of lessons and wordtype so it is not a requirement
-====== Features ======
-# Header Information
 # Per language identifier information to setup locale, articles (definite and indefinite hardcoded) and pronouns (first, second and third person, single, dual and plural hardcoded)
 # A list of tenses
 # Two nesting containers: word type and lessons
-# A marker for special hardcoded wordtypes
+# A marker for special hardcoded wordtypes, identifying parts of speech tied to methods/games
 # Entries with up to 1 translation per language identifier
 # Each translation can have an image, a sound and several types of text attachments
@@ Line 136: / Line 58: @@
 # Each translation can have a grade consisting of (currentgrade, count, errorcount, date)
+:Currently Parley uses almost every feature and tag provided by KVTML2. Here is a complete list of the [[KDEEdu/Language/KVocDocumentPlanningJuly2014Temporary/ParleyCurrentTags|tags used by]] Parley.
+===== Future Requirements Different from KVTML2 =====
+====== Container ======
+# container format with separate sections for
+## one or more dictionaries of words
+## zero or more collections of word sets and relationships between sets
+## zero or more per user goals (which lessons are active, how are word chosen etc.)
+## one or more unit/lesson plans
+## per user assessment/data/statistics
+# per user/per tool goals and assessment are stored separately
+# Is grammar and word set structure per user or global?
+# Is the unit/lesson plan per user or global?
+# ids for all objects are alphanumeric so that they can be
+## human meaningful, guessable and hackable
+## stable across files if a small number of words/lessons/grades are changed.
+# namespaces for word, wordsets, units, relationships, and constructedRelationships are separate so their names can overlap.
+====== Dictionary of Words ======
+# use a recognized standard (like DICT or XDXF or both) to gain access to millions of words
+# dictionary is per language
+# primarylistseparator - The primary character to use to separate lists of words for this language. Defaults to ','.
+# secondarylistseparator - The secondary character to use to separate lists of words.  It can partition list that include the primary character. Defaults to ';'
+# whitespacechar - White space character. Defaults to ' '. Can have multiples.
+# ignoreextrawhitespace - Is extra white space ignored in this language
+# nullinputchar - Character that represents blank input, in case the language uses '' as a meaningful input.
+# font -
+# locale
+====== Grammar and Word Set Structure ======
+# remove hardcoded grammar falsefriend, antonym, synonym conjugation comparison etc.
+# rename wordtypes to wordsets to imply arbitrary sets of words, not types of speech
+# relationships described by relationship(name, mapping?,  relatee, relations) are the arbitrary relationship (name) of mapping type (one to many, one to one, or many to many) of the relatee(s) to the relation(s). For example:
+#* relationship(translation, dog, {Hund})
+#* relationship(synonyms, ManyToMany {street, road, avenue, boulevard, way}, {street, road, avenue, boulevard, way})
+#* relationship(translation, OneToOne {dog, cat}, {Hund, Katze})
+#* relationship(wives of, Henry VIII, {Catherine of Aragon, Anne Boleyn, Jane Seymour, Anne of Cleves, Catherine Howard, Catherine Parr, Elizabeth Blount})
+# constructed relationships described by constructedrelationships(name, type=basic, list of sets, list of results), construct on demand all of the relationships and sets from the product (choosing one item from each set) one to one mapped to the results.  For example:
+#* constructedrelationships( conjugation, basic, [{Present, Past}, {Singular, Plural}, {First Person, Second Person, Third Person}, {to be, to have}], { put your conjugations here})
+#* generates relationship(conjugation,Past, to be), {all past tenses of to be})
+#* relationship( conjugation,(Present,FirstPerson), {all Present First Person conjgations of both to be and to have})
+#* relations(conjugation, (Present, Singular, First Person, to be), I am)
+#* etc.
+#constructed relationships described by constructedrelationships(name, type=regex, list of sets, pattern, replacement) as above constructs relationships, but is uses a regular expression to generalize the result. For example the following 4 regular expressions generate all regular conjugations of English past, present and future tenses ( assuming the 4 parts are concatenated with ";") :
+#* constructedrelationships( conjugation, regex, [{Present}, {Singular, Plural}, {First Person, Second Person}, {to walk, to talk, all other regular English verbs}], ".*;.*;(.*);to (.*)",  "\1 \2")
+#* constructedrelationships( conjugation, regex, [{Present}, {Singular}, {Third Person}, {to walk, to talk, all other regular English verbs}], ".*;.*;(.*);to (.*)",  "\1 \2s")
+#* constructedrelationships( conjugation, regex, [{Present}, {Plural}, {Third Person}, {to walk, to talk, all other regular English verbs}], ".*;.*;(.*);to (.*)",  "\1 \2")
+#* constructedrelationships( conjugation, regex, [{Past}, {Singular, Plural}, Person, {to walk, to talk, all other regular English verbs}], ".*;.*;(.*);to (.*)",  "\1 \2ed")
+#* constructedrelationships( conjugation, regex, [{Future}, {Singular, Plural}, Person, {to walk, to talk, all other regular English verbs}], ".*;.*;(.*);to (.*)",  "\1 will \2")
+# add cue, target to mark text or media as either a cue, a target, both a cue and a target.  Targets are answers and cues are questions. If nothing is marked, assume everything is a cue and a target.
+# remove comment, pronunciation, example, paraphrase
+## add informational to mark text or media as neither a target nor a cue
+====== Unit Plans ======
+# remove hard coded method/game/grammar tag specialwordtype
+# usemethod defined by usemethod(target name, set of methods) restricts these targets to using only these methods.  Otherwise Parley guesses as follows:
+#* If a target has a one to one, or a one to small number of many,  it is suitable for flash cards and multiple choice
+#* If a target is written it is suitable for anagram and written.
+#* Any constructed relationship is suitable for the comparison/conjugation tool
+====== Per Unit Plan ======
+# activemethod/game(method, isactive) - filters the methods active with this lesson
+# methodthreshold(method, recognition/production/spelling, low, high)  - only use this method if the students recognition/production/spelling score is above low and less than high
+# useconstruction(method, recognition/production/spelling, low, high) - only use constructions when the student's recognition/production/spelling falls in the range.  The idea is that the student will not be asked to conjugate "I" with "to walk", until their recognition is above low percent and then would not be asked to use I walked in a sentence with the preposition "to" and the noun phrase "the store" until all the parts are at low percent.
+====== Student Goals ======
+# activelesson(lesson id, isactive) activates/deactivates a lesson and all children
+# activemethod/game(method, isactive)
+====== Student Assessment ======
+Store data and not statistics.  Generate the statistics (Leitner boxes) on demand.
+# remove grade and replace with
+## Add assessment(recognition, word id, issuccess, timestamp, incorrect word) to track correctly recognized words
+## Add assessment(production, word id, issuccess, timestamp, incorrect word) to track correctly produced words
+## Add assessment(spelling, word id, issuccess, timestamp, incorrect word) to track correctly spelled words
-===== Future =====
-# ids are alphanumeric so they can be a) human meaningful b) stable if words/lessons/grades are in different locations.
-#
 ==== API ====
 ===== Current =====
@@ Line 170: / Line 158: @@
 == Editor ==
+== Further Reading ==
+The following projects can be interesting for designing a container file format for language learning files:
+* http://en.wikipedia.org/wiki/DICT
+* http://en.wikipedia.org/wiki/XDXF
+* http://en.wikipedia.org/wiki/Open_Packaging_Conventions
