KDE Core/ISO Codes: Difference between revisions

From KDE Community Wiki
 
== ISO Codes in KDE ==


KDE uses ISO standard codes in a number of places, primarily the Country Code, Language Code and Currency Code in KLocale.  KGeography also uses Country Subdivision Names, but without the ISO Codes.  KDE currently maintains its own data files for these codes and its own name translations, which imposes a maintenance burden to keep the codes and translations up to date.


A number of other projects also maintain ISO codes and translations, foremost among them the iso-codes project.  It might make sense to adopt or join another project as the source for our codes and translations.  This page investigates that possibility.


Another alternative to investigate is the Unicode CLDR / ICU, which includes some ISO codes, particularly Currency Codes.  See http://www.unicode.org/reports/tr35/.


If the problems involved in migrating to another project cannot be resolved, then we could migrate to our own xml file format as a new kdesupport or freedesktop.org project, which could attract external use and help with maintenance.  We could then also extend the library to include flags, solving another issue we have with duplicated data.  Such a library would need to be splittable by entity type, with separate translation packs, to keep packaging light.


One advantage of moving our codes from kde-runtime to a separate library is that we could change the underlying data source on each platform: if a particular platform already supplies the ISO codes in its own package or library, we could use that instead of iso-codes.


== Translation Problems ==


We cannot switch to another source unless we are sure that translations will not regress.  We need to ensure that all shipping or near-shipping KDE languages are fully supported by the replacement, to our high standards.  This may require KDE to move a lot of translations over to the chosen project, or even to set up new translation teams there.  That could take more work in the short term than we would save in the long term, and could lower the quality level, so it may not be worth it.


ODS Spreadsheet of translation stats:
[[Media:ISO_Codes_Translations.tar.gz]]
 
== iso-codes Project ==
 
The [http://pkg-isocodes.alioth.debian.org/ Debian iso-codes project] maintains a package that includes xml files of various ISO Codes, with translations for them in po files.  The project is well maintained, regularly updated, and used by many projects and distributions.
 
TODO: find out whether it is an official part of the MeeGo architecture, i.e. whether it is in the MeeGo core repository.
 
The iso-codes project provides support for the following ISO standards:
* ISO 639 - Language Codes for ISO 639-1 and 639-2
* ISO 639-3 - Language Codes
* ISO 3166 - Country Codes for ISO 3166-1 and 3166-3
* ISO 3166-2 - Country Subdivision Codes
* ISO 4217 - Currency Codes
* ISO 15924 - Script Codes
 
 
The iso-codes package is 1.1MB for the data files and 10.3MB for the translation files, although these are often installed anyway.  By comparison, KDE requires approximately 2.5MB for Country, Currency and Language data and translations, albeit for a far smaller set of codes, strings and languages.
 
iso-codes ships a varying number of languages for each ISO Code, each translated to a varying degree.  In total it ships 96 languages: 62 actively maintained in the Translation Project (TP), 8 actively maintained outside TP, and 26 unmaintained.  Of these, 24 are not supported by KDE, 15 of which are also unmaintained by iso-codes, although many of these appear to have been copied from KDE3.
 
There are 17 KDE languages that iso-codes does not translate, only 6 of which have never been shipped by KDE.
 
Full details are provided below and in the attached translations spreadsheet.
 
No paperwork is required to submit translations to iso-codes on TP, and a number of KDE translators are already active there.  It is not known where the external translations are hosted or what rules may apply to them.
 
== ISO 3166-1 Country Code ==
 
The ISO 3166 standard defines Country Codes:
* http://www.iso.org/iso/country_codes.htm
 
 
The ISO 3166-2 Country Subdivision Codes are addressed separately below.
 
=== KDE Support ===
 
* http://quickgit.kde.org/?p=kde-runtime.git&a=tree&f=l10n
* http://api.kde.org/4.x-api/kdelibs-apidocs/kdecore/html/classKLocale.html
 
 
KDE derives the ISO Country Codes and Country Names from the KDE Locale (l10n) settings files.  As a result, if KDE doesn't support a given country locale, we don't know its country code or have a translation for its name.  This is fine for our Localization module but leaves the support incomplete for any other purpose.  The codes are accessed via the KLocale API.
 
=== iso-codes Support ===
 
The iso-codes ISO 3166 file contains both [http://www.iso.org/iso/country_codes/background_on_iso_3166.htm ISO 3166-1 Country Codes] and [http://www.iso.org/iso/country_codes/background_on_iso_3166/iso_3166-3.htm ISO 3166-3 Code for formerly used names of countries].
 
* http://git.debian.org/?p=iso-codes/iso-codes.git;a=blob;f=iso_3166/iso_3166.xml




The iso-codes xml format provides for the three different code types (alpha2, alpha3 and numeric) and both official and unofficial/common names of a country.
=== Translation Status ===


It is difficult to compare translation statistics directly, as iso-codes includes official, unofficial and former names, whereas KDE only has unofficial names, which are mixed into one file with other translations.  Instead, a 100% rate is assumed for all shipped or recently shipped KDE languages.


The base iso-codes xml file is in standard US English.


http://translationproject.org/domain/iso_3166.html


Version 3.25 of iso-codes ships with 96 translation files for ISO 3166: 62 languages are translated via The Translation Project, 8 are externally hosted and 26 are apparently unmaintained.  At least 72% of the languages are 90% to 100% complete, with a further 7% at least 75% complete.


 
* KDE 4.6 shipped 53 languages, 8 of which are not shipped by iso-codes 3166, and 4 of which are 88%-89% complete, the rest being 98%-100% complete
* Previous KDE4 versions shipped 17 other languages, 9 of which are not in iso-codes 3166
* KDE has 19 other languages that have not yet shipped, 12 of which are not in iso-codes 3166
* In total KDE4 has 89 languages, 29 of which are not in iso-codes 3166; only 4 of these are less than 75% complete, so they could all potentially be useful to iso-codes
* iso-codes has 9 languages that KDE does not have
 
Interestingly, many of the unmaintained translations appear to have been copied from KDE3.


=== iso-codes Change Required ===
=== KDE Changes Required ===


* Add KLocale::countryCodes() that returns a QList<QString> of all Country Codes loaded from the xml file, in the correct uppercase format.
* Add KLocale::countryName() taking a country code and the name type (official/unofficial) to return.  Default values return the current locale's country name in informal format, in the current language.
* Add KLocale::countryNames() that returns a QHash<QString,QString> of all Country Codes and their Names in requested format.
* Add KLocale::localeCountryCodes() to call allCountriesList()
* Mark KLocale::allCountriesList() as deprecated.
* Modify KLocale::countryCodeToName() to call countryName().  Mark as deprecated.
* Modify the kde-runtime/l10n/*.desktop files to remove the Name field and its translations.  Possibly rename them from .desktop to .locale or similar, if that doesn't break some implied API guarantee.  Possibly move them to a different repository and install location with a better name?
 
 
Alternatively, add a KCountryCode class, similar to KCurrencyCode, to encapsulate all the details.  Could it be used for the level 2 (subdivision) names as well?
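To make the proposed calls concrete, here is a minimal sketch of what countryCodes(), countryName() and countryNames() might do once the data comes from the iso-codes ISO 3166 file, whether they end up as KLocale methods or in a KCountryCode class.  It is written in plain standard C++ rather than Qt, so std::vector and std::map stand in for QList and QHash.  CountryEntry, CountryTable and NameType are hypothetical names, the official_name and common_name attributes are assumed to be optional, and translation through the .po files is left out.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical in-memory form of one ISO 3166 entry. The fields mirror the
// iso-codes attributes; officialName/commonName are assumed optional, so an
// empty string means "not present".
struct CountryEntry {
    std::string alpha2;
    std::string alpha3;
    std::string numeric;
    std::string name;
    std::string officialName;
    std::string commonName;
};

enum class NameType { Official, Unofficial };

// Hypothetical helper class; not an existing KDE API.
class CountryTable {
public:
    explicit CountryTable(std::vector<CountryEntry> entries) {
        for (CountryEntry &e : entries) {
            std::string key = toUpper(e.alpha2);
            byAlpha2_[key] = std::move(e);
        }
    }

    // Like the proposed countryCodes(): every code, in uppercase.
    std::vector<std::string> countryCodes() const {
        std::vector<std::string> codes;
        for (const auto &kv : byAlpha2_) codes.push_back(kv.first);
        return codes;
    }

    // Like the proposed countryName(): pick the requested name type and fall
    // back to the plain name when that variant is missing.
    std::optional<std::string> countryName(const std::string &code, NameType type) const {
        auto it = byAlpha2_.find(toUpper(code));
        if (it == byAlpha2_.end()) return std::nullopt;
        const CountryEntry &e = it->second;
        const std::string &preferred =
            (type == NameType::Official) ? e.officialName : e.commonName;
        return preferred.empty() ? e.name : preferred;
    }

    // Like the proposed countryNames(): code -> name for every entry.
    std::map<std::string, std::string> countryNames(NameType type) const {
        std::map<std::string, std::string> names;
        for (const auto &kv : byAlpha2_) names[kv.first] = *countryName(kv.first, type);
        return names;
    }

private:
    static std::string toUpper(std::string s) {
        std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
            return static_cast<char>(std::toupper(c));
        });
        return s;
    }

    std::map<std::string, CountryEntry> byAlpha2_;  // sorted by alpha-2 code
};
```

Normalising the key to uppercase on the way in means callers passing the legacy lowercase codes still find their entry.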


=== Country Code Format Conversion ===
A number of apps in KDE may need to convert between the different code formats, e.g. Exiv2 stores the country code using the Alpha3 code.  As the iso-codes file provides all the code formats, we can provide conversion tools.  We could add an extra parameter to all the country code API calls to allow any code format to be used, but I think this would just confuse matters.  We should stick with a single format as standard and provide a single API call to convert the codes.
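The following is a minimal sketch of a single conversion call, again in plain C++.  It looks up a code in one format and returns the same country's code in another.  The CodeFormat enum, convertCountryCode() and the three hard-coded rows are hypothetical stand-ins for data that would really be loaded from the iso-codes ISO 3166 xml file.

```cpp
#include <cctype>
#include <optional>
#include <string>
#include <vector>

enum class CodeFormat { Alpha2, Alpha3, Numeric };

struct CodeRow {
    std::string alpha2;
    std::string alpha3;
    std::string numeric;  // kept as a string so leading zeros survive

    const std::string &get(CodeFormat f) const {
        switch (f) {
        case CodeFormat::Alpha2: return alpha2;
        case CodeFormat::Alpha3: return alpha3;
        case CodeFormat::Numeric: return numeric;
        }
        return alpha2;  // unreachable; keeps compilers quiet
    }
};

// Sample rows only; a real table would hold every entry from the xml file.
const std::vector<CodeRow> &countryCodeTable() {
    static const std::vector<CodeRow> table = {
        {"DE", "DEU", "276"},
        {"GB", "GBR", "826"},
        {"NZ", "NZL", "554"},
    };
    return table;
}

// Hypothetical conversion call: map a code from one format to another.
std::optional<std::string> convertCountryCode(std::string code, CodeFormat from, CodeFormat to) {
    for (char &c : code) c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    for (const CodeRow &row : countryCodeTable()) {
        if (row.get(from) == code) return row.get(to);
    }
    return std::nullopt;
}
```

For example, an Exiv2 Alpha3 value such as NZL converts to NZ.  The rest of the API can then stay on a single code format.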


== ISO 4217 Currency Codes ==
 
The ISO 4217 standard defines Currency Codes:
* http://en.wikipedia.org/wiki/ISO_4217
* http://www.currency-iso.org/
 
=== KDE Support ===
 
* http://quickgit.kde.org/?p=kde-runtime.git&a=tree&f=localization/currency
* http://api.kde.org/4.x-api/kdelibs-apidocs/kdecore/html/classKCurrencyCode.html
* http://techbase.kde.org/Projects/kdelibs/localisation#ISO_4217_Currency_Code_Support
 
KDE supports the following fields for all currencies except a few obsolete currencies.
 
CurrencyCodeIsoAlpha3          = NZD
CurrencyCodeIsoNumeric3        = 554
Name                          = New Zealand Dollar
CurrencyNameIso                = New Zealand Dollar
CurrencyUnitSymbols            = $,NZ$,NZD
CurrencyUnitSymbolDefault      = $
CurrencyUnitSymbolUnambiguous  = NZ$
CurrencyUnitSingular          = dollar
CurrencyUnitPlural            = dollars
CurrencySubunitSymbol          = c
CurrencySubunitSingular        = cent
CurrencySubunitPlural          = cents
CurrencyIntroducedDate        = 1967-07-10
CurrencySuspendedDate          =
CurrencyWithdrawnDate          =
CurrencySubunits              = 1
CurrencySubunitsInCirculation  = true
CurrencySubunitsPerUnit        = 100
CurrencyDecimalPlacesDisplay  = 2
CurrencyCountriesInUse        = NZ,CK,NU,PN,TK
 
The only translated field is Name.


The Unit and Subunit name fields are not currently used by the API, as we hit issues with translations and plural forms when mixing units and subunits.  I'm exploring an alternative plan.
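Any move away from the .desktop files starts with reading the fields listed above.  As a rough sketch of what a conversion script or a KCurrencyCode replacement would need, the following parses such lines into a map and splits comma-separated fields like CurrencyUnitSymbols.  It assumes plain Key=Value lines with optional spaces around the "=", [Group] headers and # comments.  Localized keys such as Name[de] are simply kept as distinct keys.  The function names are hypothetical.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Strip leading/trailing spaces, tabs and carriage returns.
std::string trim(const std::string &s) {
    const auto first = s.find_first_not_of(" \t\r");
    if (first == std::string::npos) return "";
    const auto last = s.find_last_not_of(" \t\r");
    return s.substr(first, last - first + 1);
}

// Read "Key = Value" lines into a map, skipping blank lines,
// "#" comments and "[Group]" headers. Keys such as "Name[de]" are kept as-is.
std::map<std::string, std::string> parseCurrencyFields(const std::string &text) {
    std::map<std::string, std::string> fields;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        line = trim(line);
        if (line.empty() || line[0] == '#' || line[0] == '[') continue;
        const auto eq = line.find('=');
        if (eq == std::string::npos) continue;  // not a key/value line
        fields[trim(line.substr(0, eq))] = trim(line.substr(eq + 1));
    }
    return fields;
}

// Split a comma-separated value such as "$,NZ$,NZD".
std::vector<std::string> splitList(const std::string &value) {
    std::vector<std::string> items;
    std::istringstream in(value);
    std::string item;
    while (std::getline(in, item, ',')) items.push_back(trim(item));
    return items;
}
```

Empty fields such as CurrencySuspendedDate come back as empty strings, so "not set" stays distinguishable from "missing key".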


=== iso-codes Support ===


KDE provides far more details and translations than iso-codes; adopting iso-codes would require considerable changes on the iso-codes side.


The iso-codes xml format provides for the Alpha3 and Numeric Codes and the official ISO Currency Name.  Note that this ISO Name is inconsistent: it may not include the country name and may not even be in English.


* http://git.debian.org/?p=iso-codes/iso-codes.git;a=blob;f=iso_4217/iso_4217.xml


The iso-codes project xml format is as follows:
        <iso_4217_entry
                letter_code="NZD"
                numeric_code="554"
                currency_name="New Zealand Dollar" />
        <iso_4217_entry
                letter_code="ALL"
                numeric_code="008"
                currency_name="Lek" />
        <iso_4217_entry
                letter_code="UYU"
                numeric_code="858"
                currency_name="Peso Uruguayo" />


         <historic_iso_4217_entry
=== Translation Status ===


It is difficult to compare translation statistics directly, as iso-codes includes the official ISO name, whereas KDE only has our own adjectival-form names, which are mixed into one file with other translations.  Instead, a 100% rate is assumed for all shipped or recently shipped KDE languages.


The base iso-codes xml file is mostly in standard US English, but some names are in the local language.


http://translationproject.org/domain/iso_4217.html


Version 3.25 of iso-codes ships with 42 translation files for ISO 4217: 33 languages are translated via The Translation Project, 7 are externally hosted and 2 are apparently unmaintained.  At least 52% of the languages are 90% to 100% complete, with a further 2% at least 75% complete.


* KDE 4.6 shipped 53 languages, 18 of which are not shipped by iso-codes 4217
* Previous KDE4 versions shipped 17 other languages, 1 of which is not in iso-codes 4217
* KDE has 19 other languages that have not yet shipped, 16 of which are not in iso-codes 4217
* In total KDE4 has 89 languages, 35 of which are not in iso-codes 4217
* iso-codes has 3 languages that KDE does not have


=== iso-codes Change Required ===
 
* Add all KDE required fields
* Add all KDE required languages and translations
 
=== KDE Changes Required ===
 
* Change KCurrencyCode to load all details from xml and translations from po
* Delete .desktop files
 
== ISO 639 Language Codes ==
 
ISO 639 defines Language Codes using a number of variants:
* http://en.wikipedia.org/wiki/ISO_639
* http://en.wikipedia.org/wiki/ISO_639-1
* http://en.wikipedia.org/wiki/ISO_639-2
* http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=22109
 
=== KDE Support ===
 
* http://quickgit.kde.org/?p=kdelibs.git&a=blob&f=kdecore/localization/all_languages.desktop
* http://api.kde.org/4.x-api/kdelibs-apidocs/kdecore/html/classKLocale.html
 
KDE has translations for 188 language codes and locale variations (e.g. en_GB, sr@latin).  The locale language codes do not match the ISO codes exactly, which may make the ISO codes unsuitable for KLocale purposes.
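The mismatch is easiest to see by splitting a KDE locale code into its parts.  This sketch assumes the common POSIX-style layout language[_territory][.codeset][@modifier].  Only the language part is a candidate ISO 639 code, and only the territory part is a candidate ISO 3166-1 code.  A modifier such as latin has no ISO 639 counterpart.  Even the language part is not guaranteed to exist in ISO 639.  LocaleParts and splitLocale() are hypothetical names.

```cpp
#include <string>

struct LocaleParts {
    std::string language;   // candidate ISO 639 code, e.g. "sr"
    std::string territory;  // candidate ISO 3166-1 alpha-2 code, e.g. "GB"
    std::string modifier;   // e.g. "latin"; not an ISO 639 concept
};

// Split "language[_territory][.codeset][@modifier]" into its parts.
LocaleParts splitLocale(const std::string &locale) {
    LocaleParts parts;
    std::string rest = locale;
    if (auto at = rest.find('@'); at != std::string::npos) {
        parts.modifier = rest.substr(at + 1);
        rest.erase(at);
    }
    if (auto dot = rest.find('.'); dot != std::string::npos) {
        rest.erase(dot);  // the codeset is irrelevant for ISO matching
    }
    if (auto us = rest.find('_'); us != std::string::npos) {
        parts.territory = rest.substr(us + 1);
        rest.erase(us);
    }
    parts.language = rest;
    return parts;
}
```

With this split, en_GB and sr@latin both reduce to a plain language code (en, sr).  The modifier, which KDE needs to tell the two Serbian scripts apart, has nowhere to go in ISO 639.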
 
=== iso-codes Support ===
 
The iso-codes ISO 639 file contains the ISO 639-1 Alpha2 and 639-2 Alpha3 Language Code standards.
 
* http://git.debian.org/?p=iso-codes/iso-codes.git;a=blob;f=iso_639/iso_639.xml
 
The iso-codes xml format provides for the three different code types (alpha2, alpha3-B and alpha3-T) and a list of name variations.
 
The iso-codes project xml format is as follows:
 
<!DOCTYPE iso_639_entries [
        <!ELEMENT iso_639_entries (iso_639_entry+)>
        <!ELEMENT iso_639_entry EMPTY>
        <!ATTLIST iso_639_entry
                iso_639_2B_code        CDATA  #REQUIRED
                iso_639_2T_code        CDATA  #REQUIRED
                iso_639_1_code          CDATA  #IMPLIED
                name                    CDATA  #REQUIRED
        >
]>
 
Some example entries are:
 
        <iso_639_entry
                iso_639_2B_code="ara"
                iso_639_2T_code="ara"
                iso_639_1_code="ar"
                name="Arabic" />
 
        <iso_639_entry
                iso_639_2B_code="ger"
                iso_639_2T_code="deu"
                iso_639_1_code="de"
                name="German" />
 
=== Translation Status ===
 
The base iso-codes xml file is in standard US English.
 
http://translationproject.org/domain/iso_3166.html
 
Version 3.25 of iso-codes ships with xx translation files for ISO 639: xx languages are translated via The Translation Project, xx are externally hosted and xx are apparently unmaintained.  At least xx% of the languages are 90% to 100% complete, with a further xx% at least 75% complete.
 
* KDE 4.6 shipped 53 languages, xx of which are not shipped by iso-codes 639
* Previous KDE4 versions shipped 17 other languages, xx of which are not in iso-codes 639
* KDE has 19 other languages that have not yet shipped, xx of which are not in iso-codes 639
* In total KDE4 has 89 languages, xx of which are not in iso-codes 639
* iso-codes has xx languages that KDE does not have


=== iso-codes Change Required ===


=== KDE Changes Required ===
== ISO 3166-2 Country Subdivision Code ==
The iso-codes file for ISO 3166-2 contains one section of the [http://www.iso.org/iso/country_codes.htm ISO Country Code standard]:
* [http://www.iso.org/iso/country_codes/background_on_iso_3166/iso_3166-2.htm ISO 3166-2 Country subdivision code]
=== KDE Support ===
KDE does not currently support the Level 2 codes, but I want to use them in KLocale and KHolidays, with Plasma, Marble and KGeography as other potential users.

It may be possible to use iso-codes for this without requiring translations, as the names are primarily in the latinised native language, e.g. regions in Spain have local-language names (Catalunya rather than Catalonia).  As the vast majority of use cases for these codes involve a native choosing their home region, this may be acceptable for most languages (although it will be a political issue in some regions).  However, if we start requiring iso-codes in kdelibs for this, we may as well switch for all the others.

An alternative, documented below, is to use the KGeography translations, which are fairly complete.
=== iso-codes Support ===
* http://git.debian.org/?p=iso-codes/iso-codes.git;a=blob;f=iso_3166_2/iso_3166_2.xml
<!DOCTYPE iso_3166_2_entries [
        <!ELEMENT iso_3166_2_entries (iso_3166_country+)>
  <!ELEMENT iso_3166_country (iso_3166_subset*)>
        <!ATTLIST iso_3166_country
                code                    CDATA  #REQUIRED
        >
        <!ELEMENT iso_3166_subset (iso_3166_2_entry+)>
        <!ATTLIST iso_3166_subset
                type                    CDATA  #REQUIRED
        >
        <!ELEMENT iso_3166_2_entry EMPTY>
        <!ATTLIST iso_3166_2_entry
                code                    CDATA  #REQUIRED
                name                    CDATA  #REQUIRED
                parent                  CDATA  #IMPLIED
        >
]>
    <!-- Nepal -->
<iso_3166_country code="NP">
<iso_3166_subset type="Development region">
        <iso_3166_2_entry
                code="NP-1"    name="Madhyamanchal" />
        <iso_3166_2_entry
                code="NP-2"    name="Madhya Pashchimanchal" />
        <iso_3166_2_entry
                code="NP-3"    name="Pashchimanchal" />
        <iso_3166_2_entry
                code="NP-4"    name="Purwanchal" />
        <iso_3166_2_entry
                code="NP-5"    name="Sudur Pashchimanchal" />
</iso_3166_subset>
<iso_3166_subset type="zone">
        <iso_3166_2_entry
                code="NP-BA"    name="Bagmati"  parent="1" />
        <iso_3166_2_entry
                code="NP-BH"    name="Bheri"    parent="2" />
        <iso_3166_2_entry
                code="NP-DH"    name="Dhawalagiri"      parent="3" />
        <iso_3166_2_entry
                code="NP-GA"    name="Gandaki"  parent="3" />
        <iso_3166_2_entry
                code="NP-JA"    name="Janakpur" parent="1" />
        <iso_3166_2_entry
                code="NP-KA"    name="Karnali"  parent="2" />
        <iso_3166_2_entry
                code="NP-KO"    name="Kosi"    parent="4" />
        <iso_3166_2_entry
                code="NP-LU"    name="Lumbini"  parent="3" />
        <iso_3166_2_entry
                code="NP-MA"    name="Mahakali" parent="5" />
        <iso_3166_2_entry
                code="NP-ME"    name="Mechi"    parent="4" />
        <iso_3166_2_entry
                code="NP-NA"    name="Narayani" parent="1" />
        <iso_3166_2_entry
                code="NP-RA"    name="Rapti"    parent="2" />
        <iso_3166_2_entry
                code="NP-SA"    name="Sagarmatha"      parent="4" />
        <iso_3166_2_entry
                code="NP-SE"    name="Seti"    parent="5" />
</iso_3166_subset>
</iso_3166_country>
See below for a comparison with KDE's data.
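In the Nepal sample, each zone's parent attribute (e.g. 1) appears to hold the part of the parent's code after the country prefix, so NP-BA has parent NP-1.  Assuming that convention holds across the file, building the hierarchy only requires re-attaching the country code, as in this sketch.  Subdivision, parentCode() and childrenByParent() are hypothetical names.

```cpp
#include <map>
#include <string>
#include <vector>

struct Subdivision {
    std::string code;    // e.g. "NP-BA"
    std::string name;    // e.g. "Bagmati"
    std::string parent;  // e.g. "1"; empty for top-level entries
};

// Assumption from the Nepal sample: "parent" omits the "NP-" prefix.
std::string parentCode(const std::string &country, const Subdivision &s) {
    return s.parent.empty() ? std::string() : country + "-" + s.parent;
}

// Group subdivision codes under their full parent code, in input order.
std::map<std::string, std::vector<std::string>>
childrenByParent(const std::string &country, const std::vector<Subdivision> &subs) {
    std::map<std::string, std::vector<std::string>> children;
    for (const Subdivision &s : subs) {
        const std::string p = parentCode(country, s);
        if (!p.empty()) children[p].push_back(s.code);
    }
    return children;
}
```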
=== Translation Status ===
* http://translationproject.org/domain/iso_3166_2.html
iso-codes only ships 40 languages: 17% are 90%-100% complete, 25% are 10%-89% complete and 57% are 0%-9% complete.
=== KGeography ===
* http://edu.kde.org/applications/all/kgeography
* http://quickgit.kde.org/?p=kgeography.git&a=blob&f=src/mapsdatatranslation.cpp
KGeography has 6825 strings, mostly Subdivision names but also Countries, Capitals, Oceans and other geographical entities.  These are currently translated into 64 languages, 56% of which are 80-100% translated.  All shipped KDE languages are at or near 100%.  Each complete .po file is between 0.75MB and 1.25MB.

There is probably a very large overlap with the 4551 strings in ISO 3166-2, so these translations could be used to boost either the iso-codes project or our own project.

Example for Nepal, to compare with the iso-codes version above:
i18nc("nepal_zones.kgm", "Nepal (Zones)");
i18nc("nepal_zones.kgm", "Zones");
i18nc("nepal_zones.kgm", "Frontier");
i18nc("nepal_zones.kgm", "Water");
i18nc("nepal_zones.kgm", "Not Nepal (Zones)");
i18nc("nepal_zones.kgm", "Bagmati");
i18nc("nepal_zones.kgm", "Bheri");
i18nc("nepal_zones.kgm", "Dhawalagiri");
i18nc("nepal_zones.kgm", "Gandaki");
i18nc("nepal_zones.kgm", "Janakpur");
i18nc("nepal_zones.kgm", "Karnali");
i18nc("nepal_zones.kgm", "Koshi");
i18nc("nepal_zones.kgm", "Lumbini");
i18nc("nepal_zones.kgm", "Mahakali");
i18nc("nepal_zones.kgm", "Mechi");
i18nc("nepal_zones.kgm", "Narayani");
i18nc("nepal_zones.kgm", "Rapti");
i18nc("nepal_zones.kgm", "Sagarmatha");
i18nc("nepal_zones.kgm", "Seti");
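A first step towards measuring that overlap could be a simple set comparison of names.  This sketch assumes the inputs have already been filtered down to subdivision names.  Otherwise, strings such as Zones, Frontier or Water would be reported as KGeography-only.  It compares exact spellings only.  compareNames() and Overlap are hypothetical.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <string>
#include <vector>

struct Overlap {
    std::vector<std::string> common;          // identical in both sources
    std::vector<std::string> onlyIso;         // iso-codes spelling only
    std::vector<std::string> onlyKGeography;  // KGeography spelling only
};

Overlap compareNames(const std::vector<std::string> &iso,
                     const std::vector<std::string> &kgeo) {
    const std::set<std::string> a(iso.begin(), iso.end());
    const std::set<std::string> b(kgeo.begin(), kgeo.end());
    Overlap r;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(r.common));
    std::set_difference(a.begin(), a.end(), b.begin(), b.end(), std::back_inserter(r.onlyIso));
    std::set_difference(b.begin(), b.end(), a.begin(), a.end(), std::back_inserter(r.onlyKGeography));
    return r;
}
```

Run on the fourteen Nepal zone names above, it reports 13 exact matches and one spelling difference: Kosi in iso-codes versus Koshi in KGeography.  Near-misses like this would still need manual checking.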
An interesting future prospect for KGeography is the dynamic generation of maps using the ISO Codes and region types combined with a new Natural Earth vector layer in Marble providing vector borders.
== Proposed Plan ==
My conclusion is that iso-codes is nowhere near the level of translations required for KDE to adopt it in the near future, and getting it ready would take a lot of work.  A more sensible approach seems to be to migrate internally, inside kde-runtime, from our current .desktop file based implementation to an xml format with po files for translations.  This should actually reduce the installed footprint, as the .desktop files have all the name translations embedded in them.  Once we have a clean set of code files and translations, we can decide whether to stay as is, propose a merge with iso-codes or launch our own project.

Big question: is there an implicit API guarantee for the installed location and format of the locale and flag files?
* Convert kde-runtime/localization/currency to an xml file format with .po files. This can be scripted from the existing .desktop files.
* Create kde-runtime/localization/country to support ISO 3166 with an xml format and po files derived from the existing kde-runtime/l10n desktop files.
* Create kde-runtime/localization/flags/icon using the flags from kde-runtime/l10n
* Create kde-runtime/localization/flags/svg for base svg format flags used to generate other flag sets; these would not be installed.
* Modify kde-runtime/l10n to remove names and translations, possibly rename to .locale files and move to kde-runtime/localization/locale
* Create kde-runtime/localization/languages to support ISO 639, merging iso-codes with all_languages.desktop.  This can be scripted.
* Create kde-runtime/localization/subdivision (or add in ./country) to support ISO 3166-2.  Merge the iso-codes xml and po files with the KGeography translations to create a new xml file and po files.  Can be largely scripted but will need manually checking.
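Several of these steps come down to extracting the English names from an xml file into a gettext template.  Below is a minimal sketch of that step, assuming the (code, name) pairs have already been read from the xml.  It escapes backslashes and double quotes and records the code as an extracted comment (#.).  It also skips repeated names, because a catalog without msgctxt cannot contain duplicate msgids.  makePot() and poEscape() are hypothetical helpers, not part of any existing script, and the usual .pot header entry is omitted.

```cpp
#include <set>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Escape the characters that must not appear raw inside a PO string.
std::string poEscape(const std::string &s) {
    std::string out;
    for (char c : s) {
        if (c == '\\' || c == '"') out += '\\';
        out += c;
    }
    return out;
}

// Build .pot entries from (code, English name) pairs.
std::string makePot(const std::vector<std::pair<std::string, std::string>> &entries) {
    std::ostringstream pot;
    std::set<std::string> seen;
    for (const auto &[code, name] : entries) {
        if (!seen.insert(name).second) continue;  // msgids must be unique
        pot << "#. " << code << "\n"
            << "msgid \"" << poEscape(name) << "\"\n"
            << "msgstr \"\"\n\n";
    }
    return pot.str();
}
```

When two codes share a name, only the first code is recorded.  Adding an msgctxt per code would keep them apart, but would also give translators duplicate strings to translate.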
=== Country Codes ===
See [[/Country_Code|Country Code]].

Latest revision as of 13:24, 25 February 2014

ISO Codes in KDE

KDE uses ISO standard codes in a number of places, primarily the Country Code, Language Code and Currency Code in KLocale. KGeography also uses Country Subdivision Names, but without the ISO Codes. KDE currently maintains its own data files for these codes and its own name translations, which imposes a maintenance burden to keep the codes and translations up to date.

A number of other projects also maintain ISO codes and translations, foremost among them the iso-codes project. It might make sense to adopt or join another project as the source for our codes and translations. This page investigates that possibility.

Another alternative to investigate is the Unicode CLDR / ICU, which includes some ISO codes, particularly Currency Codes. See http://www.unicode.org/reports/tr35/.

If the problems involved in migrating to another project cannot be resolved, then we could migrate to our own xml file format as a new kdesupport or freedesktop.org project, which could attract external use and help with maintenance. We could then also extend the library to include flags, solving another issue we have with duplicated data. Such a library would need to be splittable by entity type, with separate translation packs, to keep packaging light.

One advantage of moving our codes from kde-runtime to a separate library is that we could change the underlying data source on each platform: if a particular platform already supplies the ISO codes in its own package or library, we could use that instead of iso-codes.

Translation Problems

We cannot switch to another source unless we are sure that translations will not regress. We need to ensure that all shipping or near-shipping KDE languages are fully supported by the replacement, to our high standards. This may require KDE to move a lot of translations over to the chosen project, or even to set up new translation teams there. That could take more work in the short term than we would save in the long term, and could lower the quality level, so it may not be worth it.

ODS Spreadsheet of translation stats: Media:ISO_Codes_Translations.tar.gz

iso-codes Project

The Debian iso-codes project maintains a package that includes xml files of various ISO Codes, with translations for them in po files. The project is well maintained, regularly updated, and used by many projects and distributions.

TODO: find out whether it is an official part of the MeeGo architecture, i.e. whether it is in the MeeGo core repository.

The iso-codes project provides support for the following ISO standards:

  • ISO 639 - Language Codes for ISO 639-1 and 639-2
  • ISO 639-3 - Language Codes
  • ISO 3166 - Country Codes for ISO 3166-1 and 3166-3
  • ISO 3166-2 - Country Subdivision Codes
  • ISO 4217 - Currency Codes
  • ISO 15924 - Script Codes


The iso-codes package is 1.1MB for the data files and 10.3MB for the translation files, although these are often installed anyway. By comparison, KDE requires approximately 2.5MB for Country, Currency and Language data and translations, albeit for a far smaller set of codes, strings and languages.

iso-codes ships a varying number of languages for each ISO standard, each translated to a varying degree. In total it ships 96 languages:

  • 62 actively maintained in the Translation Project (TP)
  • 8 actively maintained outside TP
  • 26 unmaintained

Of these, 24 are not supported by KDE, and 15 of those are also unmaintained by iso-codes. It appears that many of these were copied from KDE3.

There are 17 KDE languages that are not translated by iso-codes, and KDE has never shipped 6 of them.

Full details are provided below and in the attached translations spreadsheet.

No paperwork is required to submit translations to iso-codes on TP, and a number of KDE translators are already active there. It is not known where the externally maintained translations are hosted or what rules may apply.

ISO 3166-1 Country Code

The ISO 3166 standard defines Country Codes:


The ISO 3166-2 Country Subdivision Codes are addressed separately below.

KDE Support


KDE derives the ISO Country Codes and Country Names from the KDE Locale (l10n) settings files. As a result, if KDE doesn't support a given country locale, we don't know its country code and have no translation of its name. This is fine for our Localization module, but it leaves the support incomplete for any other purpose. The codes are accessed via the KLocale API.

iso-codes Support

The iso-codes ISO 3166 file contains both the ISO 3166-1 Country Codes and the ISO 3166-3 Codes for former country names.


The iso-codes xml format provides for the three different code types (alpha2, alpha3 and numeric) and both official and unofficial/common names of a country.

The iso-codes project xml format is as follows:

<!DOCTYPE iso_3166_entries [
        <!ELEMENT iso_3166_entries (iso_3166_entry+, iso_3166_3_entry*)>
        <!ELEMENT iso_3166_entry EMPTY>
        <!ATTLIST iso_3166_entry
                alpha_2_code            CDATA   #REQUIRED
                alpha_3_code            CDATA   #REQUIRED
                numeric_code            CDATA   #REQUIRED
                common_name             CDATA   #IMPLIED
                name                    CDATA   #REQUIRED
                official_name           CDATA   #IMPLIED
        >
        <!ELEMENT iso_3166_3_entry EMPTY>
        <!ATTLIST iso_3166_3_entry
                alpha_4_code            CDATA   #REQUIRED
                alpha_3_code            CDATA   #REQUIRED
                numeric_code            CDATA   #IMPLIED
                date_withdrawn          CDATA   #IMPLIED
                names                   CDATA   #REQUIRED
                comment                 CDATA   #IMPLIED
        >
]>

Some example entries are:

        <iso_3166_entry
                alpha_2_code="AF"
                alpha_3_code="AFG"
                numeric_code="004"
                name="Afghanistan"
                official_name="Islamic Republic of Afghanistan" />
        <iso_3166_3_entry
                alpha_4_code="YUCS"
                alpha_3_code="YUG"
                numeric_code="891"
                date_withdrawn="1993-07-28"
                names="Yugoslavia, Socialist Federal Republic of" />
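For illustration, here is a rough sketch of pulling the attributes out of entries like these with `std::regex`. It assumes that attribute values are double-quoted and contain no `>` or `"`, which holds for the examples above. The function name `parseEmptyElements` is hypothetical. Real code should use a proper XML parser such as Qt's `QXmlStreamReader`.

```cpp
#include <map>
#include <regex>
#include <string>
#include <vector>

// One empty element such as <iso_3166_entry .../> as attribute name -> value.
using Attributes = std::map<std::string, std::string>;

// Collects the attributes of every <tag .../> element in the xml text.
// The trailing \s+ stops "iso_3166_entry" from matching "iso_3166_entries".
std::vector<Attributes> parseEmptyElements(const std::string &xml, const std::string &tag)
{
    std::vector<Attributes> result;
    const std::regex element("<" + tag + R"(\s+([^>]*?)/>)");
    const std::regex attribute(R"re(([A-Za-z_0-9]+)\s*=\s*"([^"]*)")re");
    for (std::sregex_iterator it(xml.begin(), xml.end(), element), end; it != end; ++it) {
        Attributes attrs;
        const std::string body = (*it)[1].str();
        for (std::sregex_iterator a(body.begin(), body.end(), attribute); a != end; ++a)
            attrs[(*a)[1].str()] = (*a)[2].str();
        result.push_back(attrs);
    }
    return result;
}
```

Calling it once with `iso_3166_entry` and once with `iso_3166_3_entry` separates current countries from former ones.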

Translation Status

It is hard to compare translation statistics directly, as iso-codes includes official, unofficial and former names. KDE only has unofficial names, which are mixed into one file with other translations. Instead, a 100% translation rate is assumed for all shipped or recently shipped KDE languages.

The base iso-codes xml file is in standard US English.

http://translationproject.org/domain/iso_3166.html

Version 3.25 of iso-codes ships with 96 translation files for ISO 3166:

  • 62 languages are translated via the Translation Project
  • 8 are externally hosted
  • 26 are apparently unmaintained

At least 72% of the languages are 90% to 100% complete, with a further 7% at least 75% complete.

  • KDE 4.6 shipped 53 languages, 5 of which are not shipped by iso-codes 3166
  • Previous KDE4 versions shipped 17 other languages, 6 of which are not in iso-codes 3166
  • KDE has 19 other languages that have not yet shipped, 6 of which are not in iso-codes 3166
  • In total KDE4 has 89 languages, 17 of which are not in iso-codes 3166
  • iso-codes has 24 languages that KDE does not have

iso-codes Change Required

The unofficial names may need reviewing to see whether they are a close enough match to ours.

KDE Changes Required

  • Add KLocale::countryCodes() that returns a QList<QString> of all Country Codes loaded from the xml file, in the correct uppercase format.
  • Add KLocale::countryName() taking a country code and the name type (official/unofficial) to return. The default values return the current locale's country name in informal format, in the current language.
  • Add KLocale::countryNames() that returns a QHash<QString,QString> of all Country Codes and their Names in requested format.
  • Add KLocale::localeCountryCodes() to call allCountriesList()
  • Mark KLocale::allCountriesList() as deprecated.
  • Modify KLocale::countryCodeToName() to call countryName(). Mark as deprecated.
  • Modify the kde-runtime/l10n/*.desktop files to remove the Name field and its translations. Possibly rename them from .desktop to .locale or similar, if this doesn't break some implied API guarantee. Possibly move them to a different repo and install location with a better name?


Alternatively, add a KCountryCode class, similar to KCurrencyCode, to encapsulate all the details. Could it be used for the level 2 (subdivision) names as well?
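Here is a hypothetical sketch of what a KCountryCode-style class might expose, written in plain C++. The class and method names are illustrative, not existing KDE API, and translation lookup is left out. The sketch assumes one rule from the iso-codes DTD: `common_name` and `official_name` are optional there, so each falls back to the required `name` attribute.

```cpp
#include <string>
#include <utility>

// Hypothetical KCountryCode-style value class (not an existing KDE API).
// Translation of the returned names is left out of this sketch.
class CountryCode {
public:
    enum class NameType { Unofficial, Official };

    CountryCode(std::string alpha2, std::string name,
                std::string commonName = std::string(),
                std::string officialName = std::string())
        : m_alpha2(std::move(alpha2))
        , m_name(std::move(name))
        , m_commonName(std::move(commonName))
        , m_officialName(std::move(officialName))
    {
    }

    const std::string &alpha2() const { return m_alpha2; }

    // Assumption: common_name and official_name are #IMPLIED in the iso-codes
    // DTD, so an empty value falls back to the required 'name' attribute.
    const std::string &countryName(NameType type = NameType::Unofficial) const
    {
        const std::string &preferred =
            (type == NameType::Official) ? m_officialName : m_commonName;
        return preferred.empty() ? m_name : preferred;
    }

private:
    std::string m_alpha2;
    std::string m_name;
    std::string m_commonName;
    std::string m_officialName;
};
```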

Country Code Format Conversion

A number of KDE apps may need to convert between the different code formats; for example, Exiv2 stores the country code using the Alpha3 code. As the iso-codes file provides all the code formats, we can provide conversion tools. We could add an extra parameter to every country code API call to allow any code format to be used, but I think this would just confuse matters. We should instead stick with a single standard format and provide a single API call to convert between formats.
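The following sketches that single conversion call. It assumes the three code columns from the iso-codes file have been loaded into a table. `CodeFormat`, `convertCountryCode` and the three sample rows are illustrative only.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical: the three code formats carried by the iso-codes ISO 3166 file.
enum class CodeFormat { Alpha2, Alpha3, Numeric };

struct CountryCodes {
    std::string alpha2;
    std::string alpha3;
    std::string numeric;
};

// Sample rows only; a real table would be loaded from the iso-codes xml file.
const std::vector<CountryCodes> &countryTable()
{
    static const std::vector<CountryCodes> table = {
        {"AF", "AFG", "004"},
        {"DE", "DEU", "276"},
        {"NZ", "NZL", "554"},
    };
    return table;
}

static const std::string &codeIn(const CountryCodes &row, CodeFormat format)
{
    switch (format) {
    case CodeFormat::Alpha2:
        return row.alpha2;
    case CodeFormat::Alpha3:
        return row.alpha3;
    case CodeFormat::Numeric:
        return row.numeric;
    }
    return row.alpha2; // not reached: every enum value is handled above
}

// Single conversion entry point. Codes are compared exactly, so alpha codes
// must be passed in uppercase, as stored in iso-codes.
std::optional<std::string> convertCountryCode(const std::string &code,
                                              CodeFormat from, CodeFormat to)
{
    for (const auto &row : countryTable()) {
        if (codeIn(row, from) == code)
            return codeIn(row, to);
    }
    return std::nullopt;
}
```

An app reading an Alpha3 country code from image metadata could call `convertCountryCode(code, CodeFormat::Alpha3, CodeFormat::Alpha2)` and then use the normal Alpha2-based API.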

ISO 4217 Currency Codes

ISO 4217 defines Currency Codes:

KDE Support

KDE supports the following fields for all currencies except a few obsolete currencies.

CurrencyCodeIsoAlpha3          = NZD
CurrencyCodeIsoNumeric3        = 554
Name                           = New Zealand Dollar
CurrencyNameIso                = New Zealand Dollar
CurrencyUnitSymbols            = $,NZ$,NZD
CurrencyUnitSymbolDefault      = $
CurrencyUnitSymbolUnambiguous  = NZ$
CurrencyUnitSingular           = dollar
CurrencyUnitPlural             = dollars
CurrencySubunitSymbol          = c
CurrencySubunitSingular        = cent
CurrencySubunitPlural          = cents
CurrencyIntroducedDate         = 1967-07-10
CurrencySuspendedDate          =
CurrencyWithdrawnDate          =
CurrencySubunits               = 1
CurrencySubunitsInCirculation  = true
CurrencySubunitsPerUnit        = 100
CurrencyDecimalPlacesDisplay   = 2
CurrencyCountriesInUse         = NZ,CK,NU,PN,TK

The only translated field is Name.
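As an illustration of how this key/value layout might be read, here is a small sketch. It parses lines like those above into a map and splits the comma-separated list fields (`CurrencyUnitSymbols`, `CurrencyCountriesInUse`). The function names are hypothetical. Group headers and translated `Name[xx]` keys in the real files get no special handling.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Parses "Key = Value" lines; whitespace around the key and value is trimmed,
// and empty values (e.g. CurrencyWithdrawnDate) are kept as empty strings.
std::map<std::string, std::string> parseFields(const std::string &text)
{
    auto trim = [](const std::string &s) {
        const auto first = s.find_first_not_of(" \t");
        if (first == std::string::npos)
            return std::string();
        const auto last = s.find_last_not_of(" \t");
        return s.substr(first, last - first + 1);
    };
    std::map<std::string, std::string> fields;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        const auto eq = line.find('=');
        if (eq == std::string::npos)
            continue; // not a key/value line
        fields[trim(line.substr(0, eq))] = trim(line.substr(eq + 1));
    }
    return fields;
}

// Splits a comma-separated field such as CurrencyCountriesInUse.
std::vector<std::string> splitList(const std::string &value)
{
    std::vector<std::string> items;
    std::istringstream in(value);
    std::string item;
    while (std::getline(in, item, ','))
        if (!item.empty())
            items.push_back(item);
    return items;
}
```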

The Unit and Subunit name fields are not currently used by the API as we hit issues around translations and plural forms when mixing units and subunits. I'm exploring an alternative plan.

iso-codes Support

KDE provides far more details and translations than iso-codes, so adopting iso-codes would require considerable changes on the iso-codes side.

The iso-codes xml format provides for the Alpha3 and Numeric Codes and the official ISO Currency Name. Note that this ISO name is inconsistent: it may not include the country name and may not even be in English.

The iso-codes project xml format is as follows:

<!DOCTYPE iso_4217_entries [
        <!ELEMENT iso_4217_entries (iso_4217_entry+, historic_iso_4217_entry*)>
        <!ELEMENT iso_4217_entry EMPTY>
        <!ATTLIST iso_4217_entry
                letter_code             CDATA   #REQUIRED
                numeric_code            CDATA   #IMPLIED
                currency_name           CDATA   #REQUIRED
        >
        <!ELEMENT historic_iso_4217_entry EMPTY>
        <!ATTLIST historic_iso_4217_entry
                letter_code             CDATA   #REQUIRED
                numeric_code            CDATA   #IMPLIED
                currency_name           CDATA   #REQUIRED
                date_withdrawn          CDATA   #REQUIRED
        >
]>

Some example entries are:

        <iso_4217_entry
                letter_code="NZD"
                numeric_code="554"
                currency_name="New Zealand Dollar" />
        <iso_4217_entry
                letter_code="ALL"
                numeric_code="008"
                currency_name="Lek" />
        <iso_4217_entry
                letter_code="UYU"
                numeric_code="858"
                currency_name="Peso Uruguayo" />
        <historic_iso_4217_entry
                letter_code="YUN"
                numeric_code="890"
                currency_name="Yugoslavian Dinar"
                date_withdrawn="1995-11" />

Translation Status

It is hard to compare translation statistics directly, as iso-codes includes the official ISO name. KDE only has our own adjectival-form names, which are mixed into one file with other translations. Instead, a 100% translation rate is assumed for all shipped or recently shipped KDE languages.

The base iso-codes xml file is mostly in standard US English, but some names are in the local language.

http://translationproject.org/domain/iso_4217.html

Version 3.25 of iso-codes ships with 42 translation files for ISO 4217:

  • 33 languages are translated via the Translation Project
  • 7 are externally hosted
  • 2 are apparently unmaintained

At least 52% of the languages are 90% to 100% complete, with a further 2% at least 75% complete.

  • KDE 4.6 shipped 53 languages, 18 of which are not shipped by iso-codes 4217
  • Previous KDE4 versions shipped 17 other languages, 1 of which is not in iso-codes 4217
  • KDE has 19 other languages that have not yet shipped, 16 of which are not in iso-codes 4217
  • In total KDE4 has 89 languages, 35 of which are not in iso-codes 4217
  • iso-codes has 3 languages that KDE does not have

iso-codes Change Required

  • Add all KDE required fields
  • Add all KDE required languages and translations

KDE Changes Required

  • Change KCurrencyCode to load all details from xml and translations from po files
  • Delete .desktop files

ISO 639 Language Codes

ISO 639 defines Language Codes using a number of variants:

KDE Support

KDE has translations for 188 language codes and locale variations (e.g. en_GB, sr@latin). The locale language codes do not exactly match the ISO codes, which may make ISO codes unsuitable for KLocale purposes.
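To illustrate the mismatch, here is a sketch that splits a KDE locale code into its parts. It assumes the form language[_COUNTRY][@modifier] seen in the examples above. Only the language part corresponds to an ISO 639 code. The country part is an ISO 3166 code, and a modifier such as `latin` has no ISO 639 equivalent. `LocaleParts` and `splitLocaleCode` are hypothetical names.

```cpp
#include <string>

struct LocaleParts {
    std::string language; // candidate ISO 639 code, e.g. "sr"
    std::string country;  // ISO 3166 alpha-2 code, e.g. "GB", may be empty
    std::string modifier; // e.g. "latin", may be empty; not an ISO code
};

// Splits a locale code of the assumed form language[_COUNTRY][@modifier].
LocaleParts splitLocaleCode(const std::string &code)
{
    LocaleParts parts;
    std::string rest = code;
    const auto at = rest.find('@');
    if (at != std::string::npos) {
        parts.modifier = rest.substr(at + 1);
        rest.erase(at);
    }
    const auto underscore = rest.find('_');
    if (underscore != std::string::npos) {
        parts.country = rest.substr(underscore + 1);
        rest.erase(underscore);
    }
    parts.language = rest;
    return parts;
}
```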

iso-codes Support

The iso-codes ISO 639 file contains the ISO 639-1 Alpha2 and 639-2 Alpha3 Language Code standards.

The iso-codes xml format provides for the three different code types (alpha2, alpha3-B and alpha3-T) and a list of name variations.

The iso-codes project xml format is as follows:

<!DOCTYPE iso_639_entries [
        <!ELEMENT iso_639_entries (iso_639_entry+)>
        <!ELEMENT iso_639_entry EMPTY>
        <!ATTLIST iso_639_entry
                iso_639_2B_code         CDATA   #REQUIRED
                iso_639_2T_code         CDATA   #REQUIRED
                iso_639_1_code          CDATA   #IMPLIED
                name                    CDATA   #REQUIRED
        >
]>

Some example entries are:

        <iso_639_entry
                iso_639_2B_code="ara"
                iso_639_2T_code="ara"
                iso_639_1_code="ar"
                name="Arabic" />
        <iso_639_entry
                iso_639_2B_code="ger"
                iso_639_2T_code="deu"
                iso_639_1_code="de"
                name="German" />
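The same language can be identified by its 639-1 code or by either 639-2 variant (German is "de", "deu" and "ger"), so a lookup should probably accept all three. Below is a minimal sketch using the two sample entries above. The table and `findLanguage` are illustrative, and a real table would be loaded from the iso-codes file.

```cpp
#include <optional>
#include <string>
#include <vector>

struct Iso639Entry {
    std::string code2B; // iso_639_2B_code (bibliographic)
    std::string code2T; // iso_639_2T_code (terminology)
    std::string code1;  // iso_639_1_code, optional in the DTD so may be empty
    std::string name;
};

// Sample rows taken from the examples above.
const std::vector<Iso639Entry> &iso639Table()
{
    static const std::vector<Iso639Entry> table = {
        {"ara", "ara", "ar", "Arabic"},
        {"ger", "deu", "de", "German"},
    };
    return table;
}

// Finds an entry by any of its codes; an empty 639-1 code never matches.
std::optional<Iso639Entry> findLanguage(const std::string &code)
{
    for (const auto &entry : iso639Table()) {
        if ((!entry.code1.empty() && entry.code1 == code)
            || entry.code2T == code || entry.code2B == code)
            return entry;
    }
    return std::nullopt;
}
```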

Translation Status

The base iso-codes xml file is in standard US English.

http://translationproject.org/domain/iso_639.html

Version 3.25 of iso-codes ships with xx translation files for ISO 639:

  • xx languages are translated via the Translation Project
  • xx are externally hosted
  • xx are apparently unmaintained

At least xx% of the languages are 90% to 100% complete, with a further xx% at least 75% complete.

  • KDE 4.6 shipped 53 languages, xx of which are not shipped by iso-codes 639
  • Previous KDE4 versions shipped 17 other languages, xx of which are not in iso-codes 639
  • KDE has 19 other languages that have not yet shipped, xx of which are not in iso-codes 639
  • In total KDE4 has 89 languages, xx of which are not in iso-codes 639
  • iso-codes has xx languages that KDE does not have

iso-codes Change Required

KDE Changes Required

ISO 3166-2 Country Subdivision Code

The iso-codes file for ISO 3166-2 contains one section of the ISO Country Code standard:

KDE Support

KDE does not currently support the Level 2 codes, but I want to use these in KLocale and KHolidays, with Plasma, Marble and KGeography also potential users.

It may be possible to use iso-codes for this without requiring translations, as the names are primarily in the latinised native language. For example, regions in Spain have names in their local language (Catalunya, not Catalonia). The vast majority of use cases for the code involve a native choosing their home region, so this may be acceptable for most languages, although it will be a political issue in some regions. However, if we start requiring iso-codes in kdelibs for this, we may as well switch for all the others.

An alternative, documented below, is to use the KGeography translations, which are fairly complete.

iso-codes Support

<!DOCTYPE iso_3166_2_entries [
        <!ELEMENT iso_3166_2_entries (iso_3166_country+)>
        <!ELEMENT iso_3166_country (iso_3166_subset*)>
        <!ATTLIST iso_3166_country
                code                    CDATA   #REQUIRED
        >
        <!ELEMENT iso_3166_subset (iso_3166_2_entry+)>
        <!ATTLIST iso_3166_subset
                type                    CDATA   #REQUIRED
        >
        <!ELEMENT iso_3166_2_entry EMPTY>
        <!ATTLIST iso_3166_2_entry
                code                    CDATA   #REQUIRED
                name                    CDATA   #REQUIRED
                parent                  CDATA   #IMPLIED
        >
]>


    <!-- Nepal -->
<iso_3166_country code="NP">
<iso_3166_subset type="Development region">
        <iso_3166_2_entry
                code="NP-1"     name="Madhyamanchal" />
        <iso_3166_2_entry
                code="NP-2"     name="Madhya Pashchimanchal" />
        <iso_3166_2_entry
                code="NP-3"     name="Pashchimanchal" />
        <iso_3166_2_entry
                code="NP-4"     name="Purwanchal" />
        <iso_3166_2_entry
                code="NP-5"     name="Sudur Pashchimanchal" />
</iso_3166_subset>
<iso_3166_subset type="zone">
        <iso_3166_2_entry
                code="NP-BA"    name="Bagmati"  parent="1" />
        <iso_3166_2_entry
                code="NP-BH"    name="Bheri"    parent="2" />
        <iso_3166_2_entry
                code="NP-DH"    name="Dhawalagiri"      parent="3" />
        <iso_3166_2_entry
                code="NP-GA"    name="Gandaki"  parent="3" />
        <iso_3166_2_entry
                code="NP-JA"    name="Janakpur" parent="1" />
        <iso_3166_2_entry
                code="NP-KA"    name="Karnali"  parent="2" />
        <iso_3166_2_entry
                code="NP-KO"    name="Kosi"     parent="4" />
        <iso_3166_2_entry
                code="NP-LU"    name="Lumbini"  parent="3" />
        <iso_3166_2_entry
                code="NP-MA"    name="Mahakali" parent="5" />
        <iso_3166_2_entry
                code="NP-ME"    name="Mechi"    parent="4" />
        <iso_3166_2_entry
                code="NP-NA"    name="Narayani" parent="1" />
        <iso_3166_2_entry
                code="NP-RA"    name="Rapti"    parent="2" />
        <iso_3166_2_entry
                code="NP-SA"    name="Sagarmatha"       parent="4" />
        <iso_3166_2_entry
                code="NP-SE"    name="Seti"     parent="5" />
</iso_3166_subset>
</iso_3166_country>
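In this example the `parent` attribute appears to omit the country prefix: `parent="1"` under country `NP` refers to the entry `NP-1`. Assuming that holds throughout the file, the sketch below rebuilds the hierarchy. `Subdivision`, `parentCode` and `childrenByParent` are hypothetical names.

```cpp
#include <map>
#include <string>
#include <vector>

struct Subdivision {
    std::string code;   // e.g. "NP-BA"
    std::string name;   // e.g. "Bagmati"
    std::string parent; // e.g. "1", empty for top-level entries
};

// Assumption based on the Nepal example: parent="1" under "NP" means "NP-1".
std::string parentCode(const std::string &country, const Subdivision &s)
{
    return s.parent.empty() ? std::string() : country + "-" + s.parent;
}

// Groups the names of child subdivisions under their parent's full code.
std::map<std::string, std::vector<std::string>> childrenByParent(
    const std::string &country, const std::vector<Subdivision> &entries)
{
    std::map<std::string, std::vector<std::string>> children;
    for (const auto &s : entries) {
        if (!s.parent.empty())
            children[parentCode(country, s)].push_back(s.name);
    }
    return children;
}
```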

See below for the KDE comparison.

Translation Status

iso-codes ships only 40 languages for ISO 3166-2:

  • 17% are 90%-100% complete
  • 25% are 10%-89% complete
  • 57% are 0%-9% complete

KGeography

KGeography has 6825 strings, mostly Subdivision names but also Countries, Capitals, Oceans and other geographical entities. These are currently translated into 64 languages, 56% of which are 80-100% translated. All shipped KDE languages are at or near 100%. Each complete .po file is between 0.75 MB and 1.25 MB.

There is probably a very large overlap with the 4551 strings in ISO 3166-2, so these translations could be used to boost either the iso-codes project or our own.

Example for Nepal; compare with the iso-codes version above:

i18nc("nepal_zones.kgm", "Nepal (Zones)");
i18nc("nepal_zones.kgm", "Zones");
i18nc("nepal_zones.kgm", "Frontier");
i18nc("nepal_zones.kgm", "Water");
i18nc("nepal_zones.kgm", "Not Nepal (Zones)");
i18nc("nepal_zones.kgm", "Bagmati");
i18nc("nepal_zones.kgm", "Bheri");
i18nc("nepal_zones.kgm", "Dhawalagiri");
i18nc("nepal_zones.kgm", "Gandaki");
i18nc("nepal_zones.kgm", "Janakpur");
i18nc("nepal_zones.kgm", "Karnali");
i18nc("nepal_zones.kgm", "Koshi");
i18nc("nepal_zones.kgm", "Lumbini");
i18nc("nepal_zones.kgm", "Mahakali");
i18nc("nepal_zones.kgm", "Mechi");
i18nc("nepal_zones.kgm", "Narayani");
i18nc("nepal_zones.kgm", "Rapti");
i18nc("nepal_zones.kgm", "Sagarmatha");
i18nc("nepal_zones.kgm", "Seti");

An interesting future prospect for KGeography is the dynamic generation of maps using the ISO Codes and region types combined with a new Natural Earth vector layer in Marble providing vector borders.

Proposed Plan

My conclusion is that iso-codes is nowhere near the level of translations required for KDE to adopt it in the near future, and getting it ready would take a lot of work. A more sensible approach seems to be to migrate internally, within kde-runtime, to an xml format with po files for translations, replacing our current .desktop-file-based implementation. This should actually reduce the installed footprint, as the .desktop files have all the name translations embedded in them. Once we have a clean set of code files and translations, we can decide whether to stay as is, propose a merge with iso-codes or launch our own project.

Big question: is there an implicit API guarantee for the installed location and format of the locale and flag files?

  • Convert kde-runtime/localization/currency to an xml file format with .po files. This can be scripted from the existing .desktop files.
  • Create kde-runtime/localization/country to support ISO 3166 with an xml format and po files derived from the existing kde-runtime/l10n desktop files.
  • Create kde-runtime/localization/flags/icon using the flags from kde-runtime/l10n
  • Create kde-runtime/localization/flags/svg for base SVG-format flags used to generate other flag sets; do not install these.
  • Modify kde-runtime/l10n to remove names and translations, possibly rename to .locale files and move to kde-runtime/localization/locale
  • Create kde-runtime/localization/languages to support ISO 639, merge iso-codes with all-languages.desktop. This can be scripted.
  • Create kde-runtime/localization/subdivision (or add in ./country) to support ISO 3166-2. Merge the iso-codes xml and po files with the KGeography translations to create a new xml file and po files. Can be largely scripted but will need manually checking.
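As a sketch of the scripted conversion step, the following extracts values from one .desktop file: the untranslated `Name=` value and the `Name[lang]=` translations. It then emits one .po entry per language. It assumes the usual unspaced `Key=Value` .desktop syntax and ignores msgctxt, comments and plural forms. `poEscape` and `nameTranslationsToPo` are hypothetical names.

```cpp
#include <map>
#include <sstream>
#include <string>

// Escapes backslashes and double quotes for a quoted .po msgid/msgstr.
std::string poEscape(const std::string &s)
{
    std::string out;
    for (char c : s) {
        if (c == '\\' || c == '"')
            out += '\\';
        out += c;
    }
    return out;
}

// Returns language code -> .po entry pairing the English Name with its translation.
std::map<std::string, std::string> nameTranslationsToPo(const std::string &desktop)
{
    std::istringstream in(desktop);
    std::string line, msgid;
    std::map<std::string, std::string> translations; // lang -> translated name
    while (std::getline(in, line)) {
        if (line.rfind("Name=", 0) == 0) {
            msgid = line.substr(5);
        } else if (line.rfind("Name[", 0) == 0) {
            const auto close = line.find("]=");
            if (close != std::string::npos)
                translations[line.substr(5, close - 5)] = line.substr(close + 2);
        }
    }
    std::map<std::string, std::string> po;
    for (const auto &[lang, msgstr] : translations)
        po[lang] = "msgid \"" + poEscape(msgid) + "\"\nmsgstr \"" + poEscape(msgstr) + "\"\n";
    return po;
}
```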

Country Codes

See Country Code.