KDE Core/KDE Open Data: Difference between revisions

From KDE Community Wiki
 
Latest revision as of 13:20, 6 July 2014

Introduction

KDE, GNOME, and other Linux desktop environments, as well as applications in general, require some fundamental open data to operate. The primary areas are:

  • ISO Codes - Codes representing Language, Country, Script, Currency and other entities, and their names and translations, for example for use in locales, geolocation, file metadata, pick-lists, web APIs etc.
  • Flags - Icons, images or vector files for the national and regional flags, for example for use in pick-lists, educational apps, etc.
  • Holidays - Observance calculation rules and translations for holidays.


There are existing projects internal and external to KDE that try to address these requirements and often duplicate each other in doing so, but they either have known issues or are deprecated in KDE Frameworks 5 (see individual areas detailed below).

It is proposed that KDE develop a set of open file formats, open data repositories, and open source Qt libraries to support these requirements. Rather than collating and curating our own data sources, we will use Wikidata as the primary source for all the required data and repackage it into a format suitable for offline use in a Linux distribution or standalone application installation. Where Wikidata lacks data fields we require, we will add these ourselves and work with Wikidata to have them incorporated upstream.

There will be two separate parts to this project:

  • The data files in json format and translation files available in multiple formats.
  • The Qt/KDE C++ libraries that wrap the OpenData files in a convenient API.


This separation will allow distros to package the data separately, and will allow other desktop projects or applications not written using Qt/C++ to use the data without having to install the Qt libraries. It is hoped that these data files will become a shared common resource, reducing duplication of effort and disk space.

The json file format will be defined using JSON Schema (http://json-schema.org/).
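As a rough illustration of what such a definition could look like, the sketch below writes a schema for a single country record and checks records against it. The field names (alpha2, alpha3, name) are assumptions made for this example, not a published OpenCodes format, and the check_record helper is hypothetical: it understands only the "required", string "type", and "pattern" keywords, so a real project would use a complete JSON Schema validator instead.

```python
import re

# Hypothetical schema for one country record. The field names are
# illustrative assumptions, not the actual OpenCodes file format.
COUNTRY_SCHEMA = {
    "$schema": "http://json-schema.org/draft-04/schema#",
    "type": "object",
    "required": ["alpha2", "alpha3", "name"],
    "properties": {
        "alpha2": {"type": "string", "pattern": "^[A-Z]{2}$"},
        "alpha3": {"type": "string", "pattern": "^[A-Z]{3}$"},
        "name": {"type": "string"},
    },
}


def check_record(record, schema):
    """Return a list of problems found in record.

    Deliberately partial: only 'required', string 'type' and 'pattern'
    are checked. This is not a full JSON Schema implementation.
    """
    if not isinstance(record, dict):
        return ["record is not an object"]
    problems = []
    for key in schema.get("required", []):
        if key not in record:
            problems.append(f"missing required field: {key}")
    for key, rule in schema.get("properties", {}).items():
        if key not in record:
            continue
        value = record[key]
        if rule.get("type") == "string" and not isinstance(value, str):
            problems.append(f"{key} is not a string")
            continue
        pattern = rule.get("pattern")
        # JSON Schema patterns are unanchored, hence re.search.
        if pattern and not re.search(pattern, value):
            problems.append(f"{key} does not match {pattern}")
    return problems
```

A valid record yields an empty list, while a record with a lower-case code and no name reports both problems.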

The initial products are proposed as:

  • OpenCodes - A repository of json format files for each ISO Country, Language, and Currency code entity with important data fields extracted from Wikidata and their translations.
  • OpenFlags - A repository of national and regional flags extracted from Wikidata / Wikimedia Commons that are installed as a freedesktop.org standard icon set and a collection of svg files.
  • OpenHolidays - A repository of holiday data files, initially generated from the existing KHolidays files.
  • QCodes - A Qt/C++ library for accessing the OpenCodes data.
  • QHolidays - A Qt/C++ library for accessing the OpenHolidays data.


OpenFlags is kept separate from OpenCodes to reduce the size of dependencies for most use cases. The OpenFlags data will mostly be accessed directly by apps using the standard icon theme facilities or by loading an SVG directly from disk, but QCodes may provide an API to return a QImage or a QPainterPath.

In the future KDE may wish to offer a generic KWikiData library to allow applications such as Marble or KGeography to dynamically access a wider set of data, but this is outside the scope of this project.

ISO Codes

ISO codes for Countries, Languages, Scripts and Currencies are extensively used in software, and any application framework needs to support them. Some use cases are fundamental, such as locale codes and providing name translations for UI lists. Other use cases involve converting between different code formats. For example, Exiv2 stores the country code using the Alpha3 code, but geolocation tools usually use the Alpha2 code, so photo apps like digiKam need to know how to convert between the two. Other useful fields include TLD codes, phone prefixes and sort names.
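The Alpha3 to Alpha2 conversion mentioned above reduces to an index lookup once the code data is available as records. The sketch below assumes a hypothetical json layout with alpha2 and alpha3 fields and uses a small inline sample; the real data files may be organised differently.

```python
import json

# Hypothetical sample data; the field names and layout are assumptions.
SAMPLE_JSON = """
[
  {"alpha2": "DE", "alpha3": "DEU", "name": "Germany"},
  {"alpha2": "FR", "alpha3": "FRA", "name": "France"},
  {"alpha2": "JP", "alpha3": "JPN", "name": "Japan"}
]
"""


def build_alpha3_index(records):
    """Map each Alpha3 code to its Alpha2 code."""
    return {record["alpha3"]: record["alpha2"] for record in records}


def alpha3_to_alpha2(code, index):
    """Return the Alpha2 code for an Alpha3 code, or None if unknown."""
    return index.get(code.strip().upper())


INDEX = build_alpha3_index(json.loads(SAMPLE_JSON))
```

With this index, alpha3_to_alpha2("deu", INDEX) returns "DE", and an unknown code returns None rather than raising, so a caller can fall back to leaving the metadata untouched.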

KDE only supports those ISO Country and Language codes used within its locale data files. KDE has advanced currency code support for the complete set of codes in the KCurrencyCode class, which was designed to support the needs of personal finance programs (and was largely derived from the Wikipedia data boxes). KLocale and KCurrencyCode are dropped in KF5, so KDE requires a replacement for existing use cases.

Qt5 uses CLDR as the source for its ISO code support. Qt has support for the ISO codes used within its locale data files, and provides native language names, but not for the full set of codes or name translations or the extra data fields KDE requires. While the full set of codes may be supported, other translations or data fields will not be, so KDE will need to supplement Qt5's support.

The isocodes project was one suggested replacement. An in-depth study was undertaken and it was concluded there was insufficient overlap between supported fields and translated languages for this to be practical.

The CLDR data files as used in ICU were another suggested replacement, but again there was insufficient overlap between supported fields and translated languages for this to be practical. CLDR has the additional disadvantage of being harder to contribute to, and depending on the context can incur a heavy dependency cost.

Wikidata holds most, if not all, of the data KDE requires, as well as translations into many languages. The few fields Wikidata may lack can be easily added by KDE and perhaps upstreamed later, more easily than with the alternatives. KDE can also easily augment the translations using its own systems.

The proposed OpenCodes workflow is:

  • Script to query Wikidata for codes, data fields and translations and to save them as json files in the repo.
  • Ability to generate installation data files and translation files in multiple formats, e.g. json data file with po translation files, json data and json translation files, combined json data and translation files, etc.
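The first step of this workflow could be sketched as follows. The example assumes the data is fetched as a SPARQL query result in the standard SPARQL JSON results layout, and uses the Wikidata properties P297 (ISO 3166-1 alpha-2 code) and P298 (ISO 3166-1 alpha-3 code). The function names, the output record layout, and the inline sample result are hypothetical, and the network request itself is left out so the sketch stays self-contained.

```python
import json
import re


def build_country_query(lang):
    """Build a SPARQL query for country codes and labels in one language."""
    if not re.fullmatch(r"[a-z]{2,3}(-[a-z0-9]+)*", lang):
        raise ValueError(f"unexpected language code: {lang!r}")
    return (
        "SELECT ?alpha2 ?alpha3 ?label WHERE {\n"
        "  ?country wdt:P297 ?alpha2 ;\n"
        "           wdt:P298 ?alpha3 ;\n"
        "           rdfs:label ?label .\n"
        f'  FILTER(LANG(?label) = "{lang}")\n'
        "}"
    )


def bindings_to_records(result):
    """Turn a SPARQL JSON result into records keyed by Alpha2 code."""
    records = {}
    for row in result["results"]["bindings"]:
        alpha2 = row["alpha2"]["value"]
        record = records.setdefault(
            alpha2,
            {"alpha2": alpha2, "alpha3": row["alpha3"]["value"], "names": {}},
        )
        record["names"][row["label"]["xml:lang"]] = row["label"]["value"]
    return records


def to_json(records):
    """Serialise records deterministically so repo diffs stay small."""
    return json.dumps(records, ensure_ascii=False, indent=2, sort_keys=True)


# Hand-written sample in the SPARQL JSON results layout, not real output.
SAMPLE_RESULT = {
    "results": {
        "bindings": [
            {
                "alpha2": {"type": "literal", "value": "DE"},
                "alpha3": {"type": "literal", "value": "DEU"},
                "label": {"type": "literal", "xml:lang": "de", "value": "Deutschland"},
            }
        ]
    }
}
```

Sorting keys on output is a design choice for this sketch: regenerating the files from Wikidata then produces diffs only where the data actually changed.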

Flags

The KDE libraries currently install a set of flag icons, about 21x14 pixels in size, as part of their locale format files. These flags are often used in country pick-lists, or sometimes inappropriately in language pick-lists. With KF5 these flags are deprecated and will no longer be installed.

Some applications like KGeography and Marble need larger, higher-quality flags, so they ship their own separate sets. This duplication of space and effort is wasteful, and the flags should be shared where possible.

The freedesktop.org Icon Naming standard defines the "International" group for "Icons for international denominations such as flags", which are named "flag-xx" where "xx" is the ISO 3166 country code. Unfortunately very few (if any) icon sets actually implement this, as it requires a lot of effort.
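Deriving the icon name from a country code is mechanical, as the minimal sketch below shows. It assumes the two-letter (Alpha2) form of the ISO 3166 code is the one intended by "xx"; the function name is hypothetical.

```python
import re


def flag_icon_name(country_code):
    """Return the 'flag-xx' icon name for a two-letter country code."""
    code = country_code.strip().lower()
    # Assumption: "xx" means the two-letter ISO 3166 code.
    if not re.fullmatch(r"[a-z]{2}", code):
        raise ValueError(f"not a two-letter country code: {country_code!r}")
    return f"flag-{code}"
```

An application would pass the resulting name to its usual icon theme lookup, so that a themed flag is used when the active theme provides one.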

Wikimedia Commons provides a fairly comprehensive set of flags in SVG format, and Wikidata provides links to these as data fields for country entities. It is proposed to make use of this in the OpenFlags project:

  • Script to query Wikidata for flag links and download SVG files to repo
  • Script to generate standard icon theme in different sizes from SVG files
  • Distribute as three separate packages for icon theme, SVG base files, and PNG base files


The OpenFlags icon theme will be unthemed, i.e. plain reproductions of the base SVG files, and will serve as a fall-back where the default icon theme does not implement flags. Other icon themes will be able to use the SVG base files as templates to generate their own themed icons. Applications that need larger flags can choose to use the SVG or PNG base file packages.

Holidays

The iCalendar standard for storing and transmitting calendar events cannot express the rules for calculating occurrences of many public or religious holidays, for example complex calculations like Easter, holidays using alternative calendar systems, or holidays that move if they fall on a weekend. This means projects like Mozilla Sunbird have to update their holiday files every year with manually calculated dates, which is a heavy maintenance burden. To address this, KDE adapted the PLAN file format to create the KHolidays library, which supports these advanced calculation rules. This file format is, however, complex and fragile, and is implemented using old, non-portable technologies. It is proposed to replace KHolidays with a new OpenHolidays file format using json that can be easily used by many different projects such as KDE Kontact, Mozilla Sunbird, GNOME Evolution, websites, etc.
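To make the kind of rule concrete, the sketch below evaluates two of the cases named above: an Easter-relative holiday and a fixed-date holiday that moves off a weekend. The Easter date uses the well-known anonymous Gregorian (Meeus/Jones/Butcher) algorithm. The rule records and their field names are hypothetical and are not the KHolidays or OpenHolidays format, and the move-to-Monday policy is just one possible weekend rule.

```python
from datetime import date, timedelta


def easter_sunday(year):
    """Gregorian Easter Sunday (anonymous Meeus/Jones/Butcher algorithm)."""
    a = year % 19
    b, c = divmod(year, 100)
    d, e = divmod(b, 4)
    f = (b + 8) // 25
    g = (b - f + 1) // 3
    h = (19 * a + b - d - g + 15) % 30
    i, k = divmod(c, 4)
    l = (32 + 2 * e + 2 * i - h - k) % 7
    m = (a + 11 * h + 22 * l) // 451
    month, day = divmod(h + l - 7 * m + 114, 31)
    return date(year, month, day + 1)


def shift_off_weekend(day):
    """Move a Saturday or Sunday date to the following Monday."""
    weekday = day.weekday()  # Monday is 0, Sunday is 6
    if weekday == 5:
        return day + timedelta(days=2)
    if weekday == 6:
        return day + timedelta(days=1)
    return day


# Hypothetical rule records; the field names are assumptions.
RULES = [
    {"name": "Easter Monday", "rule": "easter", "offset_days": 1},
    {"name": "Christmas Day", "rule": "fixed", "month": 12, "day": 25,
     "shift_off_weekend": True},
]


def occurrence(rule, year):
    """Compute the date on which a rule falls in the given year."""
    if rule["rule"] == "easter":
        return easter_sunday(year) + timedelta(days=rule.get("offset_days", 0))
    if rule["rule"] == "fixed":
        day = date(year, rule["month"], rule["day"])
        return shift_off_weekend(day) if rule.get("shift_off_weekend") else day
    raise ValueError(f"unknown rule type: {rule['rule']!r}")
```

Because the rules are data and the dates are computed, a file written once stays valid for every year, which is exactly what a list of pre-calculated dates cannot offer.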

A presentation given at the Desktop Summit 2011 in Berlin gives some more background. Also see the KHolidays wiki page for more details.