Jump to content

Kexi/Design/Discussion About OpenDocument Format Specification for Desktop Databases

From KDE Community Wiki

J. Staniek, June 2005

Further discussions are moved to this wiki page

This page contains discussion about extension the OpenDocument specification for database document type, proposed by OASIS Technical Commitee:

Terms:

  • ODB = OpenOffice.org format for Databases (currently not a part of OpenDocument)
  • MSA = MS Access
  • oo.org = OpenOffice.org

General analysis

  • Is it better for apps like Kexi to have ODB as native format and have simpler features or to have a possibility for importing/exporting ODB and have a way for additional features?
    • OpenDocument specs come from a world where interoperability with MS Office apps is one of the highest requirements, and thus the format become mostly a cast to open space. With ODB such requirement is not so obvious: we've not yet reverse engineered most of MSA's format.
    • moreover, to successfully replace MSA, 1-1, we also need to behave as MSA. That's why importing MSA databases is much easier than dealing with them as our native format.
    • In a real use case, importing-editing-exporting path used by oo.org is not reasonable: we're working with large data sets. See the conclusion below.
    • If we're using external database engines, most notable servers, there's no one-file which format could be fully specified. ODB could be better specified as a set of specifications. Most obvious (orthogonal) specs are for:
      • specification for database schema notation and extended entities like forms, reports and queries
      • connection data: information about external database connection
      • engine description: detailed information about database engine parameters; can be then used to define a connection data, especially to get default values for engine-specific parameters
      • application settings, containing reasonable defaults, specifies behaviour during entire session; this is not tied to a particular connection data; Question: should application settings be a part of OASIS specs? If not, attributes mentioned below, like Suppress Version Columns can be tied to implementation. Certain application settings can be incompatible with implementations, because are coming from APIs like com.sun.star.sdb. If this is the case, could be better not to mention them in the specs?
Current ODB specs contain db-data-source in:
 <define name="office-database">
    <element name="office:database">
      <ref name="office-database-attlist"/>
      [1]
 [...]
what means the specs contain a suggestion about storage method: database metadata is kept together with connection info. Could this be splitted to separate files? They can be always bundled together as an optional "extended" format, for user's convenience.
In the current ODB specs there's a mention about database package file. This silently assumes that description of any custom (from db engine perspective), higher-level objects, like forms or reports is stored within the database package file. We should avoid such assumptions. See note in 4.
  • File-based "databases". Database creation applications are not document driven. We have different storage here: unless we're assuming only small, academic data sets will be only generated, we can't assume database can be put into an envelope built using XML container.
Let's look at Kexi as example implementation: any object, is serialized to a string and stored in a relational database. That's nothing new, as MSA have developed the same rule. Efficiency is one most obvious reason for this.
Main problem is that we're not going to have the same storage format (read: database engine) within every application accepting ODB, hence import/export tools reresent an overhead here.
Conclusion: specs for ODB need to be independent of storage method

Data Source

Connection resource

Current specs say:

 The xlink:href attribute indicates a database URL,
 which locates a database driver.

Current OpennOffice.org database layer can use .jar archives downloaded by pointing an URL, however this is pretty orthogonal to what the specs are developed for. I found Connection resource section as implementation-dependent and problematic. For example:

  • certain database applications are more monolitic, with all drivers delivered within a package, and there's no possibility for downloading drivers
  • other database applications rely on operating system packaging information, i.e. DEB of RPM
  • database applications can rely on higher level library frameworks like KDE's Library Loader, with it's own introspection and and driver modules' lookup.

Summing up:

  • there's a danger that this part of the spec is derived from Java world (e.g. db:java-driver-class attribute).
  • it also looks like MS Access' hardcoded library references, very dangerous and not portable setting, which should be stored (if as all) in a separate file/place. MS Access users have had many troubles because of a need for tweaking such settings when a database file have to be used on another computer (even within the same version of the software).

Suppress Version Columns

Could this setting be moved to application settings? It looks too much find-grained, in most cases user may want to define this flag at global level or for particular database drivers.

Extension

Could this setting be moved to application settings? Moreover, for now it overrides desktop environment's settings regarding extensions. Is it ok that there's no mention about mime type system.

Is First Row Header Line

Tied to storage method (more specifically, defines how table schema is returned within a connection), hence makes little sense within db-data-source-attlist. Can make sense if this is used with CSV format, otherwise we cannot even say whether a row is first or not...

Show Deleted

We're assuming that 'delete row' action is implemented as marking records as deleted (probably by marking a boolean bit of a record, at high-level). This can be storage-independent, but introduces problems with support for queries with JOIN or GROUP BY clause: we won't be able to rely on a database engine's query exectutor without regenerating SQL statements. Only some engines, like DBF, can provide API function for showing deleted records.

This flag is reasonable if it means for showing deleted records at GUI level, i.e. marging them, say, crossed out with red line or so. In this case the flag could go to application settings' specs.

Is Table Name Length Limited

Note for this and similar attributes: what if we're moving connection data to different place and use them with a db server having slightly different capatibilities? Suggestion is that we may want to have such attributes defined at application (or even desktop) level, and that can provide defaults for every connection data we're using. Thus, "Is Table Name Length Limited" can be left empty and default can be retrieved from global settings.

System Driver Settings, and other

Also: Enable Sql92 Check, Append Table Alias Name, Parameter Name Substitution, Ignore Driver Privileges, Boolean Comparison Mode, Use Catalog, .... Like "Is Table Name Length Limited": driver-specific and engine version-specific.

Login

OK

Is Password Required

Like 2.6: driver-specific and engine version-specific.

Delimiter

Note: This setting is more for CSV-like data sources than file-based databases in general. For example, communication with file-based engines like SQLite is similar to communication with server engines, so delimiter element could not be needed there.

Defaults for Field, String, Decimal, Thousand is highly engine-dependent, so could be copied from engine description.

Font Charset (Encoding)

Like "Is Table Name Length Limited": driver-specific and engine version-specific. Note: some engines can have compile-time, hardcoded deault encodings.

Table Filter, Table Filter Pattern

Like "Is Table Name Length Limited": driver-specific and engine version-specific. It's db engine specific, how we can retrieve database schema. Table patterns can be used, but it's not always possible to use just SQL to retrieve the schema -- specific db access API can provide it's own function to do so.

Table Type

All three TABLE, VIEW, SYSTEM_TABLE types are usable. We can also add a type for high-level system table type -- in case of this is a group of kexi_ _* tables used for metadata storage. For forward compatibility we could allow to define a type using patterns, like "kexi_ _*".


Auto Increment

In Kexi, hadling auto increment values is hardcoded in Kexi-level db driver itself. Low-level database access APIs are not changing frequently so don't need a possibility for configuring auto increment settings in engine descriptions.

Additional Column Statement, Row Retrieving Statement

See 3.5. it's hard to imagine such setting can be reused after a driver speci These settings look specific for OO.org's database layer. Other db layers, like KexiDB can provide it's own settings, no compatible with, say, OO.org.


Data Source Settings

Useful as an extension for common connection settings. In fact, many of above settings like Table Filter can be moved here.

DB objects in general

(table schemas, query schemas, forms, reports)

This is a part of specification for database schema format. This XML format is not necessary stored as-is: it is only specified for schema data exchange.

Note: We are not assumming table schemas are available from engine nor forms definitions are stored in the same place as table data. Thus, the XML format can be considered as intermediate one for database project's schema exchange.

Titles

Proposed extension: title (also suitable for any other object types like reports, queries and forms). Unlike name, which is ofter latin1 string, title is a user-visible string and can contain any characters (using \n is not reasonable though). Can be used with report generation or autoforms, so user is not forced to enter a table title again and again.

 <define name="common-db-table-name-attlist" combine="interleave">
  <optional>
    <attribute name="db:title">
      <ref name="string"/>
    </attribute>
  </optional>
 </define>

Proposed extension: description, see notes for title. Can be used to fully describe an object with information for developers and users. It is similar to db:description within db-table-attlist element.

 <define name="common-db-table-name-attlist" combine="interleave">
  <optional>
    <attribute name="db:description">
      <ref name="string"/>
    </attribute>
  </optional>
 </define>

Note: There's already db:name attribute used within queries, forms and reports but we also need to have identifier (latin1 name) available for such types of objects, so they can be easily and clearly referenced in by a scripts. So the proposal is to have db::name for real latin1 name (identifier) and db:title for title.

Tables

Type Name

Can db:type-name be defined at engine description level, not at table schema level?

Is Empty Allowed

Proposal: Addional boolean flag -- only meaningful for data dypes where length property exists, i.e. various strings and CLOBs. If true, empty (not necessary null!) values are allowed. Is Nullable == true implies Is Empty Allowed == true. Implementations like Kexi and MSA provide this flag. Note that Is Empty Allowed contratint is usually enforced at client application level, not at db engine level.

Visibility

What are exactly collapse and filter values in db:visibility?

Queries

Master Table within a Query

Update Table is a table which will be updated when the data of the query is modified; within Kexi it's called Master Table, as only master side of a SQL join ca nbe editable. (if a query only references only one table, it's master table). The proposal is to rename Update Table term to Master Table.

Table and Query (View) Formatting Properties

Proposed aditional properties. These will be most probably defined within styles.

  • line color (color)
  • background color (color)
  • text color (color)
  • horizontal line (boolean)
  • vertical line (boolean)
  • font (font info)
  1. Cite error: Invalid <ref> tag; no text was provided for refs named db-data-source