Jump to content

Kexi/Migration: Difference between revisions

From KDE Community Wiki
Jstaniek (talk | contribs)
Jstaniek (talk | contribs)
No edit summary
 
(12 intermediate revisions by 2 users not shown)
Line 1: Line 1:
__TOC__
Data Migration And Sharing.
==Add support for inserting CSV data into an existing table==
*Status: TODO
*{{wish|305505}}


===Rationale===
{{Note|TODO: Move content from [http://kexi-project.org/wiki/wikiview/[email protected]]}}
Inserting (or rather appending) CSV data into an existing table gives a simple means for merging CSV data chunks in one step.


===Assumptions & Requirements===
*[[/General/]]
*The data should be appended to the existing table, without changes made to the
*[[Kexi/Junior_Jobs#Migration|Junior Jobs]]
*Data is in largely the same format (regarding column or data types) as the data used to create the original table, to achieve this:
**Data types for columns should be taken from the schema of the destination table
**Column names stay untouched in the destination table, however autodetection should be still used to figure out value of the "First row contains column names" option; the option can be changed in the


*Conflicts resolutions:
==File formats==
**Data type violations (for example when importing string when numeric value is expected): TODO - for 2.6 we need to have friendly error message at least
*[[/CSV/]]
***Future: propose skipping conflicting values
*[[/MDB/]]
**Primary key violations: TODO - for 2.6 we need to have friendly error message at least
*[[/ODB/]]
***Future: propose skipping records with conflicting values or replacing the old records with new ones
*[[/Spreadsheet/]]
**Future: unique key violations: TODO
*[[/SQLite/]]
*[[/Fixed Width Text/]]


===GUI Design===
==Servers==
Existing GUI is based on single dialog. Switching to KAssistantDialog is needed.
*[[/MySQL/]]
*[[/PostgreSQL/]]


==Porting to non-modal assistants==
=Details=
*Status: TODO
There are two kinds of data migration:


'''What is involved in porting to the new gui?''' <jstaniek>: We sit and design the given assistant and I am helping with using the API and possibly extending/fixing it. The difference is that many imports can be started concurrently so some extra checking may be needed but this is a side note. Also e.g. user could try to edit/remove a source table during export. So this is fun. But after the change the GUI would be a lot more natural I think.
* Migration for entire projects.
* Data migration related to a single data table (importing, exporting).
 
== Project Migration ==
Project migration allows to migrate the database schema and table data in one go. It's currently possible to migrate a MySQL, PostgreSQL or MS Access database to a SQLite database, using either the keximigratetest program or using [https://userbase.kde.org/Kexi/Handbook/References/Menu_Commands/The_File_Menu File->Import Database] menu command.
 
See also: Migration API
 
== Data Migration ==
Text tabular data can be imported from file (Data->Import Data From File), exported to file (Data->Export Data to File) or pasted from the clipboard (Edit->Paste Special) or copied to the clipboard (Edit->Copy Special).
 
=== Text formats for tabular data ===
There are a number of text formats for tabular data. See [https://web.archive.org/web/20081222042506/http://www.icehouse.net/jim_d/excel3.html#Formats MS Excel Import formats for naming scheme].
 
* '''Comma Separated Values (CSV) Data'''
* '''Fixed Width Text'''
 
See also http://en.wikipedia.org/wiki/Flat_file_database
 
=== Text formats for tabular data ===
 
* '''Comma Separated Values (CSV) Data'''
* '''Fixed Width Text'''
 
== Data Sharing ==
By '''Data Sharing''' we usually mean dynamic data exchanging features that do not require to importing and exporting. For example: connecting with external (not KDb-derived) database to share some data between KEXI project and that external data source.
 
== Cross-DB-Engine Queries ==
  jstaniek> mart: btw, another hard way is to add simple querying
  engine on top of KDb (not so hard since we've got schemas parsed
  out  of query statements) and retrieve simple one-table records from
  mdbtools and then combine them using our engine. Advantage:
  we can later even provide a way for making relations for loosely
  connected data sources, as eg. , say, google result set and mysql
  dictionary
 
  mart> jstaniek: I have had some ideas about 'partial evaluation'
  of SQL before partial evaluation - so I could write SELECT int1, int2,
  str1,str2 FROM t WHERE int1<int2 and string_edit_distance(str1,
  str2) < 5, so the DB goes away and finds records where int1<int2
  then KEXI would execute some plugin-loaded function to filter
  the rest etc...
 
  jstaniek> btw, it is great tool for data integration... only not
  so robust...  but who cares... that's ALMOST ready!
 
  mart> jstaniek: it is?
 
  jstaniek> mart: we've got a lists of builtin (provided at the backend)
  functions, so you iterate through
  PARSE tree of the query and ... split the query to "backed" and
  "frontend" parts...
  (yes, we know these things could be nicer on server level, not client...)
  mart, jan 2005
 
Speaking of which, it might not be a bad idea to store these functions in Java JAR files. JNI or CNI calls could be made to these functions using GCC's Java support (if available). The JAR could also be deployed on a PosgreSQL server with pljava, allowing the functions to be executed on the server for greater performance.
 
=== By the way: a word from MSA team ===
 
From [https://web.archive.org/web/20070203130919/http://blogs.msdn.com/access/archive/2005/10/13/480870.aspx Access 12's new data engine] article:
 
[..] ''Jet is unique in Microsoft because it provides both a data storage mechanism with a heterogeneous query processor. This ability to run queries across a number of different data stores is a key reason people use Access, so it was super important for us to retain that ability going forward. The SQL team is de-investing in Jet, and telling developers to code against SQL Server Express, but as it lacks the heterogeneous query functionality we needed, Access stuck with Jet.'' [..]
 
=== Links ===
 
* VistaDB: Migration Wizard Screnshots - usable to learn what options are used when migrating real-world projects.

Latest revision as of 13:48, 29 December 2024

Data Migration And Sharing.

Note

TODO: Move content from [1]


File formats

Servers

Details

There are two kinds of data migration:

  • Migration for entire projects.
  • Data migration related to a single data table (importing, exporting).

Project Migration

Project migration allows to migrate the database schema and table data in one go. It's currently possible to migrate a MySQL, PostgreSQL or MS Access database to a SQLite database, using either the keximigratetest program or using File->Import Database menu command.

See also: Migration API

Data Migration

Text tabular data can be imported from file (Data->Import Data From File), exported to file (Data->Export Data to File) or pasted from the clipboard (Edit->Paste Special) or copied to the clipboard (Edit->Copy Special).

Text formats for tabular data

There are a number of text formats for tabular data. See MS Excel Import formats for naming scheme.

  • Comma Separated Values (CSV) Data
  • Fixed Width Text

See also http://en.wikipedia.org/wiki/Flat_file_database

Text formats for tabular data

  • Comma Separated Values (CSV) Data
  • Fixed Width Text

Data Sharing

By Data Sharing we usually mean dynamic data exchanging features that do not require to importing and exporting. For example: connecting with external (not KDb-derived) database to share some data between KEXI project and that external data source.

Cross-DB-Engine Queries

 jstaniek> mart: btw, another hard way is to add simple querying
 engine on top of KDb (not so hard since we've got schemas parsed
 out  of query statements) and retrieve simple one-table records from
 mdbtools and then combine them using our engine. Advantage:
 we can later even provide a way for making relations for loosely
 connected data sources, as eg. , say, google result set and mysql
 dictionary
 mart> jstaniek: I have had some ideas about 'partial evaluation'
  of SQL before partial evaluation - so I could write SELECT int1, int2,
  str1,str2 FROM t WHERE int1<int2 and string_edit_distance(str1,
  str2) < 5, so the DB goes away and finds records where int1<int2
  then KEXI would execute some plugin-loaded function to filter
  the rest etc...
 jstaniek> btw, it is great tool for data integration... only not
  so robust...  but who cares... that's ALMOST ready!
 mart> jstaniek: it is?
 jstaniek> mart: we've got a lists of builtin (provided at the backend)
  functions, so you iterate through
  PARSE tree of the query and ... split the query to "backed" and
  "frontend" parts...
  (yes, we know these things could be nicer on server level, not client...)
  mart, jan 2005

Speaking of which, it might not be a bad idea to store these functions in Java JAR files. JNI or CNI calls could be made to these functions using GCC's Java support (if available). The JAR could also be deployed on a PosgreSQL server with pljava, allowing the functions to be executed on the server for greater performance.

By the way: a word from MSA team

From Access 12's new data engine article:

[..] Jet is unique in Microsoft because it provides both a data storage mechanism with a heterogeneous query processor. This ability to run queries across a number of different data stores is a key reason people use Access, so it was super important for us to retain that ability going forward. The SQL team is de-investing in Jet, and telling developers to code against SQL Server Express, but as it lacks the heterogeneous query functionality we needed, Access stuck with Jet. [..]

Links

  • VistaDB: Migration Wizard Screnshots - usable to learn what options are used when migrating real-world projects.