KDb/Database Collation Issues
Introduction
Types
Collation type is a rule used when comparing (or sorting) text strings in a database. Lets assume we have three database records in a name column of a table:
Joe Adam joan
- NOCASE collation means that a comparison like name > "adam" will return:
Joe Adam joan
- In other words, case-insensitive comparison is performed.
- BINARY collation means that a comparison like name > "adam" will return:
joan
- Case-sensitive comparison is performed here, so neither "Joe" nor "Adam" are greater than "adam".
Affected SQL Operators
Following SQL comparison operators are affected:
< > <= >= <> != = == LIKE.
Implications For Data Integrity
Example for MySQL:
CREATE TABLE dictionary ( word varchar(255) COLLATE latin2_bin NOT NULL, ); ALTER TABLE dictionary ADD UNIQUE INDEX (word);
Having BINARY collation latin2_bin defined now it's possible to insert
Bag bag
while, with NOCASE collation, the unique index would not allow this.
Problem with Non-latin1 Character Sets
Database backend needs to offer built-in support for Non-latin1 character collations. Kexi could not implement a workaround at the client side if a query has to be executed at the server side.
Collation in SQLite Backend
SQLite supports BINARY collation by default.
Non-latin1 Character Sets: we can patch SQLite to add such a support. See User-defined Collation Sequences.
Collation in MySQL Backend
By default, MySQL searches are not case sensitive.
To force BINARY collation, we will add a COLLATE clause, e.g.:
SELECT * FROM table WHERE name>='abc' COLLATE latin1_bin
Notes: latin1_bin collation name always works. Collation and character sets can be properly selected on db creation or connecting only for MySQL >= 4.1. See also Table Character Set and Collation (mysql manual). So for newer MySQL versions use specific collations, not just latin1_bin.
A good default for collation/character set seems to be utf8_general_ci/utf8.
TODO
Collation in PostgreSQL Backend
TODO
Implications for Query Designer
A good default is a #1 priority here. In most cases users expect NOCASE collation to be the default, so we're:
- adding NOCASE flag to appropriate KDbBinaryExpr objects when SQL statement is generated/anaysed
- supporting BINARY keyword in the Design View of the Query Designer