Working with Unicode


solidDB supports the Unicode standard, providing the capability to encode characters that are used in the major languages of the world. To use Unicode-encoded data, you do not need any non-standard or solidDB-specific implementations for application development; you can use the standard ODBC API or JDBC API, as well as solidDB tools. solidDB also supports heterogeneous, multi-client environments where each application can be set to use a different encoding.

Unicode database modes

Starting from solidDB version 6.5, solidDB databases can be created in two modes: Unicode mode or partial Unicode mode. This database mode is based on the encoding of character data types (CHAR, VARCHAR, and so on) in the solidDB server. Wide-character data types (WCHAR, WVARCHAR, and so on) are Unicode-encoded in both modes.

▪ Unicode mode

In the Unicode mode, the internal representation for character data types is UTF-8.

The internal representation for wide-character data types is UTF-16.

▪ Partial Unicode mode

In the partial Unicode mode, the internal representation for character data types uses no particular encoding; instead, the data is stored in byte strings with the assumption that user applications are aware of this and handle the conversion as necessary.

The internal representation for wide-character data types is UTF-16.
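The following Python sketch illustrates why the partial Unicode mode leaves conversion to the application: the same stored bytes read as different text depending on the encoding the reader assumes. It is plain Python and does not use any solidDB API; the helper name `decode_as` and the sample data are assumptions made for illustration.

```python
# Illustration only: plain Python, no solidDB API involved.
# In partial Unicode mode, character columns hold raw byte strings,
# so the application must know which encoding produced them.

def decode_as(raw: bytes, encoding: str) -> str:
    """Interpret a raw byte string using the encoding the application assumes."""
    return raw.decode(encoding)

# Bytes that an application using ISO 8859-1 (Latin-1) would write for "café".
stored = "café".encode("latin-1")

# Read back with the writer's encoding: the text round-trips.
print(decode_as(stored, "latin-1"))

# Read back with a different single-byte encoding: the last character changes.
print(decode_as(stored, "iso-8859-7"))

# Read back as UTF-8: the bytes are not valid UTF-8 at all.
try:
    decode_as(stored, "utf-8")
except UnicodeDecodeError as err:
    print("not valid UTF-8:", err.reason)
```

In the Unicode mode this ambiguity does not arise, because character data has a defined internal encoding (UTF-8).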

Note: Databases created with solidDB version 6.3 or earlier are in partial Unicode mode. The default database mode in solidDB version 6.5 is partial Unicode.

Unicode applications can be built on both Unicode and partial Unicode databases. However, the instructions in the following topics assume that the database mode is Unicode.

Key features of solidDB Unicode databases

▪ Storing and retrieving of Unicode data

The internal representation of Unicode data is based on UTF-8 and UTF-16 encoding. Data in wide-character column types is represented internally in UTF-16, and data in character column types is represented in UTF-8.

This means that both single-byte and multibyte data can be stored in character column types. If you expect mainly multibyte data, you can save space by storing it in wide-character column types: many characters, such as CJK ideographs, take three bytes in UTF-8 but only two in UTF-16.

▪ No restrictions on the encoding used in the applications

solidDB ODBC/JDBC drivers handle the conversion of data between the application encoding and the UTF-8/UTF-16 format in the solidDB server.

▪ Standard ODBC API and JDBC API available for application development

There are no non-standard or solidDB-specific requirements for application development; you can use standard ODBC or JDBC APIs.
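The space trade-off between character and wide-character column types can be estimated by comparing encoded lengths. The Python sketch below is a rough illustration only: the function name `encoded_sizes` is hypothetical, and the figures are raw encoded byte counts, not the actual storage used by solidDB, which may include additional overhead.

```python
# Illustration only: compares raw encoded sizes, not solidDB storage sizes.

def encoded_sizes(text: str) -> dict:
    """Return the number of bytes needed to encode text in UTF-8 and UTF-16."""
    return {
        "utf-8": len(text.encode("utf-8")),
        # "utf-16-le" is used so that the 2-byte byte-order mark is not counted.
        "utf-16": len(text.encode("utf-16-le")),
    }

# ASCII text: one byte per character in UTF-8, two in UTF-16.
print("database", encoded_sizes("database"))

# Japanese text: three bytes per character in UTF-8, two in UTF-16.
print("日本語", encoded_sizes("日本語"))
```

For mostly ASCII data, UTF-8 (character column types) is the more compact choice; for data dominated by characters that need three bytes in UTF-8, UTF-16 (wide-character column types) is smaller.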

See

What is Unicode?

Designing Unicode databases

Using solidDB tools with Unicode

Compatibility between Unicode and partial Unicode databases

Developing applications for Unicode