Single byte and multibyte character support

IBM InfoSphere CDC performs single-byte character set (SBCS) code page conversions transparently, so you do not have to be aware of the code pages that are used on each system. IBM InfoSphere CDC performs the conversions automatically by examining user configuration parameters.

IBM InfoSphere CDC supports the replication of multibyte character sets (MBCS), such as those used for Japanese or Chinese, whose characters cannot be represented in a single byte. The most common MBCS implementations are double-byte character sets (DBCS).

The specification for MBCS dictates that, when you have configured a specific translation, data is applied as is to the mapped column on the target system. This is possible when the database has a single-byte character set configured (regardless of the actual character set of the data), but it cannot be assured when the character set is multibyte.

IBM InfoSphere CDC respects the mappings and applies the data according to the configuration that you set. It does not validate that data in the selected character set can be inserted correctly into the column. You must be aware of the character sets used on the database and choose the appropriate values when you select character set translations for your data. When you set an encoding conversion in Management Console, IBM InfoSphere CDC applies the data to the target database in the exact form in which it was received.
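Because no validation is performed during apply, it can be useful to check representative sample values against the target column encoding before you configure a translation. The following is a minimal sketch that uses only the standard Java `java.nio.charset` API, not any IBM InfoSphere CDC API. The class and method names (`EncodingCheck`, `canEncode`) are hypothetical, and the sketch assumes that the ISOLatin1 encoding corresponds to the Java charset name `ISO-8859-1`.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical helper, not part of IBM InfoSphere CDC.
class EncodingCheck {

    // Returns true if every character in value can be encoded in the named charset.
    static boolean canEncode(String value, String charsetName) {
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder();
        return encoder.canEncode(value);
    }

    public static void main(String[] args) {
        String latin = "caf\u00E9";                 // Latin text with an accented e
        String japanese = "\u65E5\u672C\u8A9E";     // three Japanese characters

        System.out.println(canEncode(latin, "ISO-8859-1"));     // true
        System.out.println(canEncode(japanese, "ISO-8859-1"));  // false
        System.out.println(canEncode(japanese, "UTF-8"));       // true
    }
}
```

A value that fails this kind of check would not survive a translation into that character set, even though the apply itself raises no validation error.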

The encoding of solidDB® character data types depends on the database mode: Unicode or partial Unicode.

Unicode mode (General.InternalCharencoding = utf8)

Partial Unicode mode (General.InternalCharencoding = raw)

▪ Character data types use no particular encoding; instead, the data is stored in byte strings with the assumption that user applications are aware of this and handle the conversion as necessary.

When a new instance of IBM InfoSphere CDC for solidDB® is created, the default encoding is set according to the default solidDB® database mode, which is partial Unicode. By default, the encoding of character data type columns is set to ISOLatin1.

▪ If your database mode is Unicode, you need to set the encoding of character data type columns (CHAR, VARCHAR, and so on) to UTF-8.

▪ If your database mode is partial Unicode and your application encoding is not set to ISOLatin1, you need to set the encoding of character data type columns (CHAR, VARCHAR, and so on) to the encoding used in the application environment.
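The effect of leaving the column encoding at ISOLatin1 when the stored bytes are actually UTF-8 can be illustrated with standard Java alone. This is a sketch, not IBM InfoSphere CDC code: the class name `EncodingMismatch` and the `decode` helper are hypothetical, and ISOLatin1 is assumed to correspond to the Java charset ISO-8859-1.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration, not part of IBM InfoSphere CDC.
class EncodingMismatch {

    // Interprets raw column bytes using the given encoding.
    static String decode(byte[] raw, Charset encoding) {
        return new String(raw, encoding);
    }

    public static void main(String[] args) {
        // The application wrote a single accented e as UTF-8: bytes 0xC3 0xA9.
        byte[] stored = "\u00E9".getBytes(StandardCharsets.UTF_8);

        // Read with the matching encoding: one character, U+00E9.
        String correct = decode(stored, StandardCharsets.UTF_8);

        // Read as ISO-8859-1: each byte becomes its own character (U+00C3, U+00A9).
        String garbled = decode(stored, StandardCharsets.ISO_8859_1);

        System.out.println(correct.length());  // 1
        System.out.println(garbled.length());  // 2
    }
}
```

The bytes are identical in both cases; only the declared encoding changes how they are interpreted, which is why the column encoding has to match what the application actually wrote.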

Default (partial Unicode) and Unicode encoding settings for character and wide character data type columns

Column type	Default encoding (partial Unicode)	Required encoding for Unicode databases
Character data types (CHAR, VARCHAR, and so on)	ISOLatin1	UTF-8
Wide character data types (WCHAR, WVARCHAR, and so on)	UTF-16BE	UTF-16BE

Java class user exits in IBM InfoSphere CDC support multibyte character sets (MBCS). Multibyte character data is converted to Java strings (UTF-16).
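As a rough illustration of what conversion to a Java string means, the following sketch decodes multibyte UTF-8 data into a `String` and shows that the result is held as UTF-16 code units. It uses only standard Java and does not call the user exit API. The class name `Utf16Demo` and its methods are hypothetical, and UTF-8 stands in for whichever multibyte encoding the source column uses.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration, not part of the IBM InfoSphere CDC user exit API.
class Utf16Demo {

    // Decodes multibyte (here UTF-8) data into a Java string.
    static String decodeUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Returns the big-endian UTF-16 code units of the string, without a byte order mark.
    static byte[] utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        // Two Japanese characters (U+65E5, U+672C), three UTF-8 bytes each.
        byte[] multibyte = {
            (byte) 0xE6, (byte) 0x97, (byte) 0xA5,
            (byte) 0xE6, (byte) 0x9C, (byte) 0xAC
        };

        String text = decodeUtf8(multibyte);

        System.out.println(multibyte.length);         // 6 bytes in the source encoding
        System.out.println(text.length());            // 2 UTF-16 code units
        System.out.println(utf16Bytes(text).length);  // 4 bytes as UTF-16BE
    }
}
```

Code that works with such strings should therefore not assume that byte lengths in the source encoding match `String.length()`.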