Single byte and multibyte character support
IBM InfoSphere CDC supports replication of both single byte and multibyte character sets.
Single byte character support
IBM InfoSphere CDC performs single-byte character set (SBCS) code page conversions transparently, so you do not need to know which code pages are in use on each system. IBM InfoSphere CDC determines the required conversions automatically by examining user configuration parameters.
Multibyte character support
IBM InfoSphere CDC supports the replication of multibyte character sets (MBCS), such as those used for Japanese or Chinese, whose characters cannot all be represented in a single byte. The most common MBCS implementation is the double-byte character set (DBCS).
The specification for MBCS dictates that, when you have configured a specific translation, data is applied as is to the mapped column on the target system. This is possible when the database is configured with a single-byte character set (regardless of the actual character set of the data), but it cannot be assured when the database character set is multibyte.
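The difference between the two cases can be illustrated outside of IBM InfoSphere CDC with plain Java. The sketch below is not product code; the class name, the helper, and the sample bytes are assumptions chosen for illustration. It shows that arbitrary bytes survive a decode and re-encode cycle through a single-byte character set (ISO-8859-1 here), while the same bytes are altered when they are treated as a multibyte encoding (UTF-8 here) in which they are not valid.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class PassThroughDemo {
    // Returns true if decoding the bytes with the charset and encoding them
    // again reproduces exactly the same byte sequence.
    static boolean survivesRoundTrip(byte[] raw, Charset charset) {
        String decoded = new String(raw, charset);
        return Arrays.equals(raw, decoded.getBytes(charset));
    }

    public static void main(String[] args) {
        // Sample double-byte data (assumed to be Shift_JIS text); what matters
        // here is only that this sequence is not valid UTF-8.
        byte[] raw = {(byte) 0x93, (byte) 0xFA, (byte) 0x96, (byte) 0x7B};

        // Single-byte charset: every byte maps to exactly one character and back.
        System.out.println(survivesRoundTrip(raw, StandardCharsets.ISO_8859_1));

        // Multibyte charset: malformed sequences are replaced during decoding,
        // so the original bytes cannot be recovered.
        System.out.println(survivesRoundTrip(raw, StandardCharsets.UTF_8));
    }
}
```

The first call reports that the bytes are preserved and the second reports that they are not, which is why pass-through behavior cannot be assured for a multibyte database character set.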
IBM InfoSphere CDC respects the mappings and applies the data according to the configured settings. It does not validate that the data can be inserted correctly into the column. You must be aware of the character sets on the database and select the appropriate values when you select character set translations for your data. When you set an encoding conversion in Management Console, IBM InfoSphere CDC applies the data to the target database in the exact form it was received.
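Because no validation is performed during apply, it can be useful to check sample values against the intended target encoding before you choose a translation. The following is a minimal sketch of such a check using the standard Java charset API; the class and method names are hypothetical and are not part of IBM InfoSphere CDC.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodabilityCheck {
    // Returns true if every character of the value can be represented
    // in the target charset without substitution.
    static boolean fits(String value, Charset target) {
        return target.newEncoder().canEncode(value);
    }

    public static void main(String[] args) {
        String latin = "caf\u00E9";          // Latin text with an accented letter
        String japanese = "\u65E5\u672C";    // two CJK characters

        System.out.println(fits(latin, StandardCharsets.ISO_8859_1));
        System.out.println(fits(japanese, StandardCharsets.ISO_8859_1));
        System.out.println(fits(japanese, StandardCharsets.UTF_8));
    }
}
```

A value for which `fits` returns false would need a different target encoding to be stored without loss.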
Implications for multibyte character support in solidDB® databases
The encoding of solidDB® character data types depends on the database mode, Unicode or partial Unicode.
Unicode mode (General.InternalCharencoding = utf8)
Character data types (CHAR, VARCHAR, and so on) are stored in UTF-8.
Wide character data types (WCHAR, WVARCHAR, and so on) are stored in UTF-16.
Partial Unicode mode (General.InternalCharencoding = raw)
Character data types use no particular encoding; instead, the data is stored in byte strings with the assumption that user applications are aware of this and handle the conversion as necessary.
Wide character data types are stored in UTF-16.
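To make the storage difference concrete, the sketch below compares the number of bytes that the same text occupies in UTF-8 and in UTF-16. It uses only the standard Java charset API and big-endian UTF-16 without a byte order mark; the class and method names are illustrative assumptions, and the code does not access solidDB®.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class StorageSizeDemo {
    // Number of bytes needed to hold the text in the given encoding.
    static int storedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String ascii = "abc";
        String japanese = "\u65E5\u672C"; // two CJK characters

        // ASCII text: one byte per character in UTF-8, two in UTF-16.
        System.out.println(storedLength(ascii, StandardCharsets.UTF_8));
        System.out.println(storedLength(ascii, StandardCharsets.UTF_16BE));

        // CJK text: three bytes per character in UTF-8, two in UTF-16.
        System.out.println(storedLength(japanese, StandardCharsets.UTF_8));
        System.out.println(storedLength(japanese, StandardCharsets.UTF_16BE));
    }
}
```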
When a new instance of IBM InfoSphere CDC for solidDB® is created, the default encoding is set according to the default solidDB® database mode, which is partial Unicode. By default, the encoding of character data type columns is always set to ISOLatin1.
If your database mode is Unicode, you need to set the encoding of character data type columns (CHAR, VARCHAR, and so on) to UTF-8.
If your database mode is partial Unicode and your application encoding is not set to ISOLatin1, you need to set the encoding of character data type columns (CHAR, VARCHAR, and so on) to the encoding used in the application environment.
Column type | Default encoding (partial Unicode) | Required encoding for Unicode databases
Character data types (CHAR, VARCHAR, and so on) | ISOLatin1 | UTF-8
Wide character data types (WCHAR, WVARCHAR, and so on) | UTF-16BE | UTF-16BE
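The encoding rules described above can be summarized as a small lookup. The sketch below is illustrative only: the enum and method names are hypothetical, the code does not call any IBM InfoSphere CDC or solidDB® API, and the value returned for character columns in partial Unicode mode is just the default, which you would replace with your application encoding if it differs.

```java
final class SolidDbEncodingDefaults {
    enum DatabaseMode { UNICODE, PARTIAL_UNICODE }
    enum ColumnKind { CHARACTER, WIDE_CHARACTER }

    // Returns the encoding name that applies to a column kind in a database mode.
    static String encodingFor(DatabaseMode mode, ColumnKind kind) {
        if (kind == ColumnKind.WIDE_CHARACTER) {
            // Wide character columns use UTF-16BE in both modes.
            return "UTF-16BE";
        }
        // Character columns: UTF-8 is required for Unicode databases;
        // ISOLatin1 is the default for partial Unicode databases.
        return mode == DatabaseMode.UNICODE ? "UTF-8" : "ISOLatin1";
    }

    public static void main(String[] args) {
        for (DatabaseMode mode : DatabaseMode.values()) {
            for (ColumnKind kind : ColumnKind.values()) {
                System.out.println(mode + " / " + kind + " : " + encodingFor(mode, kind));
            }
        }
    }
}
```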
User exits and multibyte character sets
Java class user exits in IBM InfoSphere CDC support multibyte character sets (MBCS). Multibyte character sets are converted to Java strings (UTF-16).
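One practical consequence of working with Java strings is that string length is counted in UTF-16 code units, not in characters or source bytes. The sketch below illustrates this with the standard Java API only; it does not use the IBM InfoSphere CDC user exit interfaces, and the class and method names are assumptions made for the example. The sample bytes are the UTF-8 form of the supplementary character U+20BB7.

```java
import java.nio.charset.StandardCharsets;

final class Utf16LengthDemo {
    // Decodes multibyte (UTF-8) data into a Java string, which is UTF-16 internally.
    static String decodeUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    // Counts Unicode code points rather than UTF-16 code units.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // Four UTF-8 bytes encoding the single character U+20BB7.
        byte[] raw = {(byte) 0xF0, (byte) 0xA0, (byte) 0xAE, (byte) 0xB7};
        String s = decodeUtf8(raw);

        System.out.println(raw.length);    // bytes in the source data
        System.out.println(s.length());    // UTF-16 code units (a surrogate pair)
        System.out.println(codePoints(s)); // Unicode characters
    }
}
```

Code that truncates or measures strings may therefore want to work with code points so that a surrogate pair is not split.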