Language handling by the SPSS Statistics SAV DSC

IBM SPSS Statistics .sav files store text using a different encoding system from the UNICOM Intelligence Data Model. The UNICOM Intelligence Data Model uses Unicode, whereas IBM SPSS Statistics uses multibyte character set (MBCS) encoding. As a result, text is converted from one encoding system to the other whenever you read or write a .sav file.

This conversion is usually invisible and problem-free. However, problems can arise when you work with a language that belongs to a different “family of languages” from the default language set for your computer. For example, if your default language is a Western European language, you should be able to work with any other Western European language without problems, but you might have problems working with Japanese or other multibyte languages. If you change your default language to Japanese, you should then be able to work with Japanese without problems. Problems can also arise when you work with several language families at once. In that case, you must set an option to support multiple language families, and set IBM SPSS Statistics to read the .sav file accordingly.

Technically, a “family of languages” is known as a code page. When you set a language as the default for your computer, you automatically select that language's code page as the default code page. You select the default language for your computer in the Regional and Language Options (or Regional Options) dialog in Control Panel.
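To make the idea of a code page concrete, the following minimal Python sketch uses Python's built-in cp1252 (Western European) and cp932 (Japanese) codecs as stand-ins for the corresponding Windows code pages, and checks whether some sample labels can be represented in each. The sample labels and the fits_code_page helper are invented for illustration; the sketch models only the encoding step, not the SPSS Statistics SAV DSC itself.

```python
# Illustration only: Python codecs stand in for Windows code pages.
# cp1252 is the Western European code page; cp932 is the Japanese code page.

def fits_code_page(text: str, code_page: str) -> bool:
    """Return True if every character in text can be encoded in code_page."""
    try:
        text.encode(code_page)  # errors="strict" is the default
        return True
    except UnicodeEncodeError:
        return False

# Hypothetical variable labels in three languages.
samples = {
    "French": "Qualité du service",
    "German": "Größe",
    "Japanese": "日本語のラベル",
}

for language, text in samples.items():
    print(language,
          "cp1252:", fits_code_page(text, "cp1252"),
          "cp932:", fits_code_page(text, "cp932"))
```

A Western European default code page such as cp1252 covers the French and German labels but not the Japanese one, which is the kind of mismatch the previous paragraphs describe.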

When reading a .sav file, the SPSS Statistics SAV DSC determines which code page to use from one of three possible sources, checked in the following order:

▪From the SavLanguage property, if it is set. If that language uses more than one code page, the SPSS Statistics SAV DSC takes the code page from the SavCodePage property, if it is defined.

When generating an MDM document from the .sav file, the SPSS Statistics SAV DSC adds the language specified by the SavLanguage property to the list of languages in the generated MDM document, unless you set the MdmLanguageNoAdd property to a non-zero value. If the SPSS Statistics SAV DSC adds the language to the generated MDM document, it also makes that language the current language, unless you set the MdmLanguageCurrentNoChange property to a non-zero value.

▪From the MR Init Input Locale connection property. The SPSS Statistics SAV DSC uses the value of this connection property if it has been set to a non-zero value.

▪From the default code page for your computer, if neither of the previous sources provides a code page.
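The lookup order above, together with the rule for adding the SavLanguage language to a generated MDM document, can be sketched as follows. This is a hypothetical Python model, not the DSC's implementation: the functions resolve_code_page and add_language_to_mdm, the language-to-code-page table, and the treatment of MR Init Input Locale as a Windows locale ID (LCID) are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# HYPOTHETICAL tables, invented for illustration only. The real DSC relies on
# the language and code-page support installed on the computer.
LANGUAGE_CODE_PAGES = {"ENU": [1252, 65001], "JPN": [932, 65001]}
LOCALE_CODE_PAGES = {1033: 1252, 1041: 932}  # assumed LCID -> code page


def resolve_code_page(sav_language: Optional[str],
                      sav_code_page: Optional[int],
                      init_input_locale: Optional[int],
                      default_code_page: int) -> int:
    """Pick a code page in the order described above (a sketch)."""
    # 1. SavLanguage; SavCodePage refines it when the language has several code pages.
    if sav_language:
        pages = LANGUAGE_CODE_PAGES[sav_language]
        if len(pages) > 1 and sav_code_page is not None:
            return sav_code_page
        return pages[0]
    # 2. MR Init Input Locale, only when it is set to a non-zero value.
    if init_input_locale:
        return LOCALE_CODE_PAGES[init_input_locale]
    # 3. Fall back to the computer's default code page.
    return default_code_page


def add_language_to_mdm(languages: List[str], current: str,
                        sav_language: Optional[str],
                        mdm_language_no_add: int = 0,
                        mdm_language_current_no_change: int = 0
                        ) -> Tuple[List[str], str]:
    """Model the MdmLanguageNoAdd / MdmLanguageCurrentNoChange rules."""
    result = list(languages)  # do not mutate the caller's list
    if sav_language and mdm_language_no_add == 0:
        if sav_language not in result:
            result.append(sav_language)
        # The added language becomes current unless told not to change it.
        if mdm_language_current_no_change == 0:
            current = sav_language
    return result, current
```

For example, resolve_code_page("JPN", None, None, 1252) returns 932 from the hypothetical table, while resolve_code_page(None, None, 0, 1252) falls through to the default because a zero locale is ignored.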

When the SPSS Statistics SAV DSC writes a .sav file using a code page other than the default code page, it also creates a .ini file that defines the language used.

To write a .sav file that contains text from multiple language families, set SavCodePage to 65001 (UTF-8) and SavLanguage to ENU in the output connection string. Non-Western characters then display correctly in IBM SPSS Statistics, provided that Unicode mode is enabled in IBM SPSS Statistics and SavLanguage is specified. For example:
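A minimal Python sketch shows why code page 65001 (UTF-8) is needed when language families are mixed. It uses Python's cp1252, cp932, and utf-8 codecs as stand-ins for the Western European, Japanese, and UTF-8 code pages, and round-trips a hypothetical label that mixes English, Japanese, and French. The round_trip helper is invented for illustration and models the encoding only, not the SAV DSC or its connection string.

```python
# Hypothetical variable label mixing three languages (illustration only).
mixed_label = "Satisfaction / 満足度 / Qualité"


def round_trip(text: str, codec: str) -> str:
    """Encode text with codec, replacing unencodable characters with '?', then decode."""
    return text.encode(codec, errors="replace").decode(codec)


# cp1252 ~ Western European, cp932 ~ Japanese, utf-8 ~ code page 65001.
for codec in ("cp1252", "cp932", "utf-8"):
    preserved = round_trip(mixed_label, codec) == mixed_label
    print(codec, "preserved:", preserved)
```

Each single-family code page loses the characters from the other family (cp1252 turns the Japanese text into question marks, and cp932 cannot represent the accented French letter), whereas UTF-8 returns the label unchanged.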

An invalid language setting can occur, for example, when the SavLanguage property defines a language that is not supported on your computer. By default, an invalid language setting causes the SPSS Statistics SAV DSC to issue an error message and stop. To specify that the SPSS Statistics SAV DSC should ignore an invalid language setting, set the StrictLanguage property to 0. Be aware that if you change the default behavior, the SPSS Statistics SAV DSC continues when the language setting is invalid and produces garbled text.
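The StrictLanguage behavior can be modeled with a small Python sketch. Everything in it is hypothetical: check_sav_language, InvalidLanguageError, and the SUPPORTED_LANGUAGES set stand in for the DSC's real check, and the sketch assumes that any non-zero StrictLanguage value behaves like the default.

```python
import warnings

# HYPOTHETICAL set of languages "supported on this computer"; illustration only.
SUPPORTED_LANGUAGES = {"ENU", "FRA", "DEU", "JPN"}


class InvalidLanguageError(ValueError):
    """Raised when SavLanguage names a language that is not supported."""


def check_sav_language(sav_language: str, strict_language: int = 1) -> bool:
    """Return True if sav_language is usable.

    strict_language mirrors the StrictLanguage property: any non-zero value
    (the default here) stops with an error; 0 only warns and carries on.
    """
    if sav_language in SUPPORTED_LANGUAGES:
        return True
    message = f"SavLanguage '{sav_language}' is not supported on this computer"
    if strict_language != 0:
        raise InvalidLanguageError(message)
    # Lenient mode: continue, accepting that text may come out garbled.
    warnings.warn(message + "; continuing, text may be garbled")
    return False
```

Passing strict_language=0 models setting StrictLanguage to 0: the check only warns, and any later conversion proceeds with a code page that may not match the data.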