Merging Japanese (multibyte) data sources
The UNICOM Intelligence Developer Documentation Library (DDL) contains two sample data management scripts that demonstrate the merging of data sources that contain Japanese-language surveys and Japanese-language case data.
The data management scripts are JapaneseMergeHorizontal.dms and JapaneseMergeVertical.dms. The default location is:
[INSTALL_FOLDER]\IBM\SPSS\DataCollection\7\DDL\Scripts\Data Management\DMS
The default location of the input data sources is:
[INSTALL_FOLDER]\IBM\SPSS\DataCollection\7\DDL\Data\Japanese Data
The question and category names in these data sources are in Japanese, as are all label texts and all responses to text questions. The surveys are based on the survey for the museum sample data set that is supplied with the DDL.
By running these examples, you can see that both metadata and case data are merged successfully even when question and category names are defined in a multibyte language such as Japanese.
If you do not see Japanese characters in the following examples, you might need to install files for East Asian languages on your computer. You can normally do this from Regional and Language Options (or Regional Options) in Control Panel.
Vertical merge example
The JapaneseMergeVertical.dms script performs a vertical merge of two data sources that contain data for different respondents, such as when different offices have carried out the same survey. The metadata for the two data sources is similar, but not identical. The differences in the metadata are described in the following table.
Question name | Differences
Age (年齢種別) | In the second data source, this question has an additional category, which is an NA (no answer) special response.
Certificate (生物学の資格種別) | This question exists only in the master data source.
Country (国) | This question exists only in the second data source.
Expect (期待) | In the master data source, this question has an additional category, which is named Reacquaint (知識の再整理).
Museums (博物館種別) | In the second data source, this question has additional categories, which are named Transport (交通手段) and ModernArt (近代美術).
The data management script includes both a Metadata section and an OnNextCase section, which are used to ensure that the output data does not contain duplicate values of the system variable Respondent.Serial. For a description of the technique used, see Running a vertical merge.
When you run JapaneseMergeVertical.dms, you should find that the output metadata document contains all questions and categories from both input data sources. For question Certificate (生物学の資格種別), the output case data contains nulls for the cases that originate from the second input data source. For question Country (国), the output case data contains nulls for the cases that originate from the master input data source.
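The effect of a vertical merge on case data can be illustrated outside of the DMS script with a minimal Python sketch. It is not the technique used in JapaneseMergeVertical.dms (see Running a vertical merge for that). The answer values, the OriginalSerial field, and the simple renumbering scheme are assumptions made for illustration only; the question names are taken from the table above.

```python
def vertical_merge(master, second):
    """Stack two lists of case dicts, keeping every question from both."""
    all_cases = master + second

    # Union of question names, in first-seen order.
    questions = []
    for case in all_cases:
        for name in case:
            if name not in questions:
                questions.append(name)

    merged = []
    for new_serial, case in enumerate(all_cases, start=1):
        # Questions that a source does not contain become None (null).
        row = {name: case.get(name) for name in questions}
        # Hypothetical field: keep the serial from the input data source.
        row["OriginalSerial"] = case["Respondent.Serial"]
        # Assumed scheme: renumber so that serials are unique in the output.
        row["Respondent.Serial"] = new_serial
        merged.append(row)
    return merged


# Hypothetical cases; only the question names come from the sample data.
master = [
    {"Respondent.Serial": 1, "年齢種別": "a", "生物学の資格種別": "yes"},
    {"Respondent.Serial": 2, "年齢種別": "b", "生物学の資格種別": "no"},
]
second = [
    {"Respondent.Serial": 1, "年齢種別": "c", "国": "x"},
    {"Respondent.Serial": 2, "年齢種別": "a", "国": "y"},
]

merged = vertical_merge(master, second)
for row in merged:
    print(row)
```

In this sketch, cases from the master list have None for Country (国), and cases from the second list have None for Certificate (生物学の資格種別), which mirrors the nulls described above.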
Horizontal merge example
The script JapaneseMergeHorizontal.dms performs a horizontal merge of two data sources that contain data for mostly the same respondents. A full join is performed, that is, all cases are written to the output data regardless of whether a matching case was found on the other input data source.
The metadata for the two data sources is different: the master data source contains questions that were asked as the respondents entered the museum, and the second data source contains questions that were asked as the respondents left the museum.
The cases are joined using the value of the system variable Respondent.Serial. The differences in the number of respondents are as follows:
▪ Respondents 2, 8, and 18 exist only on the master data source.
▪ Respondents 4 and 19 exist only on the second data source.
▪ Respondent 13 does not exist on either data source.
When you run JapaneseMergeHorizontal.dms, you should find that the output metadata document contains all the questions from both input data sources. For those respondents that did not participate in both questionnaires, the output case data will contain nulls for those questions that were not asked.
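The full join described above can be sketched in plain Python to show which cases end up in the output and where the nulls appear. This is an illustration only, not the DMS script. The serial range of 1 to 20, the two question names, and the answer values are assumptions; only the lists of missing respondents come from the description above.

```python
ENTRY_Q = "entrance_question"  # hypothetical question asked on entering
EXIT_Q = "exit_question"       # hypothetical question asked on leaving

# Assumption: serial numbers run from 1 to 20 in the sample data.
ALL_SERIALS = range(1, 21)
ONLY_ON_MASTER = {2, 8, 18}
ONLY_ON_SECOND = {4, 19}
ON_NEITHER = {13}

# Cases keyed by Respondent.Serial, with placeholder answers.
master = {
    s: {ENTRY_Q: f"entry answer {s}"}
    for s in ALL_SERIALS
    if s not in ONLY_ON_SECOND | ON_NEITHER
}
second = {
    s: {EXIT_Q: f"exit answer {s}"}
    for s in ALL_SERIALS
    if s not in ONLY_ON_MASTER | ON_NEITHER
}


def full_join(master, second):
    """Join on serial and keep every case found in either source."""
    merged = {}
    for serial in sorted(master.keys() | second.keys()):
        # Start with nulls, then fill in whatever each source provides.
        row = {ENTRY_Q: None, EXIT_Q: None}
        row.update(master.get(serial, {}))
        row.update(second.get(serial, {}))
        merged[serial] = row
    return merged


merged = full_join(master, second)
for serial, row in merged.items():
    print(serial, row)
```

Under these assumptions, respondents 2, 8, and 18 have None for the exit question, respondents 4 and 19 have None for the entrance question, and serial 13 does not appear in the output at all.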