Merging Japanese (multibyte) data sources

The UNICOM Intelligence Developer Documentation Library (DDL) contains two sample data management scripts that demonstrate the merging of data sources that contain Japanese-language surveys and Japanese-language case data.

The question and category names in these data sources are in Japanese, as are all label texts and all responses to text questions. The surveys are based on the survey for the museum sample data set that is supplied with the DDL.

By running these examples, you can see that both metadata and case data are merged successfully even when question and category names are defined in a multibyte language such as Japanese.

If you do not see Japanese characters in the following examples, you might need to install files for East Asian languages on your computer. You can normally do this from Regional and Language Options (or Regional Options) in Control Panel.

The JapaneseMergeVertical.dms script performs a vertical merge of two data sources that contain data for different respondents, such as when different offices have carried out the same survey. The metadata for the two data sources is similar, but not identical. The differences in the metadata are described in the following table.

Question name (English)	Question name (Japanese)	Differences
Age	年齢種別	In the second data source, this question has an additional category, which is an NA (no answer) special response.
Certificate	生物学の資格種別	This question exists only in the master data source.
Country	国	This question exists only in the second data source.
Expect	期待	In the master data source, this question has an additional category, which is named Reacquaint (知識の再整理).
Museums	博物館種別	In the second data source, this question has additional categories, which are named Transport (交通手段) and ModernArt (近代美術).
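
The effect of these differences on the merged metadata can be pictured as a union of questions and, for each question, a union of categories. The following Python sketch is an illustration only and is not how the DMS script itself is written. The shared category names (共通A, 共通B) are hypothetical placeholders, and the ordering of the merged categories (master first) is an assumption.

```python
# Illustrative sketch only, not the merge engine used by the DMS scripts.
# "共通A" and "共通B" are hypothetical placeholders for the categories that both
# data sources share; only the categories named in the table are taken from it.
master = {
    "期待": ["共通A", "共通B", "知識の再整理"],      # Expect: Reacquaint is in the master only
    "博物館種別": ["共通A", "共通B"],                # Museums
    "生物学の資格種別": ["共通A", "共通B"],          # Certificate: master only
}
second = {
    "期待": ["共通A", "共通B"],
    "博物館種別": ["共通A", "共通B", "交通手段", "近代美術"],  # Transport, ModernArt
    "国": ["共通A", "共通B"],                        # Country: second only
}


def merge_categories(master, second):
    """Union the questions and, per question, the categories of both sources.

    Master entries come first, followed by anything found only in the second
    source. This ordering is an assumption made for the sketch.
    """
    merged = {}
    for name in list(master) + [n for n in second if n not in master]:
        categories = list(master.get(name, []))
        for category in second.get(name, []):
            if category not in categories:
                categories.append(category)
        merged[name] = categories
    return merged


merged = merge_categories(master, second)
for question, categories in merged.items():
    print(question, categories)
```

Because Python strings are Unicode, the Japanese question and category names work as dictionary keys and list values without any special handling.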

The data management script includes both a Metadata section and an OnNextCase section, which are used to ensure that the output data does not contain duplicate values of the system variable Respondent.Serial. For a description of the technique used, see Running a vertical merge.
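
The problem being solved can be sketched in Python as follows. This is only one possible approach and is not necessarily the technique that the script uses: each case receives a new running serial number as it is processed, and the original value is kept in an extra field. The field name OriginalSerial is hypothetical.

```python
def renumber_serials(sources):
    """Yield every case from every source with a new, unique serial number.

    The original serial is preserved under the hypothetical key
    "OriginalSerial" so that it is not lost when the serial is replaced.
    """
    next_serial = 1
    for cases in sources:
        for case in cases:
            new_case = dict(case)  # copy, so the input case is left unchanged
            new_case["OriginalSerial"] = case["Respondent.Serial"]
            new_case["Respondent.Serial"] = next_serial
            next_serial += 1
            yield new_case


# Two offices ran the same survey, so their serial numbers collide.
master_cases = [{"Respondent.Serial": 1}, {"Respondent.Serial": 2}]
second_cases = [{"Respondent.Serial": 1}, {"Respondent.Serial": 2}]

merged = list(renumber_serials([master_cases, second_cases]))
serials = [case["Respondent.Serial"] for case in merged]
originals = [case["OriginalSerial"] for case in merged]
print(serials)    # [1, 2, 3, 4]
print(originals)  # [1, 2, 1, 2]
```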

When you run JapaneseMergeVertical.dms, you should find that the output metadata document contains all questions and categories from both input data sources. For question Certificate (生物学の資格種別), the output case data will contain nulls for those cases originating from the second input data source. For question Country (国), the output case data will contain nulls for those cases originating from the master input data source.
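
The expected pattern of nulls can be reproduced with a small Python sketch. It is an illustration under simplifying assumptions, not the DMS implementation: cases are plain dictionaries, None stands in for null, and the response values (はい, 日本) are invented for the example.

```python
def vertical_merge(master_cases, second_cases):
    """Stack the cases of both sources and fill unasked questions with None."""
    all_cases = master_cases + second_cases

    # Collect every question name that occurs in either source, in first-seen order.
    all_questions = []
    for case in all_cases:
        for question in case:
            if question not in all_questions:
                all_questions.append(question)

    # dict.get returns None when a case has no value for a question.
    return [{q: case.get(q) for q in all_questions} for case in all_cases]


# Certificate (生物学の資格種別) exists only in the master data source;
# Country (国) exists only in the second data source. Values are made up.
master_cases = [{"生物学の資格種別": "はい"}]
second_cases = [{"国": "日本"}]

output = vertical_merge(master_cases, second_cases)
for case in output:
    print(case)
```

The first output case, which comes from the master data source, has None for 国, and the second output case has None for 生物学の資格種別.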

The JapaneseMergeHorizontal.dms script performs a horizontal merge of two data sources that contain data for mostly the same respondents. A full join is performed: all cases are written to the output data, regardless of whether a matching case was found in the other input data source.

The metadata for the two data sources is different: the master data source contains questions that were asked as the respondents entered the museum, and the second data source contains questions that were asked as the respondents left the museum.

The cases are joined using the value of the system variable Respondent.Serial. The differences in the number of respondents are as follows:

When you run JapaneseMergeHorizontal.dms, you should find that the output metadata document contains all the questions from both input data sources. For those respondents who did not participate in both questionnaires, the output case data will contain nulls for those questions that were not asked.
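
A full join on Respondent.Serial can be sketched in Python as follows. This is an illustration only, not the DMS implementation. The question names 入館時の質問 (a question asked on entry) and 退館時の質問 (a question asked on exit), the serial numbers, and the response values are all hypothetical.

```python
def question_names(cases, key):
    """Return the question names used in the cases, excluding the join key."""
    names = []
    for case in cases:
        for question in case:
            if question != key and question not in names:
                names.append(question)
    return names


def full_join(master_cases, second_cases, key="Respondent.Serial"):
    """Join cases on the key, keeping cases that have no match on the other side."""
    master_by_serial = {case[key]: case for case in master_cases}
    second_by_serial = {case[key]: case for case in second_cases}
    columns = question_names(master_cases, key) + question_names(second_cases, key)

    joined = []
    for serial in sorted(set(master_by_serial) | set(second_by_serial)):
        row = {key: serial}
        for question in columns:
            row[question] = None  # null unless one of the sources supplies a value
        row.update(master_by_serial.get(serial, {}))
        row.update(second_by_serial.get(serial, {}))
        joined.append(row)
    return joined


# Respondent 1 was interviewed only on entry, respondent 4 only on exit.
master_cases = [
    {"Respondent.Serial": 1, "入館時の質問": "はい"},
    {"Respondent.Serial": 2, "入館時の質問": "いいえ"},
    {"Respondent.Serial": 3, "入館時の質問": "はい"},
]
second_cases = [
    {"Respondent.Serial": 2, "退館時の質問": "はい"},
    {"Respondent.Serial": 3, "退館時の質問": "いいえ"},
    {"Respondent.Serial": 4, "退館時の質問": "はい"},
]

joined = full_join(master_cases, second_cases)
for row in joined:
    print(row)
```

In this sketch, all four respondents appear in the output. Respondent 1 has None for the exit question and respondent 4 has None for the entry question, which mirrors the nulls described for the merged case data.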