Troubleshooting when reading from a .sav file

When I export my UNICOM Intelligence Interviewer data to a .sav file and open it in UNICOM Intelligence Reporter, it is sometimes difficult to link the IBM SPSS Statistics variables to the questions and categories in the questionnaire. Can you give me any tips?

When using UNICOM Intelligence Reporter to analyze data in a .sav file that originated in UNICOM Intelligence Interviewer, it is always preferable to access the data using the .mdd file used for the export. When you do this, the variables that you see in UNICOM Intelligence Reporter match those in the questionnaire definition (.mdd) file and not those in the .sav file. For example, if you set the maximum length for IBM SPSS Statistics variable names to eight bytes (for compatibility with older versions of IBM SPSS Statistics), the algorithm used to generate an eight-byte IBM SPSS Statistics name from a Metadata Model (MDM) name might make it difficult for you to link the two names. Using the .mdd file to access the .sav file avoids this problem.

There are two ways to reduce the time it takes to load a .sav file into UNICOM Intelligence Reporter. For the best performance, use both of these methods:

▪Create an .ini file that the DSC can read (see Properties and settings used by the SPSS Statistics SAV DSC), and include the following line in that file:

ScanForCategories = 0

This setting stops the IBM SPSS Statistics SAV DSC from scanning all cases in the .sav file and generating categories for values that do not have corresponding value labels. Note, however, that this setting means that the IBM SPSS Statistics SAV DSC will silently ignore any unlabeled values in the case data. For more information, see Determining categories.

▪Create an .mdd file from the .sav file. In UNICOM Intelligence Reporter, use the .mdd file as the metadata source. For more information, see Creating an MDM document from an SPSS Statistics .sav file.

When you do not use an .mdd file, the IBM SPSS Statistics SAV DSC creates an MDM document "on the fly". For a .sav file that contains many variables, this can take a long time. In addition, unless you have set ScanForCategories = 0, the number of cases in the .sav file also affects how long it takes to create the MDM document. Using an .mdd file removes this step.

If you are using both methods, make sure that you create the .ini file and add the ScanForCategories setting before you create the .mdd file. The .ini file must still exist when you load the .sav file into UNICOM Intelligence Reporter.

An IBM SPSS Statistics numeric variable has been interpreted as an MDM single-response categorical variable, but there are fewer MDM categories than I expected.

The most probable reason is that the IBM SPSS Statistics numeric variable is only partially coded (some data values have no value label) and you have changed the default behavior of the SPSS Statistics SAV DSC so that it no longer scans the case data and generates categories for data values that do not have a corresponding value label. Either add value labels for all data values of the IBM SPSS Statistics variable in the .sav file, or make sure that the value of the ScanForCategories property is -1.

For more information, see Determining categories.
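The effect of scanning on a partially coded variable can be sketched in Python. This is a simplified model and not the DSC's actual implementation: the function name `determine_categories` and its parameters are hypothetical, and the boolean argument stands in for the ScanForCategories setting.

```python
def determine_categories(value_labels, case_values, scan_for_categories=True):
    """Return the raw values that become categories in this simplified model.

    value_labels: dict mapping raw data values to their value labels.
    case_values: the raw values found in the case data (None = missing).
    scan_for_categories: hypothetical stand-in for the ScanForCategories setting.
    """
    # Every value that has a value label becomes a category.
    categories = list(value_labels)
    if scan_for_categories:
        # With scanning on, unlabeled values found in the case data
        # also become categories.
        for value in case_values:
            if value is not None and value not in categories:
                categories.append(value)
    return categories


labels = {1: "Yes", 2: "No"}   # the value 3 has no value label
cases = [1, 2, 3, 3, 1]

with_scan = determine_categories(labels, cases, scan_for_categories=True)
without_scan = determine_categories(labels, cases, scan_for_categories=False)
```

In this model, the unlabeled value 3 produces a category only when scanning is enabled, which is why a partially coded variable can end up with fewer categories than expected.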

If you are reading a .sav file that was created in UNICOM Intelligence Data Model 2.9 or earlier, you need to read it using the MDM document (.mdd) file that was created when you created the .sav file. This allows the SPSS Statistics SAV DSC to reconstruct the original MDM text variable that was split up because of the 255-byte limit that IBM SPSS Statistics 12 or earlier places on string variables.

You can do this in one of two ways:

▪Open the .sav file in IBM SPSS Statistics and add value labels to the numeric variable. The SPSS Statistics SAV DSC will then generate an MDM category for each value label. For more information, see Tabulating IBM SPSS Statistics variables in UNICOM Intelligence Reporter.

▪Use the CategoricalVariables property to force the SPSS Statistics SAV DSC to map the IBM SPSS Statistics variable to an MDM single-response categorical variable. For more information, see Determining categorical variables.

If you are using an .mdd file to access the data in your .sav file (as suggested in the response to the first question in this topic), you must change the IBM SPSS Statistics variable or set the CategoricalVariables setting before creating the .mdd file from the .sav file. If you make further changes to the .sav file after creating the .mdd file, you must create a new .mdd file.

In my .sav file I have several variables that represent the responses to a multiple response question. How can I make them appear as one MDM multiple-response categorical variable in UNICOM Intelligence Reporter?

Provided you have the IBM SPSS Statistics Tables add-on module, you can use IBM SPSS Statistics to define a multiple response set that contains all of the relevant IBM SPSS Statistics variables. The SPSS Statistics SAV DSC will then map the multiple response set to an MDM multiple-response categorical variable. For more information, see "IBM SPSS Statistics Multiple Response Sets" in Variable definitions when reading from a .sav file.

The SPSS Statistics SAV DSC can handle overlapping multiple response sets provided that the sets are either all multiple category sets with the same label values or all multiple dichotomy sets that have the same counted value (that is, the value that indicates a "yes" response). If defining your multiple response sets in this way does not resolve the problem, try reducing the number of overlapping multiple response sets, or replace all of the existing sets with one multiple response set that includes all of the variables.
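The rule for overlapping sets can be expressed as a small check. The following Python sketch is illustrative only: the `ResponseSet` class, its fields, and the function `overlapping_sets_are_supported` are hypothetical names and do not correspond to any UNICOM Intelligence or IBM SPSS Statistics API.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class ResponseSet:
    """Hypothetical description of one multiple response set."""
    name: str
    kind: str                               # "category" or "dichotomy"
    variables: FrozenSet[str]
    value_labels: Optional[Tuple] = None    # used by multiple category sets
    counted_value: Optional[int] = None     # used by multiple dichotomy sets


def overlapping_sets_are_supported(sets):
    """Apply the rule described above to a group of overlapping sets."""
    kinds = {s.kind for s in sets}
    if kinds == {"category"}:
        # All multiple category sets: they must share the same value labels.
        return len({s.value_labels for s in sets}) == 1
    if kinds == {"dichotomy"}:
        # All multiple dichotomy sets: they must share the same counted value.
        return len({s.counted_value for s in sets}) == 1
    # A mixture of set types (or an empty group) does not satisfy the rule.
    return False


a = ResponseSet("brands_a", "dichotomy", frozenset({"q1_1", "q1_2"}), counted_value=1)
b = ResponseSet("brands_b", "dichotomy", frozenset({"q1_2", "q1_3"}), counted_value=1)
c = ResponseSet("brands_c", "dichotomy", frozenset({"q1_3", "q1_4"}), counted_value=2)

same_counted_value = overlapping_sets_are_supported([a, b])
mixed_counted_value = overlapping_sets_are_supported([a, b, c])
```

Here the first two sets share the counted value 1 and pass the check, whereas adding the third set, which counts the value 2, causes the check to fail.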

When tabulating a .sav file, why does the base calculation for a multiple-response question include every respondent in the survey? It should exclude respondents who weren't asked the question.

If all the IBM SPSS Statistics variables in a multiple category set (but not a multiple dichotomy set) contain IBM SPSS Statistics system-missing values, the SPSS Statistics SAV DSC sets the value of the corresponding MDM variable to an empty categorical value, rather than a NULL value. This causes the case to be included in the base calculation.

For more information about how to resolve this problem, see When the response set contains system-missing values.
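The difference between an empty categorical value and a NULL value can be illustrated with a simplified model of a base calculation. In this Python sketch, `None` stands for NULL and an empty `frozenset` stands for an empty categorical value. The function `base_count` is a hypothetical simplification, not the calculation that UNICOM Intelligence Reporter actually performs.

```python
def base_count(values):
    """Count the cases that contribute to the base in this simplified model:
    every case whose value is not NULL (None), including empty values."""
    return sum(1 for value in values if value is not None)


answered = frozenset({"brand_a", "brand_c"})  # respondent chose two categories
empty_categorical = frozenset()               # what the DSC returns in this case
null_value = None                             # what you might have expected

# Three respondents; two of them were not asked the question.
as_read_from_sav = [answered, empty_categorical, empty_categorical]
as_expected = [answered, null_value, null_value]

base_as_read = base_count(as_read_from_sav)
base_as_expected = base_count(as_expected)
```

In this model, the two respondents who were not asked the question still count toward the base when their value is an empty categorical value, but not when it is NULL.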

The variable names that you see depend on how you have read the .sav file. If you read the .sav file using an .mdd file, the variable names will match those in the .mdd file. If you read the .sav file by specifying the SPSS Statistics SAV DSC as the MDSC, the variable names will be closer to the IBM SPSS Statistics variable names, but the names will be changed, if necessary, to make them valid MDM variable names. If the .sav file was originally created by the SPSS Statistics SAV DSC, the names will match the names in the .mdd file that was created at the same time only if you read the .sav file using that .mdd file.

The MDM representation of the data returned by the SPSS Statistics SAV DSC does not display the raw category values because in many cases it is just not possible to do so. The MDM supports integer category values only, whereas the raw category values in the .sav file can also be real (decimal) numbers and text values. In addition, the .sav file might contain categorical variables that are represented by multiple dichotomy sets, where the raw values for the variables in the set indicate either a True or False selection. However, if a raw category value is numeric (integer or decimal), you can set the corresponding MDM category's Factor property (see IElement.Factor in the MDM Object Model Reference) to the raw value by using the ImportFactors property.

By default, the SPSS Statistics SAV DSC returns category values as MDM unique category values. However, they can optionally be displayed as the IBM SPSS Statistics native value (an index in the range of 1 to the number of categories in the variable), which the SPSS Statistics SAV DSC uses to identify categories in the .sav file. IBM SPSS Statistics native values are not necessarily the same as the raw values stored in the .sav file. For multiple dichotomy sets, the native value identifies the relevant variable in the set. For other categorical variables, the native value identifies the value label that represents the category. If you want to use the IBM SPSS Statistics native values, set the MR Init Category Values connection property (see Connection properties) to 1.
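The distinction between raw values and native values can be sketched as follows. This Python example assumes, for illustration only, that native values are assigned in the order in which the value labels (or the variables in a multiple dichotomy set) are listed. The variable names and labels are invented.

```python
# A categorical variable whose raw values are decimal numbers.
value_labels = {0.5: "Low", 1.5: "Medium", 99.0: "Refused"}

# Native value: an index from 1 to the number of categories that
# identifies the value label (assumed here to follow listing order).
native_by_raw_value = {
    raw: index for index, raw in enumerate(value_labels, start=1)
}

# A multiple dichotomy set: the native value identifies the variable
# in the set rather than any raw value stored in the case data.
dichotomy_variables = ["q5_tea", "q5_coffee", "q5_juice"]
native_by_variable = {
    name: index for index, name in enumerate(dichotomy_variables, start=1)
}
```

In this sketch the raw value 99.0 has the native value 3, which shows why native values are not necessarily the same as the raw values stored in the .sav file.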

I'm getting the following error when I read an IBM SPSS Statistics .sav file that contains Chinese, Japanese, or Korean text: "The 's language '<language>' isn't supported by the IBM SPSS Statistics DLL".

This usually means that you do not have the required code page for that language installed on your computer. Because the stored texts in a .sav file use multibyte encoding rather than Unicode, the SPSS Statistics SAV DSC must convert the stored texts to Unicode so that they can be understood by the UNICOM Intelligence Data Model. To do this, the SPSS Statistics SAV DSC must have access to the code page for the language that needs to be converted. Use the Regional and Language Options (or Regional Options) in Windows Control Panel to install the relevant language. For more information, see Language handling by the SPSS Statistics SAV DSC.

When you set the number of decimal places of a numeric variable to zero (0) in IBM SPSS Statistics, the SPSS Statistics SAV DSC treats the variable as Long type. For example, when you export a SAV file to a DDF file, the variable is treated as Long type in UNICOM Intelligence. If the variable actually holds double values, this can result in a loss of precision, and when you export the DDF file back to the SAV file format, the resulting SAV file also exhibits that loss of precision. To avoid the problem, set the correct number of decimal places in IBM SPSS Statistics for any variable whose values are double.

When a variable in IBM SPSS Statistics is read as a Long value, Null is returned for numeric values that are less than -2,147,483,648 or greater than 2,147,483,647. To return the Double type or Text type instead, see Handling null values.
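The limits quoted above are the bounds of a 32-bit signed integer. The following Python sketch models the range check only. The function `read_as_long` is a hypothetical name, `None` stands for Null, and the handling of fractional values is deliberately not modeled.

```python
LONG_MIN = -2_147_483_648   # -(2 ** 31)
LONG_MAX = 2_147_483_647    # (2 ** 31) - 1


def read_as_long(value):
    """Return the value if it fits in the Long range, otherwise None (Null).

    The inputs in this sketch are whole numbers; how the DSC treats
    fractional parts is not modeled here.
    """
    if value is None:
        return None
    if value < LONG_MIN or value > LONG_MAX:
        return None
    return int(value)


in_range = read_as_long(2_147_483_647)
too_large = read_as_long(2_147_483_648)
too_small = read_as_long(-2_147_483_649)
```

In this model, the largest value that is returned unchanged is 2,147,483,647, and anything beyond either limit comes back as Null.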