Example 1: Creating a categorical variable from a numeric variable
When you analyze numeric data, you will sometimes want to group the numeric values into categories. For example, you may want to analyze age data in age groups even when respondents are asked to enter their age as a numeric value. It is easy to create a categorical variable from a numeric variable in a DMS file. This process is sometimes called banding and each category created from the numeric variable is called a band.
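This topic shows three ways of doing this: using an expression in the metadata, setting up the case data in the OnNextCase Event section, and using the TOM CreateCategorizedVariable method.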
The examples in this topic are in the NewVariables.dms sample DMS file in the UNICOM Intelligence Developer Documentation Library. For more information, see Sample DMS files.
Note: When you use UNICOM Intelligence Reporter - Survey Tabulation and UNICOM Intelligence Reporter to analyze your data, you can tabulate your numeric variables without creating derived categorical variables based on them. To do this, you define an axis expression. If you want to save an axis expression in the metadata, you can define it in the Metadata section of your DMS file. For more information, see Creating axis expressions and exporting to IBM SPSS Statistics.
Using an expression in the metadata
Variables that are defined in the metadata using an expression are known as dynamically derived variables and do not actually exist in the case data. After they have been defined in the metadata, they can be queried through the Case Data Model just like any other variable, but the return values are calculated “on the fly”. Generally, the term “derived variable” is used to mean a dynamically derived variable.
However, with one exception, data management scripts (.dms) always convert dynamically derived variables that are defined in the Metadata section of a DMS file into standard variables that exist in both the case data and the metadata. Because these variables are based on other variables, they are sometimes called persisted derived variables. Data management scripts do this conversion because dynamically derived variables are not understood by other products, such as SPSS, Quantum, and Excel. Because data management scripts automatically persist the dynamically derived variable, the output data source contains case data for these variables (so that you can use them in any suitable product) and the expressions are removed from the variables in the output metadata.
The exception is that when you are using the UseInputAsOutput option, data management scripts do not convert the dynamically derived variables to standard variables. This has the advantage that the dynamically derived variables automatically include any additional case data records that are added to the data source. Leaving the variables unpersisted is not a disadvantage in this case, because you cannot use the UseInputAsOutput option to set up metadata in SPSS and Quantum data.
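The difference between the two kinds of derived variable can be pictured with a small sketch. The Python below is an illustrative model only: it is not DMS code and does not call any UNICOM Intelligence API, and the names `band_for`, `dynamic_view`, and `persist` are hypothetical.

```python
# Illustrative model only: plain Python, not the UNICOM Intelligence Data Model.

def band_for(visits):
    """Hypothetical derivation rule: map a visit count to a band label."""
    if visits is None:
        return None
    return "no visits" if visits == 0 else "some visits"

def dynamic_view(records):
    """Dynamically derived: values are calculated each time the data is read."""
    return [band_for(r["visits"]) for r in records]

def persist(records):
    """Persisted: values are calculated once and stored with each record."""
    for r in records:
        r["TotalVisits"] = band_for(r["visits"])

records = [{"visits": 0}, {"visits": 3}]
persist(records)

# A record added afterwards has no stored value until the script is run
# again, whereas the dynamic view covers it automatically.
records.append({"visits": 7})
stored = [r.get("TotalVisits") for r in records]
computed = dynamic_view(records)
print(stored)    # ['no visits', 'some visits', None]
print(computed)  # ['no visits', 'some visits', 'some visits']
```

In this model the stored column corresponds to a persisted derived variable and the computed list to a dynamically derived one.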
Here is the Metadata section in a DMS file that creates a dynamically derived categorical variable from the visits numeric variable in the Museum sample data set. Each category has the expression keyword and an expression that specifies which values of the visits variable correspond to the category.
Metadata(ENU, Question, Label, "myInputDataSource")
    TotalVisits1 "Number of previous visits derived variable" categorical [1..1]
    {
        Band1 "None" expression("visits = 0"),
        Band2 "1-5 visits" expression("visits > 0 and visits < 6"),
        Band3 "6-10 visits" expression("visits > 5 and visits <= 10"),
        Band4 "11-20 visits" expression("visits > 10 and visits <= 20"),
        Band5 "21-40 visits" expression("visits > 20 and visits <= 40"),
        Band6 "41-70 visits" expression("visits > 40 and visits <= 70"),
        Band7 "More than 70" expression("visits > 70")
    };
End Metadata
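Because the variable is defined as single response ([1..1]), each value of visits should match exactly one band. A quick way to check the boundaries is to transcribe the conditions and test them, as in the following Python sketch. It is not DMS syntax; it assumes that visits holds whole numbers, and the test range of 0 to 200 is arbitrary.

```python
# Band conditions transcribed from the expressions in the Metadata section.
# Plain Python used only to check the boundaries; this is not DMS syntax.
BANDS = {
    "Band1": lambda v: v == 0,
    "Band2": lambda v: 0 < v < 6,
    "Band3": lambda v: 5 < v <= 10,
    "Band4": lambda v: 10 < v <= 20,
    "Band5": lambda v: 20 < v <= 40,
    "Band6": lambda v: 40 < v <= 70,
    "Band7": lambda v: v > 70,
}

def matching_bands(visits):
    """Return the names of all bands whose condition is true for visits."""
    return [name for name, test in BANDS.items() if test(visits)]

# Collect any whole number that matches no band or more than one band.
problems = [v for v in range(0, 201) if len(matching_bands(v)) != 1]
print(problems)  # [] means there are no gaps or overlaps in the tested range
```

The mixed use of `< 6` and `> 5` is safe only for whole numbers. A fractional value such as 5.5 would satisfy the conditions for both Band2 and Band3, so a variable that can hold decimals needs boundaries written consistently, for example `> 5 and <= 10` paired with `> 0 and <= 5`.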
Setting up the case data in the OnNextCase Event section
Another way of setting up a derived variable is to set up the metadata in the Metadata section and the case data in the OnNextCase Event section. However, setting up the case data in the OnNextCase Event section is generally slower than using expressions in the Metadata section, because the OnNextCase Event section code is executed separately for each individual case data record.
Here is the Metadata section. This time the categories do not have the expression keyword because we will set up the case data in the OnNextCase Event section.
Metadata(ENU, Question, Label, myInputDataSource)
    TotalVisits2 "Number of previous visits persisted derived variable" categorical [1..1]
    {
        Band1 "None",
        Band2 "1-5 visits",
        Band3 "6-10 visits",
        Band4 "11-20 visits",
        Band5 "21-40 visits",
        Band6 "41-70 visits",
        Band7 "More than 70"
    };
End Metadata
Here is the mrScriptBasic code for the OnNextCase Event section. This code uses a Select Case statement to test the value of the response in the visits variable. This avoids the use of multiple If...Then...Else statements. The code assigns the categorical variable a value according to the value of the response in the visits variable.
Event(OnNextCase, "Set up the case data")
    Select Case visits
        Case NULL
            TotalVisits2 = NULL
        Case 0
            TotalVisits2 = {Band1}
        Case 1 To 5
            TotalVisits2 = {Band2}
        Case 6 To 10
            TotalVisits2 = {Band3}
        Case 11 To 20
            TotalVisits2 = {Band4}
        Case 21 To 40
            TotalVisits2 = {Band5}
        Case 41 To 70
            TotalVisits2 = {Band6}
        Case > 70
            TotalVisits2 = {Band7}
    End Select
End Event
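To show what the Select Case statement does for each record, the following Python sketch applies the same mapping with a lookup table. It is an illustration of the logic, not a replacement for the mrScriptBasic code. The function name `total_visits` is hypothetical, and the sketch assumes that visits holds whole numbers.

```python
from bisect import bisect_left

# Inclusive upper limit of every band except the last, taken from the Case ranges.
UPPER_LIMITS = [0, 5, 10, 20, 40, 70]
BAND_NAMES = ["Band1", "Band2", "Band3", "Band4", "Band5", "Band6", "Band7"]

def total_visits(visits):
    """Return the band name for a whole number of visits, or None if no band applies."""
    if visits is None:   # mirrors the Case NULL branch
        return None
    if visits < 0:       # negative values match no Case clause in the script
        return None
    # bisect_left finds the first upper limit that is >= visits;
    # values above 70 fall past the end of the list and map to Band7.
    return BAND_NAMES[bisect_left(UPPER_LIMITS, visits)]

# Like the OnNextCase code, the function is applied once per record.
responses = [None, 0, 3, 6, 70, 71]
banded = [total_visits(v) for v in responses]
print(banded)  # [None, 'Band1', 'Band2', 'Band3', 'Band6', 'Band7']
```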
You can use multiple expressions or ranges in each Case clause. For example, the following tests for nonconsecutive values:
Case 1 To 3, > 8
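As a rough illustration of how such a clause reads, the Python sketch below treats the comma as "or" and the To range as inclusive at both ends, which is consistent with how Case 1 To 5 is used for the "1-5 visits" band above. The function name `matches_clause` is hypothetical.

```python
def matches_clause(value):
    """Python reading of `Case 1 To 3, > 8`: the range OR the comparison."""
    return 1 <= value <= 3 or value > 8

matched = [v for v in range(0, 12) if matches_clause(v)]
print(matched)  # [1, 2, 3, 9, 10, 11]
```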
Using the TOM CreateCategorizedVariable method
When a Table Document with a data source is available, it is possible to create derived variables using the TOM CreateCategorizedVariable method. For example:
Dim TotalVisits3
TotalVisits3 = TableDoc.Coding.CreateCategorizedVariable("visits")
For more information, see the SimpleCategorization_Museum.mrs sample script in:
[INSTALL_FOLDER]\IBM\SPSS\DataCollection\7\DDL\Scripts\Tables
Requirements
UNICOM Intelligence Professional