Categorical
The categorical type defines a categorical question, which is a question that has a limited number of categories that represent the possible responses. Categorical questions can be single response or multiple response.
The maximum number of categories in a categorical question is in theory approximately 4 billion. However, in reality the number of categories is limited by the available memory and the Data Source Component (DSC) being used.
Syntax
For clarity, each item is shown on a separate line, and optional items are indented. See also Syntax conventions.
field_name [ "field_label" ]
[ [ <properties> ] ]
[ <styles and templates> ]
categorical [ [ min_categories .. max_categories ] ]
[ <categories> ]
[ expression ("expression_text" [, ( deriveelements | noderiveelements ) ] ) ]
[ ( initialanswer | defaultanswer ) ({category_name(s)}) ]
[ <axis> ]
[ <usage-type> ]
[ <helperfields> ]
[ nocasedata ]
[ unversioned ]
Parameters
For <axis>, <usage-type>, <helperfields>, nocasedata, and unversioned, see Common parameters.
min_categories..max_categories
Defines the minimum and maximum number of responses that are valid. Setting max_categories to 1 defines the question as single response. Leaving max_categories undefined, or setting it to a value greater than 1, defines the question as multiple response. Setting min_categories to a value greater than zero means that the respondent must answer the question and choose at least that number of categories.
In UNICOM Intelligence Data Model 2.8 and later, you can use [1] as a shortcut for [1..1], which indicates a single response question that must be answered.
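As an illustration of the range notation, the following sketch declares three questions that differ only in their range. The question names, labels, and categories are hypothetical and are used here only to show the notation.

```mrscriptmetadata
' Single response: exactly one category must be chosen.
MainPet "Which pet do you spend the most time with?" categorical [1..1]
{ Dog, Cat, Hamster };

' Multiple response: at least one and at most two categories.
TopPets "Which pets do you like best? Choose up to two." categorical [1..2]
{ Dog, Cat, Hamster };

' Multiple response: at least one category, no upper limit.
AllPets "Which pets do you own?" categorical [1..]
{ Dog, Cat, Hamster };
```

In Data Model 2.8 and later, the first range could also be written as [1].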
expression
You use the optional expression keyword to create one or more categories and variables using a formula defined in expression_text. This feature is typically used to create categories based on the responses to other categorical questions.
You can also specify the following optional keywords with expression:
deriveelements
Derive categories from the expression. This is the default if no categories have been defined for the question.
noderiveelements
Do not derive categories from the expression. This is the default if one or more categories have been defined for the question.
For an example of using the expression keyword to create categories, see "Derived variable created using one expression" in the Examples section of this topic.
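The following sketch shows where the optional second argument is placed, based on the syntax given above. All names (BrandList, Spontaneous, Prompted, TotalAwareness) and the shortened brand list are illustrative assumptions.

```mrscriptmetadata
BrandList define { Red, Blue, Yellow };

Spontaneous "Brands remembered spontaneously" categorical
    {use BrandList};
Prompted "Brands remembered after prompting" categorical
    {use BrandList};

' Categories are listed explicitly, so noderiveelements is already the
' default; it is written out here only to make the choice visible.
TotalAwareness "Total brand awareness" categorical
    {use BrandList}
    expression("Spontaneous + Prompted", noderiveelements);
```

Omitting the category list and the second argument would instead rely on the deriveelements default described above.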
initialanswer and defaultanswer
You can optionally specify either an initial answer or a default answer to the question. An initial answer can be seen by the respondent, whereas a default answer cannot. If either type of answer is defined, the respondent does not need to answer the question. If the question is a multiple response categorical, the answer can consist of more than one response. For example:
FavoritePets "What are your favorite household pets?" categorical
{ Dogs,
Goldfish,
Mice,
Cats,
Hamsters
}
defaultanswer ( {Dogs, Cats} );
If a default answer is not accepted when you run the interview script (that is, the message “Missing Answer(s)” is displayed), see mrScriptMetadata FAQs.
<categories>
Defines the question's category list, as follows:
<categories> ::= { <category> (, <category>)* }
[ rot[ate] | ran[domize] | rev[erse] |
asc[ending] | desc[ending] ]
[ fix ]
[ namespace ]
rotate or rot
The category list is to be rotated by one category before each presentation.
randomize or ran
The category list is to be presented in randomized order.
reverse or rev
The category list is to be reversed before each presentation. This means that the list is presented top-down to the first respondent, bottom-up to the next, and so on.
ascending or asc
The category list is to be sorted in ascending alphabetical order.
descending or desc
The category list is to be sorted in descending alphabetical order.
fix
When the category list is a sublist within a higher-level category list, forces the sublist to retain its original position when the higher-level category list is sorted, rotated, randomized, or reversed.
namespace
Indicates that the category names should be namespaced based on the list name. This means that the full names of the categories will include the name of the list. By default category names are not namespaced.
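As a minimal sketch of how an ordering keyword is attached to a category list, the following hypothetical question randomizes its categories while one category is held in place with the fix keyword (described under <category>). The question and category names are assumptions made for illustration.

```mrscriptmetadata
' The list is presented in randomized order, but NoneOfThese keeps
' its original (last) position because it is marked with fix.
Snacks "Which snacks do you buy?" categorical [1..]
{
    Crisps,
    Nuts,
    Fruit,
    NoneOfThese "None of these" exclusive fix
} ran;
```

Replacing ran with rot, rev, asc, or desc selects one of the other ordering behaviors described above.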
<category>
This format defines a category within the question's category list:
<category> ::= category_name [ "category_label" ]
[ <other> | <multiplier> | DK | REF | NA ]
[ exclusive ]
[ factor (factor_value) ]
[ keycode ("keycode_value") ]
[ expression ("exp_text") ]
[ elementtype (type_value) ]
[ fix ]
[ nofilter ]
This is an alternative format for directly specifying a group of categories within the question's category list:
<category> ::= [ list_name ]
[ "list_label" ]
<categories>
This is an alternative format for using a defined list (or “shared list”; see Define) to specify a group of categories within the question's category list:
<category> ::= [ list_name ]
use define_list
[ sublist
[ rot[ate] | ran[domize] | rev[erse] |
asc[ending] | desc[ending] ]
]
[ "list_label" ]
[ fix ]
category_name, category_label
For more information, see Names and labels.
DK, REF, or NA
Defines the category as a Don't Know, Refused to Answer, or No Answer special response. When you use one of these keywords, the exclusive, fix, and nofilter keywords will be automatically set for the category. When using the DK, REF or NA keywords, you can use a hyphen (-) as the category name to specify that the category name should be the same as the keyword.
If you have a standard list of special response categories that you want to use for more than one question, you can set the categories up as a defined list. For more information, see Define.
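The hyphen shorthand can be sketched as follows. The question, its categories, and the labels are hypothetical; only the DK and REF keywords and the hyphen convention come from the description above.

```mrscriptmetadata
Commute "How do you usually travel to work?" categorical [1]
{
    Car,
    Train,
    Bicycle,
    - "Don't know" DK,            ' category name becomes DK
    - "Prefer not to say" REF     ' category name becomes REF
};
```

Because DK and REF are special responses, exclusive, fix, and nofilter do not need to be written out for these two categories.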
exclusive
Defines the category as exclusive, which means that it is a single-choice category. Typically, you use this to define a single-choice category in a multiple response question. For example, multiple response questions sometimes have a None of the above single-choice category.
factor
You can use the optional factor keyword to set a factor value for a category. Factors are numeric values that are used in statistical calculations. You can also set a factor value for the special response categories, DK, REF and NA.
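For example, a rating scale might assign a factor to each scale point. This is a sketch only: the question name, the labels, and the choice of factor values 1 to 5 are assumptions.

```mrscriptmetadata
' Each category carries a numeric factor for use in statistical calculations.
Satisfaction "How satisfied are you with the service?" categorical [1]
{
    VerySatisfied    "Very satisfied"    factor(5),
    Satisfied        "Satisfied"         factor(4),
    Neutral          "Neither"           factor(3),
    Dissatisfied     "Dissatisfied"      factor(2),
    VeryDissatisfied "Very dissatisfied" factor(1)
};
```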
keycode
You can use the optional keycode keyword to specify a custom keycode for individual categories. Keycodes are used in the UNICOM Intelligence Interviewer - Offline for Windows application to help limit the number of keystrokes required when entering response values.
fix
Forces a category to retain its original position when the category list is sorted, rotated, randomized, or reversed.
nofilter
Prevents the category from being excluded when using a filter in UNICOM Intelligence Reporter or the UNICOM Intelligence Professional Tables option. Typically, you use this keyword for Other Specify categories.
list_name, list_label
Defines the name and label for a group of categories.
list_name is useful if you are using the namespace option.
list_label defines a subheading for the group of categories. For more information, see Names and labels.
use
Specifies a defined list to use. When using a defined list, you can use a hyphen (-) as the list name to indicate that you want to use the defined list’s name.
<other>
Defines the category as an Other Specify category, as follows:
<other> ::= other [ ( (use "helper_field_name")
| (helper_field_name "label" variable_type) ) ]
This is a category that allows the respondent to enter an answer that is not on the category list. Using this keyword means that a helper field will be created in the metadata for the categorical to store the response. When using the other keyword, you can use a hyphen (-) as the category name to specify that the category name should be the same as the keyword.
By default, the helper field created will be a text variable. For example, the following single response question includes an Other Specify category called other_museum that will store the response in a text helper field called other_museum:
Museum "Favorite museum" categorical [1]
{
Design,
History,
Science,
other_museum "Other museum" other
};
However, you can specify that another data type, such as a numeric, should be used instead. For example, the following single response question includes an Other Specify category called other (indicated by a hyphen) that will store the response in a long helper field called exact_age:
YourAge "Your age" categorical [1]
{
under_19 "18 or under",
over_64 "65 or over",
- "Other:" other( exact_age "Exact age" long [19..64] )
};
<multiplier>
Defines a special type of helper variable that is used to store numeric data that is associated with the category, as follows:
<multiplier> ::= multiplier ( (use "helper_field_name")
| (helper_field_name "label" variable_type) )
By default, UNICOM Intelligence Reporter and the UNICOM Intelligence Professional Tables option increment the cell count for a category by one for each respondent. However, if a multiplier variable has been defined, the value of that variable is used as the increment.
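The second form of the syntax, in which the helper field is defined inline, might look like the following sketch. It is modeled on the inline helper field shown for the other keyword; the question, the category names, and the helper field names (downtown_visits, mall_visits) are hypothetical.

```mrscriptmetadata
' Each category stores a visit count in its own multiplier helper field.
' When tables are run, that value is used as the increment instead of 1.
Stores "Which stores did you visit last month?" categorical [1..]
{
    Downtown "Downtown store"
        multiplier(downtown_visits "Number of visits" long [1..]),
    Mall "Mall store"
        multiplier(mall_visits "Number of visits" long [1..])
};
```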
expression
You use the optional expression keyword to create a category or variable using a formula defined in exp_text. Generally you use this feature to create a derived variable that is based on one or more other variables.
exp_text must be an expression that is supported by the UNICOM Intelligence Data Model and can optionally include functions from the UNICOM Intelligence Function Library. Double quotation marks (") must be escaped using a second double quotation mark (for example, ""yyyy""). Check your syntax and spelling carefully, because the expression is stored without being validated. For more information, see Expression evaluation.
elementtype
You use the optional elementtype keyword to create a special element that can be used by UNICOM Intelligence Reporter to assist in the analysis of data. In general, it is easier to use Axis expressions to add special elements, as the axis specification syntax enables you to include special elements without having to define them. However, you might prefer to define the special elements so that they are permanently attached to a variable. This can be useful if you know in advance the types of special elements that will be required to analyze the variable.
Analysis elements are not used during data collection, so you typically specify the elementtype keyword for a category in a derived variable, that is, a variable whose categories include the expression keyword.
type_value defines the type of special element and can be any of these values:
AnalysisBase
A base element. When used in a crosstabulation, the base element typically shows the total number of cases in the variable.
AnalysisCategory
An element that is used only for analysis. You can use this value for categories that you do not want to appear in interviews.
AnalysisMaximum
An analysis element that is used to show the maximum value of a numeric variable.
AnalysisMean
An analysis element that is used to show the mean value of a numeric variable.
AnalysisMinimum
An analysis element that is used to show the minimum value of a numeric variable.
AnalysisSampleVariance
An analysis element that is used to show the sample variance of a numeric variable.
AnalysisStdDev
An analysis element that is used to show the standard deviation of a numeric variable.
AnalysisStdErr
An analysis element that is used to show the standard error of a numeric variable.
AnalysisSubHeading
A subheading element. Typically, this is used to provide a label for a group of elements. For example, in a list of car models, subheading elements might be used to display manufacturer's names.
AnalysisSubTotal
A subtotal element. This typically shows the total of the cell values since the previous subtotal or base element.
AnalysisSummaryData
An element used to store summary data for use in analysis.
AnalysisTotal
A total element. This typically shows the total of the cell values since the previous total or base element.
When specifying type_value, you do not have to enclose the text in double quotation marks. For example, the following are equivalent:
Total elementtype(AnalysisTotal)
Total elementtype("AnalysisTotal")
There are two ways of defining how the special element will be calculated. The first method is to add the expression keyword to the element and define the formula in the expression text. This method is useful when you want to define a non-standard formula, for example if you want to exclude some cases from the base element. The second method does not require an expression, but does require you to set Custom properties on the special element. By setting the correct custom properties, the calculation will be done automatically. For example, for a base element, setting the HasNoData and ExcludedFromSummaries custom properties to True, as shown in the following example, will result in the base figure being calculated automatically:
Base [HasNoData = True, ExcludedFromSummaries = True] elementtype(AnalysisBase)
In addition, special elements of types AnalysisMean, AnalysisSampleVariance, AnalysisStdDev, and AnalysisStdErr, require the variable to include additional special elements of type AnalysisSummaryData. These additional elements are sometimes called “helper elements”, and custom properties must also be set on them. Helper elements assist with the calculation, but do not appear in the tabulation. The following example shows one type of helper element, which is called SumN and has four custom properties:
SumN [CalculationType="SumN", Hidden=True, ExcludedFromSummaries=True, DataElement=""] elementType(AnalysisSummaryData)
For more information about the custom properties that must be set on a special element, and the helper elements that are required, see Special elements and helper elements in metadata. The "Derived categorical variable with mean and helper elements" example at the end of this topic is a complete example of a derived categorical variable that includes a mean element and helper elements.
When using the elementtype keyword, you can use - (hyphen) to indicate that the category name should be the same as the type_value text. However, do not use this if you are using the same type_value more than once in a categorical question (for example, to create two or more subheadings), because it leads to duplicate names, which causes an error.
sublist
By default, the categories in defined lists are included in category-ordering operations such as randomize and ascending that have been specified for the question. For example, the categories in the animals question are listed in ascending order, that is, cat, cow, dog, giraffe, hamster, horse, lion, tiger.
wild_animals define {tiger, lion, giraffe};
pets define {dog, cat, hamster};
animals categorical {
use wild_animals,
use pets,
cow,
horse
} ascending;
If you don't want the categories in a defined list to be affected by category-ordering operations, specify the sublist keyword, like this:
animals categorical {
use wild_animals sublist, ' <== Sublist keyword added
use pets sublist, ' <== Sublist keyword added
cow,
horse
} ascending;
The categories will now be listed in the following order: cow, horse, “pets” subheading, dog, cat, hamster, “wild_animals” subheading, tiger, lion, giraffe. Notice that the categories in each defined list are still in their original order, but that the list names, pets and wild_animals, have been sorted along with cow and horse. In addition, the list names are now displayed as subheadings.
If you specify the sublist keyword, you can also order the categories in the defined list using the same category-ordering keywords that can be used for the question. For example:
animals categorical {
use wild_animals sublist,
use pets sublist descending, ' <== Descending keyword added
cow,
horse
} ascending;
The categories are now listed in this order: cow, horse, “pets” subheading, hamster, dog, cat, “wild_animals” subheading, tiger, lion, giraffe. Note that the categories in the pets list are now sorted in descending order.
You do not need to specify the sublist keyword if you directly specify a group of categories within the question's category list (that is, you specify a group of categories without using a defined list) as that type of group is a sublist by default.
Remarks
For more information about writing Categorical questions in interview scripts, see Questions with categorical responses.
Examples
Single response question
The following example defines a single response question that includes both category names and labels.
Age "To which age group do you belong?" categorical [1]
{
Y11_16 "11-16 Years",
Y17_20 "17-20 Years",
Y21_24 "21-24 Years",
Y25_34 "25-34 Years",
Y35_44 "35-44 Years",
Y45_54 "45-54 Years",
Y55_64 "55-64 Years",
Y65_pp "65+ Years"
};
Multiple response question
The following example defines a multiple response question that requires at least one answer and allows at most three. Labels are defined only for the categories whose label text is not suitable as a name because it contains space characters. For the other categories, the category name is automatically used as the label. The None of these category is marked as exclusive, which means that a respondent who selects it cannot select any other category.
Drinks "Name up to 3 of your favorite drinks" categorical [1 .. 3]
{
Milk,
Coffee,
Tea,
Orange "Orange juice",
None "None of these" exclusive
};
You can omit the categorical keyword if you do not specify a range. For example, the following creates a multiple response question for which there is no limit on the number of categories the respondent can choose:
AllDrinks "Which of the following do you drink at least once a week?"
{
Milk,
Coffee,
Tea,
Orange "Orange juice",
None "None of these" exclusive
};
However, including the categorical keyword generally makes your script easier to read and understand.
Other specify category
This example defines a single response question that has an other specify category. This is defined using the other keyword. Notice that a hyphen (-) has been used instead of a name. This indicates that the category should be given the default name of other.
DrinksWithOther "Which is your favorite drink?" categorical [1 .. 1]
{
Milk,
Coffee,
Tea,
Orange "Orange juice",
- "Other, please specify" other
};
Multiple response question containing two groups of categories
This example defines a multiple response question that requires at least one answer. The category list is broken into two separate groups, each of which has an Other Specify category. Both groups are randomized, and the two groups are rotated. The special DK (Don't Know) response is automatically fixed at the end of the entire list and the two Other Specify categories are fixed within the groups.
Service "What are your impressions of the service/billing?" categorical [1 ..]
{
Positive "Positive responses"
{
NoProb "No Problems - no cause to complain",
GoodService "Good Service",
GoodBreakdown "Provides breakdown of expenses",
CardsAccept "Wide acceptance of cards",
PosOther "Other positive" other fix
} ran,
Negative "Negative responses"
{
Slow "Slow to respond/poor service",
LimitedCards "Limited acceptance of cards",
Inaccurate "Inaccuracies of bills",
CostTooHigh "Costs too high/expensive",
CustContact "Not enough customer contact/service",
NegOther "Other negative" other fix
} ran,
DontKnow "Don't know" DK
} rot;
In the preceding Service question, the categories within the two groups are not namespaced. For example, the full names of the categories in the Positive list would be:
{
NoProb,
GoodService,
GoodBreakdown,
CardsAccept,
PosOther
}
However, if the namespace option is selected by using the namespace keyword on each list as follows:
Service "What are your impressions of the service/billing?" categorical [1 ..]
{
Positive "Positive responses"
{
NoProb "No Problems - no cause to complain",
GoodService "Good Service",
GoodBreakdown "Provides breakdown of expenses",
CardsAccept "Wide acceptance of cards",
PosOther "Other positive" other fix
} ran namespace, ' <== Namespace keyword added
Negative "Negative responses"
{
Slow "Slow to respond/poor service",
LimitedCards "Limited acceptance of cards",
Inaccurate "Inaccuracies of bills",
CostTooHigh "Costs too high/expensive",
CustContact "Not enough customer contact/service",
NegOther "Other negative" other fix
} ran namespace, ' <== Namespace keyword added
DontKnow "Don't know" DK
} rot;
the category full names then become:
{
Positive.NoProb,
Positive.GoodService,
Positive.GoodBreakdown,
Positive.CardsAccept,
Positive.PosOther
}
The namespace option is particularly useful when you combine category lists that contain categories that have the same name, because it makes the full names of the categories unique. Note that you will get an error when mrScript MDSC reads the code if there are duplicate category full names in a question.
Derived variable created using category expressions
This example defines a derived categorical variable in which each category is defined using an expression. The expressions use the Age and Gender variables to create categories for four age groups for each gender.
GenderAge "Gender and age derived variable"
{
Boys
expression("Gender = {Male} And Age * {Y11_16, Y17_20}"),
YoungMen "Young men"
expression("Gender = {Male} And Age * {Y21_24, Y25_34}"),
MiddleMen "Middle-aged men"
expression("Gender = {Male} And Age * {Y35_44, Y45_54, Y55_64}"),
OldMen "Older men"
expression("Gender = {Male} And Age * {Y65_pp}"),
Girls
expression("Gender = {Female} And Age * {Y11_16, Y17_20}"),
YoungWomen "Young women"
expression("Gender = {Female} And Age * {Y21_24, Y25_34}"),
MiddleWomen "Middle-aged women"
expression("Gender = {Female} And Age * {Y35_44, Y45_54, Y55_64}"),
OldWomen "Older women"
expression("Gender = {Female} And Age * {Y65_pp}")
};
Derived variable created using one expression
The TotalAwareness variable in this example is a derived categorical variable using one expression on the variable rather than separate expressions on each of the categories. The expression uses the union (+) operator to define the derived variable as the union of the responses to two existing categorical questions, Prompted and Spontaneous, which ask which brands are known with and without prompting. This means that the derived variable will contain the responses to both questions. In Data Model 2.8 and later, categories are generated automatically when you define an expression in this way for a categorical variable, but do not specify any categories, as in this example.
The category lists for the Prompted and Spontaneous questions are defined using the same defined list. However, this is not a requirement. The categories could be defined on the variables themselves (rather than in a defined list) and the category lists do not have to be identical. For example, you can also use different or overlapping lists.
BrandList define
{
Red, Blue, Yellow, Green, Purple, Black,
White, Cerise, Turquoise, Orange, Pink
};
Spontaneous "Brands remembered spontaneously" categorical
{use BrandList};
Prompted "Brands remembered after prompting" categorical
{use BrandList};
TotalAwareness "Total Brand Awareness" categorical
expression ("Spontaneous + Prompted");
For more information on defined lists, see Define.
Derived categorical variable with base and subheading elements
This example creates a derived categorical variable that has a base element and two subheading elements, the first of which has the text Teas and the second Coffees.
The expression on the variable is evaluated first and then any category expressions are evaluated.
HotDrinks "Favorite hot drink" categorical [1]
{
Base elementtype(AnalysisBase)
expression("Drinks * {Green, Chinese, Black, Herbal,
Flavored, Ground, Instant, Ready}"),
Teas elementtype(AnalysisSubheading),
Green "Green tea",
Chinese "Chinese tea",
Black "Black tea",
Herbal "Herbal tea",
Flavored "Flavored tea",
Coffees elementtype(AnalysisSubheading),
Ground "Ground coffee",
Instant "Instant coffee",
Ready "Ready to drink coffee"
} expression("Drinks");
Derived categorical variable with mean and helper elements
This example shows a derived categorical variable that includes special elements that will be used in UNICOM Intelligence Reporter or the UNICOM Intelligence Professional Tables option. The categorical variable is based on the income numeric variable. It includes three bands based on the values in income, and it also includes a mean special element and the helper elements (SumX and SumN) that are required to calculate the mean:
income "What is your income?" long [0..];
derived_income "Derived variable showing bands and mean" categorical [1] {
income_1 "Lower income"
expression("income > 0 and income <= 30000"),
income_2 "Middle income"
expression("income > 30000 and income <= 60000"),
income_3 "High income"
expression("income > 60000"),
SumX [CalculationType="SumX",
Hidden=True,
ExcludedFromSummaries=True,
DataElement=""]
elementType(AnalysisSummaryData)
multiplier(use income),
SumN [CalculationType="SumN",
Hidden=True,
ExcludedFromSummaries=True,
DataElement=""]
elementType(AnalysisSummaryData),
Mean "Average income"
[CalculationType="Mean",
HasNoData=True,
ExcludedFromSummaries=True]
elementtype(AnalysisMean)
};
Helper elements (SumX and SumN in this example) must always be contiguous and must always appear before the special element that requires them (Mean in this example). For more information about the rules that apply when adding helper elements to mrScriptMetadata, see Special elements and helper elements in metadata.
Single response question with custom keycodes
This example defines a single response question that includes custom keycodes. You can add custom keycodes to your script for use in the UNICOM Intelligence Interviewer - Offline for Windows application. Keycodes are associated with response values and are used in the UNICOM Intelligence Interviewer - Offline for Windows application to help limit the number of keystrokes required when entering response values.
FavoriteCoffee "Which one of these types of coffee is your favorite?"
categorical [1..1]
{
Colombian "Colombian" keycode("1"),
Costa_Rican "Costa Rican" keycode("2"),
Java "Java" keycode("3"),
Italian_Blend "Italian Blend" keycode("4"),
Brazilian "Brazilian" keycode("5")
};