Categorical

The categorical type defines a categorical question, which is a question that has a limited number of categories that represent the possible responses. Categorical questions can be single response or multiple response.

The maximum number of categories in a categorical question is in theory approximately 4 billion. However, in reality the number of categories is limited by the available memory and the Data Source Component (DSC) being used.

A question of type categorical has the following syntax. For clarity, each item is shown on a separate line, and optional items are indented. See also Syntax conventions.

field_name [ "field_label" ]
    [ [ <properties> ] ]
    [ <styles and templates> ]
categorical [ [ min_categories .. max_categories ] ]
    [ <categories> ]
    [ expression ("expression_text" [, ( deriveelements | noderiveelements ) ] ) ]
    [ ( initialanswer | defaultanswer ) ({category_name(s)}) ]
    [ <axis> ]
    [ <usage-type> ]
    [ <helperfields> ]
    [ nocasedata ]
    [ unversioned ]

The optional range [ min_categories .. max_categories ] defines the minimum and maximum number of responses that are valid. Setting max_categories to 1 defines the question as a single response question. Leaving max_categories undefined, or setting it to a value greater than one, defines the question as multiple response. When min_categories is greater than zero, the respondent must choose at least one category in response to the question.

In UNICOM Intelligence Data Model 2.8 and later, you can use [1] as a shortcut for [1..1], which indicates a single response question that must be answered.
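As a sketch of the range forms described above, the three questions below use the ranges [1], [1..3], and [1..]. The question names, labels, and categories are hypothetical and are for illustration only.

```
' Single response that must be answered; [1] is shorthand for [1..1]
SingleChoice "Which drink do you prefer?" categorical [1]
{
Milk,
Coffee,
Tea
};

' Multiple response: at least one and at most three categories
UpToThree "Which drinks do you buy regularly?" categorical [1..3]
{
Milk,
Coffee,
Tea,
Water
};

' Multiple response: at least one category, no upper limit
AtLeastOne "Which drinks have you ever tried?" categorical [1..]
{
Milk,
Coffee,
Tea,
Water
};
```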

You use the optional expression keyword to create one or more categories and variables using a formula defined in expression_text. This feature is typically used to create categories based on the responses to other categorical questions.

With expression, you can also specify one of the optional keywords deriveelements or noderiveelements, as shown in the syntax above.

For an example of using the expression keyword to create categories, see the TotalAwareness example (example 6) at the end of this topic.

You can optionally specify either an initial answer or a default answer to the question. An initial answer can be seen by the respondent, whereas a default answer cannot be seen. If either type of answer is defined, the respondent does not need to answer the question. If the question is a multiple response categorical, you can specify that the answer consists of more than one response. For example:

FavoritePets "What are your favorite household pets?" categorical
  { Dogs,
    Goldfish,
    Mice,
    Cats,
    Hamsters
  }
  defaultanswer ( {Dogs, Cats} );

If a default answer is not accepted when you run the interview script (that is, the message “Missing Answer(s)” is displayed), see mrScriptMetadata FAQs.

<categories> defines the question's category list, as follows:

<categories> ::= { <category> (, <category>)* }
                 [ rot[ate] | ran[domize] | rev[erse] |
                     asc[ending] | desc[ending] ]
                 [ fix ]
                 [ namespace ]

With rev[erse], the category list is reversed before each presentation. This means that the list is presented top-down to the first respondent, bottom-up to the next, and so on.

When the category list is a sublist within a higher-level category list, the fix keyword forces the sublist to retain its original position when the higher-level category list is sorted, rotated, randomized, or reversed.

The namespace keyword indicates that the category names should be namespaced based on the list name. This means that the full names of the categories will include the name of the list. By default, category names are not namespaced.
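As an illustration of where an ordering keyword is placed, the sketch below puts rev after the closing brace of the category list, following the <categories> syntax above. The Satisfaction question and its categories are hypothetical.

```
Satisfaction "How satisfied are you with the service?" categorical [1]
{
VerySatisfied "Very satisfied",
Satisfied,
Dissatisfied,
VeryDissatisfied "Very dissatisfied"
} rev; ' list is presented top-down, then bottom-up, alternately
```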

This format defines a category within the question's category list:

<category> ::= category_name [ "category_label" ]
               [ <other> | <multiplier> | DK | REF | NA ]
               [ exclusive ]
               [ factor (factor_value) ]
               [ keycode ("keycode_value") ]
               [ expression ("exp_text") ]
               [ elementtype (type_value) ]
               [ fix ]
               [ nofilter ]

This is an alternative format for directly specifying a group of categories within the question's category list:
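As a sketch of a directly specified group, modeled on the Service example later in this topic, each group below has a list name, a label, and its own brace-enclosed category list followed by an ordering keyword. The Impressions question and its shortened category lists are hypothetical.

```
Impressions "What are your impressions of the service?" categorical [1..]
{
Positive "Positive responses"
{
GoodService "Good service",
NoProb "No problems"
} ran,
Negative "Negative responses"
{
Slow "Slow to respond",
CostTooHigh "Costs too high"
} ran
};
```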

This is an alternative format for using a defined list (or “shared list”; see Define) to specify a group of categories within the question's category list:

<category> ::= [ list_name ]
               use define_list
               [ sublist
                   [ rot[ate] | ran[domize] | rev[erse] |
                       asc[ending] | desc[ending] ]
               ]
               [ "list_label" ]
               [ fix ]

The DK, REF, and NA keywords define the category as a Don't Know, Refused to Answer, or No Answer special response, respectively. When you use one of these keywords, the exclusive, fix, and nofilter keywords are automatically set for the category. When using the DK, REF, or NA keywords, you can use a hyphen (-) as the category name to specify that the category name should be the same as the keyword.
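A minimal sketch of the hyphen convention described above follows. The Brand question is hypothetical. Under the rule stated above, the two special response categories would be named DK and REF.

```
Brand "Which brand do you buy most often?" categorical [1]
{
Red,
Blue,
Yellow,
- "Don't know" DK,
- "Prefer not to say" REF
};
```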

If you have a standard list of special response categories that you want to use for more than one question, you can set the categories up as a defined list. See Define for more information.

The exclusive keyword defines the category as exclusive, which means that it is a single-choice category. Typically, you use this to define a single-choice category in a multiple response question. For example, multiple response questions sometimes have a None of the above single-choice category.

You can use the optional factor keyword to set a factor value for a category. Factors are numeric values that are used in statistical calculations. You can also set a factor value for the special response categories, DK, REF and NA.
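As an illustration of the factor keyword, the sketch below assigns a numeric factor to each category of a rating scale, using the factor (factor_value) form from the <category> syntax above. The Rating question and the factor values are hypothetical.

```
Rating "How would you rate the service?" categorical [1]
{
Excellent factor (5),
Good factor (4),
Fair factor (3),
Poor factor (2),
VeryPoor "Very poor" factor (1)
};
```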

You can use the optional keycode keyword to specify a custom keycode for individual categories. Keycodes are used in the UNICOM Intelligence Interviewer - Offline for Windows application to help limit the number of keystrokes required when entering response values.

The nofilter keyword prevents the category from being excluded when using a filter in UNICOM Intelligence Reporter or the UNICOM Intelligence Professional Tables option. Typically, you use this keyword for Other Specify categories.

list_name and list_label define the name and label for a group of categories. list_name is useful if you are using the namespace option. list_label defines a subheading for the group of categories. See Names and labels for more information.

use define_list specifies a defined list to use. When using a defined list, you can use a hyphen (-) as the list name to indicate that you want to use the defined list's name.

<other> defines the category as an Other Specify category. This is a category that allows the respondent to enter an answer that is not on the category list. Using the other keyword means that a helper field will be created in the metadata for the categorical to store the response. When using the other keyword, you can use a hyphen (-) as the category name to specify that the category name should be the same as the keyword.

By default, the helper field created will be a text variable. For example, the following single response question includes an Other Specify category called other_museum that will store the response in a text helper field called other_museum:
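A sketch of such a question follows. Only the category name other_museum comes from the text above. The question name, labels, and remaining categories are hypothetical. The other keyword is used on its own, as in the Service example later in this topic.

```
Museums "Which museum did you visit most recently?" categorical [1..1]
{
Science "Science museum",
Art "Art museum",
History "History museum",
other_museum "Other museum" other
};
```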

However, you can specify that another data type, such as a numeric, should be used instead. For example, the following single response question includes an Other Specify category called other (indicated by a hyphen) that will store the response in a long helper field called exact_age:

<multiplier> defines a special type of helper variable that is used to store numeric data that is associated with the category. In the derived_income example at the end of this topic, it is written as multiplier(use income).

By default, UNICOM Intelligence Reporter and the UNICOM Intelligence Professional Tables option increment the cell count for a category by one for each respondent. However, if a multiplier variable has been defined, the value of that variable is used as the increment.

You use the optional expression keyword to create a category or variable using a formula defined in exp_text. Generally you use this feature to create a derived variable that is based on one or more other variables.

exp_text must be an expression that is supported by the UNICOM Intelligence Data Model and can optionally include functions from the UNICOM Intelligence Function Library. Double quotation marks (") must be escaped using a second double quotation mark (for example, ""yyyy""). Check your syntax and spelling carefully, because the expression is stored without being validated. See Expression evaluation for more information.

You use the optional elementtype keyword to create a special element that can be used by UNICOM Intelligence Reporter to assist in the analysis of data. In general, it is easier to use Axis expressions to add special elements, as the axis specification syntax enables you to include special elements without having to define them. However, you might prefer to define the special elements so that they are permanently attached to a variable. This can be useful if you know in advance the types of special elements that will be required to analyze the variable.

Analysis elements are not used during data collection, so you typically specify the elementtype keyword for a category in a derived variable, that is, a variable whose categories include the expression keyword.

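type_value defines the type of special element. The values that appear in this topic include AnalysisBase, AnalysisSubheading, AnalysisSummaryData, AnalysisMean, AnalysisSampleVariance, AnalysisStdDev, and AnalysisStdErr.

AnalysisSubheading defines a subheading element. Typically, this is used to provide a label for a group of elements. For example, in a list of car models, subheading elements might be used to display manufacturers' names.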

When specifying the text value, you do not have to enclose the text in double quotation marks. For example, the following are equivalent:
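As an illustration of this rule, the two category definitions below differ only in whether the type_value text is quoted. The category name Teas is taken from the HotDrinks example later in this topic.

```
Teas elementtype(AnalysisSubheading)
Teas elementtype("AnalysisSubheading")
```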

There are two ways of defining how the special element will be calculated. The first method is to add the expression keyword to the element and define the formula in the expression text. This method is useful when you want to define a non-standard formula, for example if you want to exclude some cases from the base element. The second method does not require an expression, but does require you to set Custom properties on the special element. By setting the correct custom properties, the calculation will be done automatically. For example, for a base element, setting the HasNoData and ExcludedFromSummaries custom properties to True, as shown in the following example, will result in the base figure being calculated automatically:
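A sketch of a base element defined with the second method follows. It is modeled on the Mean element in the derived_income example at the end of this topic, with the custom properties placed in square brackets after the label. The category name and label are hypothetical.

```
Base "Base"
[HasNoData=True,
ExcludedFromSummaries=True]
elementtype(AnalysisBase)
```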

In addition, special elements of types AnalysisMean, AnalysisSampleVariance, AnalysisStdDev, and AnalysisStdErr require the variable to include additional special elements of type AnalysisSummaryData. These additional elements are sometimes called “helper elements”, and custom properties must also be set on them. Helper elements assist with the calculation, but do not appear in the tabulation. The following example shows one type of helper element, which is called SumN and has four custom properties:
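The fragment below is the SumN helper element as it appears in the derived_income example at the end of this topic. It is a category definition, so it is valid only inside the category list of a categorical variable.

```
SumN [CalculationType="SumN",
Hidden=True,
ExcludedFromSummaries=True,
DataElement=""]
elementType(AnalysisSummaryData)
```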

For more information about the custom properties that must be set on a special element, and the helper elements that are required, see Special elements and helper elements in metadata. The derived_income example (example 8) at the end of this topic is a complete example of a derived categorical variable that includes a mean element and helper elements.

When using the elementtype keyword, you can use a hyphen (-) to indicate that the category name should be the same as the type_value text. However, this should not be used if you are using the same type_value more than once in a categorical question (for example, to create two or more subheadings), because this will lead to duplicate names, which will cause an error.

By default, the categories in defined lists are included in category-ordering operations such as randomize and ascending that have been specified for the question. For example, the categories in the question animals below will be listed in ascending order, that is, cat, cow, dog, giraffe, hamster, horse, lion, tiger.

wild_animals define {tiger, lion, giraffe};
pets define {dog, cat, hamster};

animals categorical {
use wild_animals,
use pets,
cow,
horse
} ascending;

If you don't want the categories in a defined list to be affected by category-ordering operations, specify the sublist keyword, as shown below.
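A sketch of the animals question with the sublist keyword added to both defined lists follows. It uses the use define_list sublist form from the syntax earlier in this topic and assumes the wild_animals and pets lists defined above.

```
animals categorical {
use wild_animals sublist,
use pets sublist,
cow,
horse
} ascending;
```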

The categories will now be listed in the following order: cow, horse, “pets” subheading, dog, cat, hamster, “wild_animals” subheading, tiger, lion, giraffe. Notice that the categories in each defined list are still in their original order, but that the list names, pets and wild_animals, have been sorted along with cow and horse. In addition, the list names are now displayed as subheadings.

If you specify the sublist keyword, you can also order the categories in the defined list using the same category-ordering keywords that can be used for the question.
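For example, the sketch below adds descending to the pets sublist only, again assuming the wild_animals and pets lists defined above. The ordering keyword follows sublist, as in the syntax earlier in this topic.

```
animals categorical {
use wild_animals sublist,
use pets sublist descending,
cow,
horse
} ascending;
```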

The categories are now listed in this order: cow, horse, “pets” subheading, hamster, dog, cat, “wild_animals” subheading, tiger, lion, giraffe. Note that the categories in the pets list are now sorted in descending order.

You do not need to specify the sublist keyword if you directly specify a group of categories within the question's category list (that is, you specify a group of categories without using a defined list) as that type of group is a sublist by default.

The following example defines a single response question that must be answered. The range [1] is the shortcut for [1..1].

Age "To which age group do you belong?" categorical [1]
{
Y11_16 "11-16 Years",
Y17_20 "17-20 Years",
Y21_24 "21-24 Years",
Y25_34 "25-34 Years",
Y35_44 "35-44 Years",
Y45_54 "45-54 Years",
Y55_64 "55-64 Years",
Y65_pp "65+ Years"
};

The following example defines a multiple response question that requires at least one answer and at most three. The category list defines category labels only for the categories for which the labels are not suitable as names because they contain space characters. The category names will automatically be used as labels for the other categories. The None of these category is marked as an exclusive answer. This means that when a respondent selects this category, he or she will not be allowed to select another one as well.
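A sketch of a question that matches this description follows. The name WeeklyDrinks is hypothetical, and the categories are borrowed from the AllDrinks example in this topic.

```
WeeklyDrinks "Which of the following do you drink at least once a week?" categorical [1..3]
{
Milk,
Coffee,
Tea,
Orange "Orange juice",
None "None of these" exclusive
};
```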

You can omit the categorical keyword if you do not specify a range. For example, the following creates a multiple response question for which there is no limit on the number of categories the respondent can choose:

AllDrinks "Which of the following do you drink at least once a week?"
{
Milk,
Coffee,
Tea,
Orange "Orange juice",
None "None of these" exclusive
};

This example defines a single response question that has an Other Specify category, which is defined using the other keyword. Notice that a hyphen (-) has been used instead of a name. This indicates that the category should be given the default name of other.

DrinksWithOther "Which is your favorite drink?" categorical [1 .. 1]
{
Milk,
Coffee,
Tea,
Orange "Orange juice",
- "Other, please specify" other
};

This example defines a multiple response question that requires at least one answer. The category list is broken into two separate groups, each of which has an Other Specify category. Both groups are randomized, and the two groups are rotated. The special DK (Don't Know) response is automatically fixed at the end of the entire list and the two Other Specify categories are fixed within the groups.

Service "What are your impressions of the service/billing?" categorical [1 ..]
{
Positive "Positive responses"
{
NoProb "No Problems - no cause to complain",
GoodService "Good Service",
GoodBreakdown "Provides breakdown of expenses",
CardsAccept "Wide acceptance of cards",
PosOther "Other positive" other fix
} ran,
Negative "Negative responses"
{
Slow "Slow to respond/poor service",
LimitedCards "Limited acceptance of cards",
Inaccurate "Inaccuracies of bills",
CostTooHigh "Costs too high/expensive",
CustContact "Not enough customer contact/service",
NegOther "Other negative" other fix
} ran,
DontKnow "Don't know" DK
} rot;

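In the Service question above, the categories within the two groups are not namespaced. For example, the full names of the categories in the Positive list are NoProb, GoodService, GoodBreakdown, CardsAccept, and PosOther.

In the following version, the namespace keyword has been added to both groups, so the full names of the categories include the name of the list to which they belong: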

Service "What are your impressions of the service/billing?" categorical [1 ..]
{
Positive "Positive responses"
{
NoProb "No Problems - no cause to complain",
GoodService "Good Service",
GoodBreakdown "Provides breakdown of expenses",
CardsAccept "Wide acceptance of cards",
PosOther "Other positive" other fix
} ran namespace, ' <== Namespace keyword added
Negative "Negative responses"
{
Slow "Slow to respond/poor service",
LimitedCards "Limited acceptance of cards",
Inaccurate "Inaccuracies of bills",
CostTooHigh "Costs too high/expensive",
CustContact "Not enough customer contact/service",
NegOther "Other negative" other fix
} ran namespace, ' <== Namespace keyword added
DontKnow "Don't know" DK
} rot;

The namespace option is particularly useful when you combine category lists that contain categories that have the same name, because it makes the full names of the categories unique. Note that you will get an error when mrScript MDSC reads the code if there are duplicate category full names in a question.

This example defines a derived categorical variable in which each category is defined using an expression. The expressions use the Age and Gender variables to create categories for four age groups for each gender.

GenderAge "Gender and age derived variable"
{
Boys
expression("Gender = {Male} And Age * {Y11_16, Y17_20}"),
YoungMen "Young men"
expression("Gender = {Male} And Age * {Y21_24, Y25_34}"),
MiddleMen "Middle-aged men"
expression("Gender = {Male} And Age * {Y35_44, Y45_54, Y55_64}"),
OldMen "Older men"
expression("Gender = {Male} And Age * {Y65_pp}"),
Girls
expression("Gender = {Female} And Age * {Y11_16, Y17_20}"),
YoungWomen "Young women"
expression("Gender = {Female} And Age * {Y21_24, Y25_34}"),
MiddleWomen "Middle-aged women"
expression("Gender = {Female} And Age * {Y35_44, Y45_54, Y55_64}"),
OldWomen "Older women"
expression("Gender = {Female} And Age * {Y65_pp}")
};

The TotalAwareness variable in this example is a derived categorical variable using one expression on the variable rather than separate expressions on each of the categories. The expression uses the union (+) operator to define the derived variable as the union of the responses to two existing categorical questions, Prompted and Spontaneous, which ask which brands are known with and without prompting. This means that the derived variable will contain the responses to both questions. In Data Model 2.8 and later, categories are generated automatically when you define an expression in this way for a categorical variable, but do not specify any categories, as in this example.

The category lists for the Prompted and Spontaneous questions are defined using the same defined list. However, this is not a requirement. The categories could be defined on the variables themselves (rather than in a defined list) and the category lists do not have to be identical. For example, you can also use different or overlapping lists.

BrandList define
  {
    Red, Blue, Yellow, Green, Purple, Black,
    White, Cerise, Turquoise, Orange, Pink
  };

Spontaneous "Brands remembered spontaneously" categorical
  {use BrandList};

Prompted "Brands remembered after prompting" categorical
  {use BrandList};

TotalAwareness "Total Brand Awareness" categorical
  expression ("Spontaneous + Prompted");

This example creates a derived categorical variable that has a base element and two subheading elements, the first of which has the text Teas and the second Coffees.

HotDrinks "Favorite hot drink" categorical [1]
{
Base elementtype(AnalysisBase)
expression("Drinks * {Green, Chinese, Black, Herbal,
Flavored, Ground, Instant, Ready}"),
Teas elementtype(AnalysisSubheading),
Green "Green tea",
Chinese "Chinese tea",
Black "Black tea",
Herbal "Herbal tea",
Flavored "Flavored tea",
Coffees elementtype(AnalysisSubheading),
Ground "Ground coffee",
Instant "Instant coffee",
Ready "Ready to drink coffee"
} expression("Drinks");

This example shows a derived categorical variable that includes special elements that will be used in UNICOM Intelligence Reporter or the UNICOM Intelligence Professional Tables option. The categorical variable is based on the income numeric variable. It includes three bands based on the values in income, and it also includes a mean special element and the helper elements (SumX and SumN) that are required to calculate the mean:

income "What is your income?" long [0..];

derived_income "Derived variable showing bands and mean" categorical [1] {
income_1 "Lower income"
expression("income > 0 and income <= 30000"),
income_2 "Middle income"
expression("income > 30000 and income <= 60000"),
income_3 "High income"
expression("income > 60000"),
SumX [CalculationType="SumX",
Hidden=True,
ExcludedFromSummaries=True,
DataElement=""]
elementType(AnalysisSummaryData)
multiplier(use income),
SumN [CalculationType="SumN",
Hidden=True,
ExcludedFromSummaries=True,
DataElement=""]
elementType(AnalysisSummaryData),
Mean "Average income"
[CalculationType="Mean",
HasNoData=True,
ExcludedFromSummaries=True]
elementtype(AnalysisMean)
};

Helper elements (SumX and SumN in this example) must always be contiguous and must always appear before the special element that requires them (Mean in this example). For more information about the rules that apply when adding helper elements to mrScriptMetadata, see Special elements and helper elements in metadata.

This example defines a single response question that includes custom keycodes. Keycodes are associated with response values and are used in the UNICOM Intelligence Interviewer - Offline for Windows application to help limit the number of keystrokes required when entering response values.

FavoriteCoffee "Which one of these types of coffee is your favorite?"
categorical [1..1]
{
Colombian "Colombian" keycode("1"),
Costa_Rican "Costa Rican" keycode("2"),
Java "Java" keycode("3"),
Italian_Blend "Italian Blend" keycode("4"),
Brazilian "Brazilian" keycode("5")
};