Additional variables for analysis only
Sometimes you know while writing the interview script that you will want to present results in ways that cannot be achieved by tabulating the interview questions alone. For example, you may be adding questions for age and gender to the interview script, and you already know that you will need tables that show gender and age combined. There is no need to create a separate interview question with these combined categories as its responses, because the data file already contains all the information you need. Instead, you can set up a question whose responses are derived from the questions that you already have. This is called a derived variable.
You do not have to ask the question in interviews. Since it is defined in the metadata section, the question is stored in the questionnaire (.mdd) file, but it never has any data of its own because it is not asked. Instead, its data is generated and made temporarily available whenever you use the question in analyses.
To create a derived variable for analyses, define it as a question in the usual way. Set its type according to the type of data it will contain. Then add expressions defining the characteristics that respondents must have in order to be included in the variable. For categorical questions, you can either append the expressions to the response definitions or create a separate axis block containing the necessary statements. For other types of questions, you can either append the expression to the question definition or create an axis block.
Examples
Here is a specification for a derived age and gender variable:
GenderAge "Age and Gender" categorical [1..1]
{
  YoungMan "Men under 30" expression("Gender={Male} And Age < 30"),
  MiddleMan "Men 30 to 50" expression("Gender={Male} And Age >= 30 And Age <= 50"),
  SeniorMan "Men over 50" expression("Gender={Male} And Age > 50"),
  YoungWoman "Women under 30" expression("Gender={Female} And Age < 30"),
  MiddleWoman "Women 30 to 50" expression("Gender={Female} And Age >= 30 And Age <= 50"),
  SeniorWoman "Women over 50" expression("Gender={Female} And Age > 50")
};
Here’s a numeric variable that combines the incomes of the two main wage earners in the household:
Hhincome "Total household income" long expression("Earner1 + Earner2");
If you want to look at income in more detail, you can replace the expression with an axis block that defines different income bands. Each band's expression would then test whether the sum of the two incomes falls within that band's range. For an example of using an axis block in a numeric question, see Categorical bands for numeric, text, and date responses.
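As an alternative to the axis block, income bands could also be sketched as a separate derived categorical variable, using only the per-response expression syntax shown in the GenderAge example. This is an illustration rather than the axis-block approach described above: the question name HhincomeBand, the category names, and the band limits are all hypothetical, and the sketch assumes that Earner1 and Earner2 are numeric questions that always have a value.

```
' Hypothetical derived variable: bands the combined income of Earner1 and Earner2.
' Names and limits are illustrative assumptions, not part of the original script.
HhincomeBand "Total household income (banded)" categorical [1..1]
{
  LowIncome "Under 20,000" expression("(Earner1 + Earner2) < 20000"),
  MidIncome "20,000 to 49,999" expression("(Earner1 + Earner2) >= 20000 And (Earner1 + Earner2) < 50000"),
  HighIncome "50,000 or more" expression("(Earner1 + Earner2) >= 50000")
};
```

The bands are written so that they do not overlap and leave no gaps, which means each respondent with two valid incomes falls into exactly one category, consistent with the [1..1] range.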
See also
Keywords for data analysis