Statistics using an axis block
When a question or derived variable requires statistics, the easiest way to set them up in your interviewing script is to use an axis block. There is very little difference between the syntax for statistics based on raw data and statistics based on factors, and there is never any need to define additional elements that store intermediate values used in the statistical calculations.
Defining factors
If the question or derived variable is categorical, you will need to define factors that can be used in the calculations. To do this, append the following to the response definitions in the main part of the response list, not in the axis block.
factor(n)
where n is a positive or negative integer or real number. Responses with no factor are ignored when statistics are calculated.
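The rule that responses without a factor are ignored can be illustrated with a minimal Python sketch. It is not the product's implementation. The response names, factor values, and the `factor_mean` helper are hypothetical and chosen only for illustration.

```python
# Hypothetical response list: response name -> factor.
# None marks a response that was defined without factor(n).
factors = {
    "Excellent": 2,
    "Good": 1,
    "Poor": -1,
    "VeryPoor": -2,
    "DontKnow": None,
}

# Hypothetical answers, one per respondent.
answers = ["Excellent", "Good", "Good", "Poor", "DontKnow", "DontKnow"]


def factor_mean(answers, factors):
    """Mean of the factors, skipping answers whose response has no factor."""
    values = [factors[a] for a in answers if factors.get(a) is not None]
    return sum(values) / len(values) if values else None


print(factor_mean(answers, factors))
```

In this sketch the two "DontKnow" answers contribute neither to the sum nor to the count, so the mean is taken over four respondents rather than six.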
Creating statistical elements in the axis block
To create the statistical elements, place the following statement in the axis block for the question:
[Name] ['Text'] Statistic([NumQ])
where:
▪Name is a unique name for the element. If you do not specify a name, the name of the statistic is used.
▪Text is the text you want to use for this element in analyses (that is, the row or column heading). The default is the statistic’s name.
▪Statistic identifies the statistic you want to create; usually one of mean, stddev, stderr, or sampvar.
▪NumQ is the name of the numeric question whose raw data is to be used in the calculations. This is not required when factors are used.
Here are several variations of an age question. The first one defines the statistics as part of the question that respondents will be asked.
Age "How old are you?" long [18..99]
axis(
"{
A24 '18 to 24' expression('Age <= 24'),
A34 '25 to 34' expression('Age >= 25 And Age <= 34'),
A44 '35 to 44' expression('Age >= 35 And Age <= 44'),
A54 '45 to 54' expression('Age >= 45 And Age <= 54'),
A64 '55 to 64' expression('Age >= 55 And Age <= 64'),
A65 '65 plus' expression('Age >= 65'),
AgeMean 'Mean' mean(Age),
AgeStddev 'Std.dev' stddev(Age)
}"
);
This is a standard numeric question that has been extended to include categorical elements for analysis only. The question is displayed in the usual way during interviews, and respondents type their exact age into a numeric input box. When you tabulate the question, each person’s age is read from the case data and the count for the matching element in the axis block is incremented by 1. Because the question is numeric, the mean and standard deviation are calculated using the raw age data, giving you the true average age of your respondents and the standard deviation around that mean.
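The tabulation described above can be sketched in Python as follows. This only illustrates the arithmetic and is not how the product computes its tables. The sample ages are invented, and the use of the sample standard deviation (n - 1 divisor) is an assumption, because the text does not state which formula stddev uses.

```python
from statistics import mean, stdev

# Hypothetical raw answers to the Age question.
ages = [19, 23, 31, 38, 47, 52, 66]

# Inclusive bounds mirroring the expressions in the axis block;
# 18 and 99 come from the question's permitted range.
bands = {
    "A24": (18, 24),
    "A34": (25, 34),
    "A44": (35, 44),
    "A54": (45, 54),
    "A64": (55, 64),
    "A65": (65, 99),
}


def tabulate(ages):
    """Increment the count of the band that each raw age falls into."""
    counts = {name: 0 for name in bands}
    for age in ages:
        for name, (low, high) in bands.items():
            if low <= age <= high:
                counts[name] += 1
                break
    return counts


counts = tabulate(ages)
age_mean = mean(ages)      # computed from the raw ages, not from the bands
age_stddev = stdev(ages)   # assumption: sample standard deviation

print(counts, age_mean, age_stddev)
```

The band counts and the statistics are computed independently: the bands only control how respondents are grouped for display, while the mean and standard deviation always use the exact ages.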
The next example shows a categorical derived variable based on Age. (See Categorical bands for numeric, text, and date responses for further information about this type of derived variable.)
AgeGroup "Respondent's age" categorical [1..1]
{
AG24 "18 to 24" expression("Age <= 24"),
AG34 "25 to 34" expression("Age >= 25 And Age <= 34"),
AG44 "35 to 44" expression("Age >= 35 And Age <= 44"),
AG54 "45 to 54" expression("Age >= 45 And Age <= 54"),
AG64 "55 to 64" expression("Age >= 55 And Age <= 64"),
AG65 "65 plus" expression("Age >= 65")
}
axis(
"{
..,
AGMean 'Mean' mean(Age),
AGStddev 'Std.dev' stddev(Age)
}"
);
The age ranges are defined in much the same way as they were in the previous example, using expressions to determine who is counted in each group. Points to notice about this example are:
▪The “..” at the start of the axis block picks up all the elements defined in the main part of the question.
▪Because the data for the variable come from a different question, the statistical elements name Age as being the question whose data is to be used in the calculations.
Finally, here’s an example that uses factors rather than raw data.
EldestChild "How old is your eldest school-age child?" categorical [1..1]
{
EC3to4 "3 to 4 years" factor(3.5),
EC5to7 "5 to 7 years" factor(6),
EC8to11 "8 to 11 years" factor(9.5),
EC12to16 "12 to 16 years" factor(14),
EC17to18 "17 to 18 years" factor(17.5)
}
axis(
"{
..,
ECMean 'Mean' mean(),
ECStddev 'Std. dev' stddev()
}"
);
The factors are defined as part of the responses so that they will be saved in the metadata. When the question is displayed, the respondent may choose one answer from the list. When the question is tabulated, the age categories are incremented by one for each respondent, and the statistics are calculated using the factors defined for each category. Anyone whose eldest child is aged three or four will count as 3.5 in the mean calculation, whereas anyone whose eldest child is between five and seven years old will count as 6 in the calculation. This generates an approximation of the true mean age based on the factors rather than on children’s exact ages.
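To show why the factor-based mean is only an approximation, the following Python sketch compares it with the mean of exact ages. The factors are the ones from the EldestChild example. The exact ages and the `band_for` helper are hypothetical, since the real question never collects exact ages.

```python
from statistics import mean

# Factors as defined on the EldestChild responses.
factors = {
    "EC3to4": 3.5,
    "EC5to7": 6,
    "EC8to11": 9.5,
    "EC12to16": 14,
    "EC17to18": 17.5,
}

# Inclusive age range covered by each response.
bands = {
    "EC3to4": (3, 4),
    "EC5to7": (5, 7),
    "EC8to11": (8, 11),
    "EC12to16": (12, 16),
    "EC17to18": (17, 18),
}

# Hypothetical exact ages, used here only for comparison.
exact_ages = [3, 5, 7, 9, 12, 16, 18]


def band_for(age):
    """Return the response a respondent would choose for this age."""
    for name, (low, high) in bands.items():
        if low <= age <= high:
            return name
    raise ValueError(f"age {age} is outside all bands")


chosen = [band_for(age) for age in exact_ages]
approx_mean = mean(factors[name] for name in chosen)  # factor-based mean
true_mean = mean(exact_ages)                          # mean of exact ages

print(approx_mean, true_mean)
```

The two values differ slightly because every respondent in a band is counted at that band's factor, whatever the child's exact age.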