Creating statistics using analysis elements and raw data
When you have a derived variable whose responses are based on numeric data, you can use the raw data to produce accurate statistics. You specify the statistical elements in the same way as for statistics based on factors. However, because the data used to calculate the statistics comes from a different question or variable, you must also define some hidden elements that hold the intermediate data the statistics require.
For example, means are calculated by taking the sum of the numeric values (known as the sum-of-x) and dividing it by the number of cases or respondents (known as the sum-of-n), while standard deviations require the same information as well as the sum of the squared numeric values (sum-of-x-squared). These are all additional elements that you must define as part of the question.
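The arithmetic described above can be sketched in Python. This is only an illustration: the function names and the sample ages are made up, and the sample (n - 1) form of the standard deviation without weighting is an assumption, since this topic does not state the exact formula the product applies.

```python
import math

def summary_sums(values):
    """Return (sum_n, sum_x, sum_x_squared) for a list of raw numeric values."""
    sum_n = len(values)                         # number of cases (sum-of-n)
    sum_x = sum(values)                         # sum of the values (sum-of-x)
    sum_x_squared = sum(v * v for v in values)  # sum of squared values
    return sum_n, sum_x, sum_x_squared

def mean_from_sums(sum_n, sum_x):
    """Mean = sum-of-x divided by sum-of-n."""
    return sum_x / sum_n

def stddev_from_sums(sum_n, sum_x, sum_x_squared):
    """Standard deviation from the three sums.

    Assumption: unweighted sample standard deviation (n - 1 denominator).
    """
    variance = (sum_x_squared - sum_x * sum_x / sum_n) / (sum_n - 1)
    return math.sqrt(variance)

ages = [20, 30, 40, 50]  # hypothetical raw ages
sum_n, sum_x, sum_x_squared = summary_sums(ages)
mean = mean_from_sums(sum_n, sum_x)
stddev = stddev_from_sums(sum_n, sum_x, sum_x_squared)
print(sum_n, sum_x, sum_x_squared, mean, round(stddev, 2))
```

The point of the sketch is that neither statistic needs the individual values once the three sums are known, which is why only the sums have to be stored as hidden elements.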
Syntax
Put the following statement in the response list once for each additional element required: two statements for a mean, three for a standard deviation. The statement is shown over three lines to indicate the points at which line breaks are allowed.
Name
[CalculationType=Type, Hidden=True, ExcludedFromSummaries=True, DataElement=""]elementType(AnalysisSummaryData)
[multiplier(use QName)]
Parameters
Name
The element’s name. This must be one of SumN, SumX, or SumXSquared.
Type
The type of calculation to be stored in this element. Refer to the table for details.
QName
The name of the numeric variable whose raw data is to be used in the calculation. The multiplier parameter is not required for the SumN element, which only counts cases.
Additional elements
The following table shows which calculation types are required for each of the common statistics:
Statistic | Additional elements required, and in what order
Mean | SumX, SumN
Standard deviation | SumXSquared, SumX, SumN
Standard error | SumXSquared, SumX, SumN
Sample variance | SumXSquared, SumX, SumN, SumUnweightedN
Rules for defining the additional elements are as follows:
▪You must define these elements in the order they are listed in the table.
▪Do not insert other elements between SumXSquared, SumX, and SumN.
▪SumUnweightedN must come immediately before the Standard error or Sample variance element it belongs to.
▪If a question has multiple statistics and some of the additional elements are common to several of those statistics, you may define those additional elements once, subject to the rules above, and they will be applied to all of the statistics.
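The ordering rules can be checked mechanically. The following Python sketch is a hypothetical helper, not part of the product: it takes the calculation types of a question's elements in definition order and reports whether the SumXSquared, SumX, and SumN elements a statistic needs appear together and in the documented order. It does not model the placement rule for SumUnweightedN.

```python
# Additional elements per statistic, in the order given in the table above.
REQUIRED = {
    "Mean": ["SumX", "SumN"],
    "Standard deviation": ["SumXSquared", "SumX", "SumN"],
    "Standard error": ["SumXSquared", "SumX", "SumN"],
    "Sample variance": ["SumXSquared", "SumX", "SumN", "SumUnweightedN"],
}

# The three elements that must not be separated by other elements.
CORE = ["SumXSquared", "SumX", "SumN"]

def core_helpers_valid(element_types, statistic):
    """Return True if the core elements needed by `statistic` occur in
    `element_types` as one unbroken run in the documented order."""
    needed = [t for t in REQUIRED[statistic] if t in CORE]
    for start in range(len(element_types) - len(needed) + 1):
        if element_types[start:start + len(needed)] == needed:
            return True
    return False

# One shared set of hidden elements serving both a mean and a standard deviation.
# "Category" stands in for any ordinary response element.
layout = ["Category", "Category", "SumXSquared", "SumX", "SumN", "Mean", "Stddev"]
print(core_helpers_valid(layout, "Mean"))
print(core_helpers_valid(layout, "Standard deviation"))
```

In this layout a single SumXSquared, SumX, SumN run satisfies both statistics, which is the sharing case the last rule describes.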
Example
Here’s a derived variable that generates age ranges based on the exact ages stored in the Age question. See Categorical bands for numeric, text, and date responses for further information about these types of derived variables.
HowOld "Respondent's age" categorical [1..1]
{
    HowOld24 "18 to 24" expression("Age <= 24"),
    HowOld34 "25 to 34" expression("Age >= 25 And Age <= 34"),
    HowOld44 "35 to 44" expression("Age >= 35 And Age <= 44"),
    HowOld54 "45 to 54" expression("Age >= 45 And Age <= 54"),
    HowOld64 "55 to 64" expression("Age >= 55 And Age <= 64"),
    HowOld65 "65 plus" expression("Age >= 65"),
    SumXSquared [CalculationType="SumXSquared", Hidden=True, DataElement="",
        ExcludedFromSummaries=True]
        elementType(AnalysisSummaryData) multiplier(use Age),
    SumX [CalculationType="SumX", Hidden=True, DataElement="",
        ExcludedFromSummaries=True]
        elementType(AnalysisSummaryData) multiplier(use Age),
    SumN [CalculationType="SumN", Hidden=True, DataElement="",
        ExcludedFromSummaries=True]
        elementType(AnalysisSummaryData),
    HowOldMean "Mean"
        [CalculationType=Mean, HasNoData=True, ExcludedFromSummaries=True]
        elementType(AnalysisMean),
    HowOldSD "Std. dev"
        [CalculationType=Stddev, HasNoData=True, ExcludedFromSummaries=True]
        elementType(AnalysisStddev)
};
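Because HowOld is a single-response variable ([1..1]), each age should match exactly one band. The following Python sketch mirrors the banding so that this can be checked. It assumes integer ages and the band limits implied by the category labels; the BANDS table and the bands_for function are illustrative names only.

```python
# (element name, lower limit, upper limit); None means no limit on that side.
# Limits follow the category labels: "18 to 24", "25 to 34", and so on.
BANDS = [
    ("HowOld24", None, 24),
    ("HowOld34", 25, 34),
    ("HowOld44", 35, 44),
    ("HowOld54", 45, 54),
    ("HowOld64", 55, 64),
    ("HowOld65", 65, None),
]

def bands_for(age):
    """Return the names of every band whose limits include `age`."""
    return [
        name
        for name, low, high in BANDS
        if (low is None or age >= low) and (high is None or age <= high)
    ]

# Every age should fall into exactly one band, including the boundary ages.
overlaps = [age for age in range(18, 100) if len(bands_for(age)) != 1]
print(overlaps)
print(bands_for(24), bands_for(25), bands_for(65))
```

An empty overlaps list means no age is counted in two bands, which matters at boundary values such as 24, 34, and 44.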
See also