Hierarchical data in the MDM
Market research questionnaires often contain individual questions and sets of questions that are asked more than once. For example, questionnaires often contain grids that ask respondents to answer a question (often by choosing a rating on a predefined scale) for a number of products in a list, and sets of questions that respondents are asked to answer for each product in a list of products or for each person in a household.
The number of times the question or set of questions is to be asked can be controlled in three main ways:
▪ By the categories in a category list. For example, “For each brand in the following list, please answer the following...”
▪ By a numeric expression that has a known upper limit. For example, “For each of the first three journeys you described earlier, please answer the following...”
▪ By a numeric expression that has an unknown upper limit. For example, “For each drink you consumed last week, please answer the following...”
Each of these constructions can be considered a loop that defines the question or set of questions and the number of times they are to be asked (or, in more technical terms, the number of times the loop is to be iterated). From the UNICOM Intelligence Data Model's point of view, whether the questions in the loop are presented simultaneously as a grid or sequentially (one after the other) is a matter of presentation and is not relevant to the way the case data is stored.
Internally in the MDM, a loop corresponds to an Array object. To understand how it works, consider the following loop, which is presented here in a grid-like format:
This loop contains two questions, Name and Gender, which are asked up to 6 times. This means that the loop has 6 possible iterations.
When the response data is presented in the non-hierarchical VDATA virtual table, it is represented in a “flattened form” in which a separate column stores the responses to each question in each possible iteration. In this example, there would be 12 columns (2 * 6). For example, if the Array object is called MyLoop, the following columns would store the responses:
MyLoop[1].Name
MyLoop[2].Name
MyLoop[3].Name
MyLoop[4].Name
MyLoop[5].Name
MyLoop[6].Name
MyLoop[1].Gender
MyLoop[2].Gender
MyLoop[3].Gender
MyLoop[4].Gender
MyLoop[5].Gender
MyLoop[6].Gender
In the MDM, the FullName property of each VariableInstance object corresponds to the name of a VDATA column. The full name is constructed by the MDM from the names of the Array and variable objects. [ ] (brackets) indicate an iteration; . (period) indicates a parent-child relationship.
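The naming rule above can be illustrated outside the MDM. The following is a minimal Python sketch, not MDM API code: the function name flattened_column_names is hypothetical, and it only mimics how the array name, iteration number, and variable name combine into the column names listed above.

```python
def flattened_column_names(array_name, questions, iterations):
    """Build VDATA-style column names: one per question per iteration.

    This only imitates the naming pattern ArrayName[iteration].Question;
    it does not call the MDM.
    """
    return [
        f"{array_name}[{i}].{question}"
        for question in questions
        for i in range(1, iterations + 1)
    ]


# Two questions asked for up to six iterations give 2 * 6 = 12 columns.
columns = flattened_column_names("MyLoop", ["Name", "Gender"], 6)
for column in columns:
    print(column)
```

The sketch lists every Name column before the Gender columns only to match the order shown above; the order of columns in a real VDATA table is determined by the MDM and the CDSC.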
The following diagram represents the structure in the MDM. Notice that all of the VariableInstance objects are accessible from the Document.Variables collection. This collection is a flat list of all of the VariableInstance objects in the top-level virtual table, which is VDATA when the case data is being represented in a non-hierarchical form.
The following diagram provides a representation of the VDATA virtual table after responses have been collected for three households.
This method of representing hierarchical data is simple and effective. However, it has some disadvantages, the most obvious one being that the number of columns is fixed. In our household example, this means that storage space is reserved for the responses for six individuals in each household even though many households have fewer people. Conversely, responses cannot be stored for any additional people in large households. Another disadvantage is that performing summary calculations on the data can be difficult.
Representing the case data hierarchically can be more flexible and provides advantages during analysis. The loop is then considered a level and the responses to the questions in the loop are stored in a separate hierarchical table that has the name of the Array object.
When a solely hierarchical representation of the case data is required, you set the Array.Type property to mlLevel. For example, suppose we create an array of this type containing the same variables as in the previous example and call it MyLevel. Provided you are using a CDSC that supports a hierarchical view of the data, a top-level table called HDATA will be created. Like VDATA, this will contain one row for each respondent. However, it won't store the responses to the questions in the MyLevel array. Instead it will store a pointer to a separate MyLevel table for each respondent. The MyLevel table will contain three columns, one called LevelId (which stores an identifier for the iterations) and one for each of the questions in the loop, and will store the responses to each iteration in a separate row. Here is a diagrammatic representation.
In the MDM, the Variable objects nested inside the Array directly correspond to the columns in the lower-level virtual tables. The MDM creates only two VariableInstance objects for the questions in the array. The MDM constructs the full names of the VariableInstance objects from the names of the Array and variable objects as follows:
MyLevel[..].Name
MyLevel[..].Gender
.. (two periods) are used instead of an iteration number, to indicate all iterations.
The MDM does not add the VariableInstance objects to the Document.Variables collection, because they do not represent columns in the top-level virtual table.
The following diagram represents the structure of the objects in the MDM Document.
Typically, this type of loop is used when the number of iterations is unknown, such as in our household example.
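To make the contrast between the flattened and hierarchical representations concrete, the following Python sketch converts a few flattened rows into a top-level table and a separate level table with a LevelId column. It is an illustration under stated assumptions, not a description of how a CDSC stores data: the function split_into_level, the Respondent column, and the sample respondents are all hypothetical, and the real HDATA table stores a pointer to the level table rather than a shared respondent key.

```python
def split_into_level(flat_rows, array_name, questions, max_iterations):
    """Turn flattened rows into a top-level table plus a level table.

    flat_rows maps a respondent id to a dict of flattened column values.
    Returns (hdata, level_rows): hdata has one row per respondent, and
    level_rows has one row per answered iteration, identified by LevelId.
    """
    hdata = []
    level_rows = []
    for respondent_id, columns in flat_rows.items():
        hdata.append({"Respondent": respondent_id})
        for i in range(1, max_iterations + 1):
            answers = {
                q: columns.get(f"{array_name}[{i}].{q}") for q in questions
            }
            if all(value is None for value in answers.values()):
                continue  # no storage is used for unanswered iterations
            level_rows.append(
                {"Respondent": respondent_id, "LevelId": i, **answers}
            )
    return hdata, level_rows


# Invented sample data: a two-person and a one-person household.
flat_rows = {
    1: {"MyLoop[1].Name": "Ann", "MyLoop[1].Gender": "Female",
        "MyLoop[2].Name": "Bob", "MyLoop[2].Gender": "Male"},
    2: {"MyLoop[1].Name": "Cy", "MyLoop[1].Gender": "Male"},
}
hdata, level_rows = split_into_level(flat_rows, "MyLoop", ["Name", "Gender"], 6)
for row in level_rows:
    print(row)
```

In the flattened form each household occupies twelve columns regardless of size, whereas the level table in this sketch holds only the three rows that were actually answered.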
However, provided the CDSC supports HDATA, all grids and loops are presented hierarchically in the HDATA view of the case data, regardless of whether the Array object is defined as type mlLevel. Loops that are defined as expanded are also presented in a flattened form in the VDATA view of the data (provided this view is supported by the CDSC you are using to access the case data). You define a loop as an expanded loop by setting the Array object's Type to mlExpand. Note that Array objects that are defined as type mlLevel are not available in the flattened VDATA view of the case data.
The variable display rules in survey results
Variable display rule | IteratorType = itCategorical | IteratorType = itNumber / itNumericRange |
---|---|---|
Array.Type = mlExpand IsGrid=True | Display as a Grid in the variable list when there is just one nested question, or when all nested questions are numeric and the type is the same (Long or Double). Otherwise display as iterations in Build Table and Filter pages. Display each iteration or nested variable as a separate item in the variable list if there is more than one nested question, or there are nested questions of different types. | Ignore the IsGrid property and display each iteration or nested variable as a separate variable in the variable list and Build Table and Filter pages. |
Array.Type = mlExpand IsGrid=False | Display each iteration and nested variable as a separate item in the variable list and Build Table and Filter pages. For example: Feature[{Supplier1}].Region | Display each iteration and nested variable as a separate item in the variable list and Build Table and Filter pages. For example: Feature[1].Region |
Array.Type = mlLevel | Display each nested variable as a separate item. For example: Feature.Region Do not display nested variables or iterations in the Build Table and Filter pages. | Display each nested variable as a separate item. Do not display iterations. For example: Person.Gender Do not display nested variables or iterations in the Build Table and Filter pages. |
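The rules in the table can be read as a small decision procedure. The following Python sketch is one possible reading of the table, not product code: the function names, the string return values, and the use of plain strings for the array type, iterator type, and data types are assumptions made for illustration.

```python
NUMERIC_TYPES = {"Long", "Double"}


def variable_list_display(array_type, iterator_type, is_grid, nested_types):
    """Return how an Array appears in the variable list, per the table above.

    nested_types lists the data type of each nested question,
    for example ["Long", "Long"] or ["Categorical"].
    """
    if array_type == "mlLevel":
        return "nested variables only"
    if array_type != "mlExpand":
        raise ValueError(f"unexpected array type: {array_type}")
    # IsGrid is ignored unless the iterator is categorical.
    if iterator_type != "itCategorical" or not is_grid:
        return "separate items"
    single_question = len(nested_types) == 1
    same_numeric_type = (
        bool(nested_types)
        and len(set(nested_types)) == 1
        and nested_types[0] in NUMERIC_TYPES
    )
    return "grid" if single_question or same_numeric_type else "separate items"


def shown_in_build_table_and_filter(array_type):
    """Levels are hidden from the Build Table and Filter pages."""
    return array_type != "mlLevel"


print(variable_list_display("mlExpand", "itCategorical", True, ["Categorical"]))
print(variable_list_display("mlExpand", "itCategorical", True, ["Long", "Double"]))
print(shown_in_build_table_and_filter("mlLevel"))
```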
When you design the structure of the metadata, you must consider how the response data will be analyzed. This is because you cannot export case data collected using an Array of type mlLevel to a format for which the DSC does not support a hierarchical view of the data, such as .
The example in this section is a deliberate oversimplification that is intended to help you understand the MDM structure. In practice, you are not limited to such simple constructions and it is possible to have nested loops and levels.
A good way to understand the structure of the MDM objects is to examine them in Metadata Model Explorer.
The following mrScriptBasic example creates the two arrays described.
' This sample creates two arrays, the first is
' an expanded array, the second a level.
Dim MDM, MyLoop, MyLevel, MyVariable
' Create the MDM object
Set MDM = CreateObject("MDM.Document")
' Set context to 'Question'
MDM.Contexts.Base = "QUESTION"
MDM.Contexts.Current = "QUESTION"
' Create the first Array object including the question text
Set MyLoop = MDM.CreateArray("MyLoop", _
"Please answer the following questions for each person in your household:")
' Define the array as expanded with a finite number of iterations
MyLoop.Type = 1 ' mlExpand
MyLoop.IteratorType = 3 ' itNumericRanges
MyLoop.Ranges.RangeExpression = "[1..6]"
' Now create and add the variables to the array
Set MyVariable = MDM.CreateVariable("Name", "What is their name?")
MyLoop.Fields.Add (MyVariable)
' Make the variable text
MyVariable.DataType = mr.Text
' Set the minimum and maximum length of the text response
MyVariable.MinValue = 5
MyVariable.MaxValue = 30
Set MyVariable = MDM.CreateVariable("Gender", "Male or female?")
MyLoop.Fields.Add (MyVariable)
' Make the variable categorical
MyVariable.DataType = mr.Categorical
' Set the minimum and maximum number of categories that can be chosen
MyVariable.MinValue = 1
MyVariable.MaxValue = 1
MyVariable.Elements.Add (MDM.CreateElement("Male", "Male"))
MyVariable.Elements.Add (MDM.CreateElement("Female", "Female"))
' Add the Array to the document fields
MDM.Fields.Add(MyLoop)
' Create the second Array object including the question text
Set MyLevel = MDM.CreateArray("MyLevel", _
"Please answer the following questions for each person in your household:")
' Define the array as a level with an unspecified number of iterations
MyLevel.Type = 0 ' mlLevel
MyLevel.IteratorType = 3 ' itNumericRanges
MyLevel.Ranges.RangeExpression = "[1..]"
' Now clone the variables in MyLoop and add them to the second array
Set MyVariable = MDM.CloneObject(MyLoop.Fields["Name"])
MyLevel.Fields.Add(MyVariable)
Set MyVariable = MDM.CloneObject(MyLoop.Fields["Gender"])
MyLevel.Fields.Add(MyVariable)
' Add the array to the document fields
MDM.Fields.Add(MyLevel)
' Save the document
MDM.Save("C:\Program Files\IBM\SPSS\DataCollection\7\DDL\Output\MyLoops.mdd")
MDM.Close()
A similar example in VB.NET is:
Private Sub CreateArrays()
Dim MyDocument As MDMLib.Document
Dim MyLoop As MDMLib.Array
Dim MyVariable As MDMLib.IVariable2
Dim MyLevel As MDMLib.Array
' Create the MDM object
MyDocument = New MDMLib.Document
' Set context to 'Question'
MyDocument.Contexts.Base = "QUESTION"
MyDocument.Contexts.Current = "QUESTION"
' Create the first Array object including the question text
MyLoop = MyDocument.CreateArray("MyLoop", _
"Please answer the following questions for each person in your household:")
' Define the array as expanded with a finite number of iterations
MyLoop.Type = MDMLib.ArrayTypeConstants.mlExpand
MyLoop.IteratorType = MDMLib.IteratorTypeConstants.itNumericRanges
MyLoop.Ranges.RangeExpression = "[1..6]"
' Now create and add the variables to the array
MyVariable = MyDocument.CreateVariable("Name", "What is their name?")
MyLoop.Fields.Add(MyVariable)
' Make the variable text
MyVariable.DataType = MDMLib.DataTypeConstants.mtText
' Set the minimum and maximum length of the text response
MyVariable.MinValue = 5
MyVariable.MaxValue = 30
MyVariable = MyDocument.CreateVariable("Gender", "Male or female?")
MyLoop.Fields.Add(MyVariable)
' Make the variable categorical
MyVariable.DataType = MDMLib.DataTypeConstants.mtCategorical
' Set the minimum and maximum number of categories that can be chosen
MyVariable.MinValue = 1
MyVariable.MaxValue = 1
MyVariable.Elements.Add(MyDocument.CreateElement("Male", "Male"))
MyVariable.Elements.Add(MyDocument.CreateElement("Female", "Female"))
' Add the array to the document fields
MyDocument.Fields.Add(MyLoop)
' Create the second Array object including the question text
MyLevel = MyDocument.CreateArray("MyLevel", _
"Please answer the following questions for each person in your household:")
' Define the array as a level with an unspecified number of iterations
MyLevel.Type = MDMLib.ArrayTypeConstants.mlLevel
MyLevel.IteratorType = MDMLib.IteratorTypeConstants.itNumericRanges
MyLevel.Ranges.RangeExpression = "[1..]"
' Now clone the variables in MyLoop and add them to the second array
MyVariable = MyDocument.CloneObject(MyLoop.Fields("Name"))
MyLevel.Fields.Add(MyVariable)
MyVariable = MyDocument.CloneObject(MyLoop.Fields("Gender"))
MyLevel.Fields.Add(MyVariable)
' Add the array to the document fields
MyDocument.Fields.Add(MyLevel)
' Save the document
MyDocument.Save("[INSTALL_FOLDER]\IBM\SPSS\DataCollection\7\DDL\Output\MyLoops.mdd")
MyDocument.Close()
End Sub