Handling hierarchical data
This section describes how to handle hierarchical data using VDATA and HDATA. Before reading this section, read the following sections, which provide an introduction to hierarchical data and how it is handled by the UNICOM Intelligence Data Model:
Loops, grids, and levels. An introduction to the types of hierarchical constructions that are typically found in market research questionnaires and how the UNICOM Intelligence Data Model represents the response data.
Hierarchical data in the MDM. More detailed information about how these survey constructions map to objects in the Metadata Model (MDM) and to the columns and tables that store the case data.
Understanding hierarchical data. Tutorial-style exercises that are designed to help you understand how the UNICOM Intelligence Data Model represents hierarchical response data in the flat VDATA virtual table and hierarchical HDATA virtual tables.
A CDSC can represent data in a single flat VDATA table, in multiple hierarchical HDATA tables, or in both. When a CDSC represents the data in both forms, the flat VDATA table normally represents the data at the top (or root) level only. Representing data using the HDATA view has many advantages during analysis and is recommended when the underlying data is stored in a hierarchical format.
The following example is used in this section for illustration purposes and is deliberately simplistic. In real life, hierarchical data cannot always be flattened, for example, because the maximum number of iterations is unknown or because some of the child tables are not directly related to each other. Hierarchical data that cannot be flattened should always be represented in the hierarchical HDATA format.
The following table shows information collected in a household survey, which asks for information about the household, about each person who is part of the household, and about each vehicle owned by each of these people. Household, person, and vehicle are called levels.
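| ID | Area  | HouseType | Gender | Age | Vote | Transport | UseItOften |
|----|-------|-----------|--------|-----|------|-----------|------------|
| 1  | North | Flat      | Male   | 30  | Lib  | Car       | True       |
|    |       |           |        |     |      | Bike      | False      |
|    |       |           | Fem    | 28  | Dem  | Bike      | True       |
| 2  | East  | Farm      | Male   | 65  | Con  | Car       | False      |
|    |       |           |        |     |      | Bike      | True       |
|    |       |           |        |     |      | Truck     | False      |
| 3  | South | Villa     | Male   | 25  | Lib  | Car       | True       |
|    |       |           |        |     |      | Car       | False      |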
This information can be stored simply, but inefficiently, in a flat VDATA table, which duplicates the data at the household and person levels:
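| ID | Area  | HouseType | Gender | Age | Vote | Transport | UseItOften |
|----|-------|-----------|--------|-----|------|-----------|------------|
| 1  | North | Flat      | Male   | 30  | Lib  | Car       | True       |
| 1  | North | Flat      | Male   | 30  | Lib  | Bike      | False      |
| 1  | North | Flat      | Fem    | 28  | Dem  | Bike      | True       |
| 2  | East  | Farm      | Male   | 65  | Con  | Car       | False      |
| 2  | East  | Farm      | Male   | 65  | Con  | Bike      | True       |
| 2  | East  | Farm      | Male   | 65  | Con  | Truck     | False      |
| 3  | South | Villa     | Male   | 25  | Lib  | Car       | True       |
| 3  | South | Villa     | Male   | 25  | Lib  | Car       | False      |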
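The duplication in the flat form can be illustrated with a minimal Python sketch. It is not part of the UNICOM Intelligence Data Model or of any CDSC interface: the nested `households` structure and the `flatten` function are hypothetical and exist only to show how one flat row per vehicle repeats the household and person values.

```python
# Hypothetical nested form of the household survey data (illustration only).
households = [
    {"ID": 1, "Area": "North", "HouseType": "Flat", "Person": [
        {"Gender": "Male", "Age": 30, "Vote": "Lib", "Vehicle": [
            {"Transport": "Car", "UseItOften": True},
            {"Transport": "Bike", "UseItOften": False}]},
        {"Gender": "Fem", "Age": 28, "Vote": "Dem", "Vehicle": [
            {"Transport": "Bike", "UseItOften": True}]}]},
    {"ID": 2, "Area": "East", "HouseType": "Farm", "Person": [
        {"Gender": "Male", "Age": 65, "Vote": "Con", "Vehicle": [
            {"Transport": "Car", "UseItOften": False},
            {"Transport": "Bike", "UseItOften": True},
            {"Transport": "Truck", "UseItOften": False}]}]},
    {"ID": 3, "Area": "South", "HouseType": "Villa", "Person": [
        {"Gender": "Male", "Age": 25, "Vote": "Lib", "Vehicle": [
            {"Transport": "Car", "UseItOften": True},
            {"Transport": "Car", "UseItOften": False}]}]},
]


def flatten(households):
    """Return one flat row per vehicle, repeating household and person values."""
    rows = []
    for h in households:
        for p in h["Person"]:
            for v in p["Vehicle"]:
                rows.append((h["ID"], h["Area"], h["HouseType"],
                             p["Gender"], p["Age"], p["Vote"],
                             v["Transport"], v["UseItOften"]))
    return rows


flat_rows = flatten(households)
for row in flat_rows:
    print(row)
```

In this sketch, the values for household 2 are written three times, once for each vehicle. A person with no vehicles would produce no row at all, which is one reason why flattening does not suit every data set.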
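When the same data is represented using hierarchical HDATA tables, a separate table is used for the data at each level: one for households, one for persons, and one for vehicles.
The data is then represented as follows. The Person and Vehicle cells mark where the rows of a child table begin: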
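| ID | Area  | HouseType | Gender | Age | Vote | Transport | UseItOften |
|----|-------|-----------|--------|-----|------|-----------|------------|
| 1  | North | Flat      | Person |     |      |           |            |
|    |       |           | Male   | 30  | Lib  | Vehicle   |            |
|    |       |           |        |     |      | Car       | True       |
|    |       |           |        |     |      | Bike      | False      |
|    |       |           | Fem    | 28  | Dem  | Vehicle   |            |
|    |       |           |        |     |      | Bike      | True       |
| 2  | East  | Farm      | Person |     |      |           |            |
|    |       |           | Male   | 65  | Con  | Vehicle   |            |
|    |       |           |        |     |      | Car       | False      |
|    |       |           |        |     |      | Bike      | True       |
|    |       |           |        |     |      | Truck     | False      |
| 3  | South | Villa     | Person |     |      |           |            |
|    |       |           | Male   | 25  | Lib  | Vehicle   |            |
|    |       |           |        |     |      | Car       | True       |
|    |       |           |        |     |      | Car       | False      |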
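The one-table-per-level idea can be sketched in Python as follows. This is an illustration under stated assumptions, not the storage format or API of the UNICOM Intelligence Data Model: the input structure, the `to_level_tables` function, and the key columns `HouseholdID`, `PersonIndex`, and `VehicleIndex` are all hypothetical names chosen to show how each child row can point back to its parent.

```python
# Hypothetical nested form of the household survey data (illustration only).
households = [
    {"ID": 1, "Area": "North", "HouseType": "Flat", "Person": [
        {"Gender": "Male", "Age": 30, "Vote": "Lib", "Vehicle": [
            {"Transport": "Car", "UseItOften": True},
            {"Transport": "Bike", "UseItOften": False}]},
        {"Gender": "Fem", "Age": 28, "Vote": "Dem", "Vehicle": [
            {"Transport": "Bike", "UseItOften": True}]}]},
    {"ID": 2, "Area": "East", "HouseType": "Farm", "Person": [
        {"Gender": "Male", "Age": 65, "Vote": "Con", "Vehicle": [
            {"Transport": "Car", "UseItOften": False},
            {"Transport": "Bike", "UseItOften": True},
            {"Transport": "Truck", "UseItOften": False}]}]},
    {"ID": 3, "Area": "South", "HouseType": "Villa", "Person": [
        {"Gender": "Male", "Age": 25, "Vote": "Lib", "Vehicle": [
            {"Transport": "Car", "UseItOften": True},
            {"Transport": "Car", "UseItOften": False}]}]},
]


def to_level_tables(households):
    """Split the nested data into one table per level.

    The key columns (HouseholdID, PersonIndex, VehicleIndex) are
    hypothetical; they only show how child rows refer to their parents.
    """
    household_table, person_table, vehicle_table = [], [], []
    for h in households:
        # Root level: each household value is stored exactly once.
        household_table.append(
            {"ID": h["ID"], "Area": h["Area"], "HouseType": h["HouseType"]})
        for p_index, p in enumerate(h["Person"], start=1):
            # Person level: keyed by the parent household.
            person_table.append(
                {"HouseholdID": h["ID"], "PersonIndex": p_index,
                 "Gender": p["Gender"], "Age": p["Age"], "Vote": p["Vote"]})
            for v_index, v in enumerate(p["Vehicle"], start=1):
                # Vehicle level: keyed by the parent household and person.
                vehicle_table.append(
                    {"HouseholdID": h["ID"], "PersonIndex": p_index,
                     "VehicleIndex": v_index,
                     "Transport": v["Transport"],
                     "UseItOften": v["UseItOften"]})
    return household_table, person_table, vehicle_table


household_table, person_table, vehicle_table = to_level_tables(households)
print(len(household_table), len(person_table), len(vehicle_table))
```

Each household and person value appears once in its own table, and only the key columns are repeated in the child tables.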
The top hierarchical table, which stores the household information in this example, is called HDATA. It is sometimes termed the root level and is defined as level 0. This can be seen in the XML code topic, which presents the data from this example in two ways: flattened in a VDATA table, and hierarchically using HDATA tables.
See also
VDATA and HDATA examples
XML code
Reusing commands
Creating a CDSC