Hierarchical SQL queries
When you run queries against the hierarchical HDATA view of the data, the SELECT statement supports sub-select statements. These enable you to retrieve items from lower levels (that is levels below the root (top) HDATA level).
Syntax
SELECT
[ ALL | DISTINCT ]
{ * | LevelName
.(<sub_select>) | expression [ [ AS ] column_alias ]
[, LevelName
.(<sub_select>) | expression [ [ AS ] column_alias ]... ] }
FROM
table[.
LevelName]
[ WHERE
expression ]
[ GROUP BY
expression [, expression... ] ]
[ WITH
(EvaluateEmptyIterations) ]
[ ORDER BY
{ expression } [ ASC | DESC ]
[,{ expression } [ ASC | DESC ]... ] ]
<sub_select> ::= { * | LevelName
.(<sub_select>) | expression [ [ AS ] column_alias ]
[, expression [ [ AS ] column_alias ]... ] }
[ WHERE
expression ]
[ GROUP BY
expression [, expression... ] ]
[ WITH
(EvaluateEmptyIterations) ]
[ ORDER BY
{ expression } [ ASC | DESC ]
[,{ expression } [ ASC | DESC ]... ] ]
Examples
The examples in this topic use the UNICOM Intelligence Data File version of the Household sample data set, which provides an example of complex hierarchical data. The following diagram shows the levels structure in the Household sample.
Each case at the top level corresponds to a household. For more information about the household sample, including details of the questionnaire logic and the variables at each level, see
The Household sample.
You can run the example queries using the DM Query sample tool, which is supplied with the UNICOM Intelligence Developer Documentation Library. The examples were run using the connection option to return category names rather than category values. You select this option on the Advanced tab in the Data Link Properties dialog. For step-by-step instructions on setting up DM Query to use the Household sample, see the second example in
Understanding hierarchical data.
1. Using a sub-select to retrieve columns from the Person level
This example selects the LevelId, Age, and Gender variables from the Person level.
SELECT Person.(LevelId, Gender, Age)
FROM HDATA
Here are the first 10 rows in the result set:
+ LevelId Gender Age
------- -------- ---
- 1 {male} 25
- 1 {female} 45
2 {male} 43
3 {female} 15
4 {male} 12
- 1 {female} 72
2 {male} 81
- 1 {female} 32
2 {female} 9
3 {male} 2
The LevelId is a special column that is present in all lower levels and it stores an identifier for the iterations. The Person level has a numeric iterator identifier and so this column stores 1 for the first person in each household, 2 for the second person, and so on. The first 10 rows belong to four households.
You must use the same syntax (and not omit the parentheses) when you want to retrieve one column. For example:
SELECT Person.(Age)
FROM HDATA
You can use the select all (*) syntax in a sub-select statement. This will retrieve all of the columns in the specified level, plus any nested levels as child recordsets. For example:
SELECT Person.(*)
FROM HDATA
2. Up-leving data in the sub-select
All of the variables you specify in a sub-select statement must be at the level of the sub-select. You can include variables from a lower child level by up-leving the data. Up-leving collapses data from a lower level so that it can be represented at a higher parent level.
To illustrate how this works, first use a nested sub-select statement to retrieve the DaysAway column from the Trip level, which is a child of the Person level.
SELECT Person.(LevelID, Gender, Age, Trip.(DaysAway))
FROM HDATA
All the identifiers must be specified relative to the level of the expression, which is the level of the sub-select.
Here are the results for the first 12 people:
+ LevelId Gender Age + DaysAway
------- -------- --- --------
- 1 {male} 25 - 14
7
- 1 {female} 45
2 {male} 43
3 {female} 15
4 {male} 12
- 1 {female} 72
2 {male} 81
- 1 {female} 32 - 27
2 {female} 9 - 27
3 {male} 2 - 27
- 1 {male} 29 - 7
3
5
14
14
2 {male} 31
The nested sub-select has returned the data for the Trip level as a child recordset. Look at the results for the first 12 people: the number of rows in the child recordset varies from zero to five. For example, the first person has two rows, the next person has no rows, and so on. Now up-lev the data in the DaysAway column to the Person level.
SELECT Person.(LevelId, Gender, Age, SUM(Trip.(DaysAway)) As DaysAway)
FROM HDATA
Here are the first 12 rows in the result set:
+ LevelId Gender Age DaysAway
------- -------- --- --------
- 1 {male} 25 21
- 1 {female} 45 0
2 {male} 43 0
3 {female} 15 0
4 {male} 12 0
- 1 {female} 72 0
2 {male} 81 0
- 1 {female} 32 27
2 {female} 9 27
3 {male} 2 27
- 1 {male} 29 43
2 {male} 31 0
The DaysAway data has now been collapsed into a single row for each person.
3. Down-leving data in the sub-select
You can use the ^ operator to down-lev data. Down-leving expands higher-level data so that it can be represented at a lower child level.
To illustrate how this works, first run this query to retrieve the Region column (which is at the top level) from the top level as well as the same three columns from the Person level:
SELECT Region,
Person.(LevelId, Gender, Age)
FROM HDATA
Here are the results for the first 10 people:
+ Region LevelId Gender Age
------ ------- -------- ---
- {north} 1 {male} 25
- {south} 1 {female} 45
2 {male} 43
3 {female} 15
4 {male} 12
- {west} 1 {female} 72
2 {male} 81
- {east} 1 {female} 32
2 {female} 9
3 {male} 2
This time the data for the Person level has been returned as a child recordset. Again, you can see that the number of rows in the child recordset varies.
Now use the down-lev operator to expand the top-level Region column to the Person level.
SELECT Person.(LevelId, Gender, Age, ^.Region As Region)
FROM HDATA
Here are the first 10 rows in the result set:
+ LevelId Gender Age Region
------- -------- --- ------
- 1 {male} 25 {north}
- 1 {female} 45 {south}
2 {male} 43 {south}
3 {female} 15 {south}
4 {male} 12 {south}
- 1 {female} 72 {west}
2 {male} 81 {west}
- 1 {female} 32 {east}
2 {female} 9 {east}
3 {male} 2 {east}
The region has now been repeated for each person in each household.
4. Combining data from different levels
This example concatenates the data in the person-level Name column with the data the Address column after it has been down-leved from the top level:
SELECT Person.(LevelId, Name + ', ' + ^.Address As NameAndAddress)
FROM HDATA
Here are the first 5 rows in the result set:
+ LevelId NameAndAddress
------- --------------------------------------------------
- 1 Nigel Sorrell, 15B Park Avenue, Harrogate, HG1 4TY
- 1 Maria Smith, 46 Freedom Lane, Brighton, BN2 3YT
2 Fred Smith, 46 Freedom Lane, Brighton, BN2 3YT
3 Jenna Smith, 46 Freedom Lane, Brighton, BN2 3YT
4 Harry Smith, 46 Freedom Lane, Brighton, BN2 3YT
5. Filtering the data retrieved by a sub-select
You can include a WHERE clause to filter the data retrieved by a sub-select statement. For example, the following sub-select statement uses a WHERE clause to select females only:
SELECT Person.(LevelId, Gender, Age WHERE Gender = {female})
FROM HDATA
Here are the first 10 rows in the result set:
+ LevelId Gender Age
------- -------- ---
- 1 {female} 45
3 {female} 15
- 1 {female} 72
- 1 {female} 32
2 {female} 9
- 1 {female} 63
- 2 {female} 39
4 {female} 7
A Null row is returned for top-level records for which there are no lower-level records that pass the WHERE clause (in this example, that is all-male households).
6. Working with parallel levels
All of the preceding examples use levels that have a direct parent-child relationship. However, some data sets contain levels that are not related to each other. For example, the diagram of the levels structure of the Household sample shows that the Person and Vehicle levels are parallel to each other (on different branches of the tree). This means that the data in the two levels is not directly related to each other. However, you might want to include data from a parallel level in an expression; for example, you might want to select men who live in households that have a motorbike. To do this, you must use the data in the parallel Vehicle level in your Person-level expression.
When you include data from a parallel level in an expression, you must up-lev the data to the first common ancestor level and then down-lev the data to the level of the expression. In the Household sample, the only level that is an ancestor of both the Person and Vehicle levels is the top (HDATA) level. In the following example, the Vehicle-level data is up-leved to the top level and down-leved to the Person level:
SELECT Person.(Age, Occupation,
^.Sum(Vehicle.(Mileage)) As 'Total Mileage'
WHERE ^.Sum(Vehicle.(VehicleType = {Motorbike})) > 0 And Gender = {Male})
FROM HDATA
Here is the result set:
+ Age Occupation Total Mileage
--- ---------- -------------
- 29 {in_paid_employment} 43300
31 {selfemployed} 43300
In the Household sample there are only two men who live in a household that has a motorbike.
7. Sorting the data retrieved by a sub-select
You can include an ORDER BY clause to sort the data retrieved by a sub-select statement. For example, the following sub-select statement uses an ORDER BY clause to sort the results by age:
SELECT Household, Person.(LevelId, Gender, Age ORDER BY Age)
FROM HDATA
Here are the results for the first four households:
+ Household LevelId Gender Age
--------- ------- ------ ---
- 1 1 {male} 25
- 2 4 {male} 12
3 {female} 15
2 {male} 43
1 {female} 45
- 3 1 {female} 72
2 {male} 81
- 4 3 {male} 2
2 {female} 9
1 {female} 32
In this example, the Household column has been included. This stores a sequential serial number for the records at the top level. Notice how the Person-level rows for each household are sorted in ascending age order.
8. Aggregated sub-select
You can include aggregation functions in a sub-select statement. The following example, returns the number of adults in each household:
SELECT Household,
Person.(COUNT(age) As NumberOfAdults WHERE Age > 16)
FROM HDATA
Here are the results for the first five households:
Household NumberOfAdults
--------- --------------
1 1
2 2
3 2
4 1
5 2
9. Creating a frequency table in the sub-select
The following example uses the Count aggregation function and a GROUP BY clause to create a frequency table at the Person level. For example:
SELECT Household,
Person.(Occupation, Count(Occupation) As 'Count'
GROUP BY Occupation)
FROM HDATA
Here are the results for the first five households:
+ Household Occupation Count
--------- ------------------- -----
- 1 {selfemployed} 1
- 2 {in_paid_employment} 2
{school_child} 2
- 3 {retired} 2
- 4 {selfemployed} 1
{school_child} 1
{preschool_child} 1
- 5 {in_paid_employment} 1
{selfemployed} 1
10. Retrieving columns at a lower level
You can retrieve columns from a lower level by appending the level name to HDATA in the FROM clause. For example, the following selects the LevelId, Gender, and Age columns from the Person level.
SELECT LevelId, Gender, Age FROM HDATA.Person
Here are the first 10 rows in the result set:
LevelId Gender Age
------- -------- ---
1 {male} 25
1 {female} 45
2 {male} 43
3 {female} 15
4 {male} 12
1 {female} 72
2 {male} 81
1 {female} 32
2 {female} 9
3 {male} 2
When you use this syntax, all of the variables you specify in the select statement must be at the specified lower level. However, like a sub-select, you can include variables from a child level by up-leving the data. For example:
SELECT LevelId, Gender, Age,
Trip.(DaysAway) As DaysAway
FROM HDATA.Person
Similarly, you can include variables from a parent level by down-leving the data. For example:
SELECT LevelId, Gender, Age,
^.Region As Region
FROM HDATA.Person
You can filter the data retrieved using a WHERE clause. For example:
SELECT LevelId, Gender, Age
FROM HDATA.Person
WHERE Gender = {Female}
Here are the first 10 rows in the result set:
LevelId Gender Age
------- -------- ---
1 {female} 45
3 {female} 15
1 {female} 72
1 {female} 32
2 {female} 9
1 {female} 63
2 {female} 39
4 {female} 7
1 {female} 25
2 {female} 23
You can retrieve columns from the Trip level, which is a child of the Person level. For example:
SELECT LevelId, Country, DaysAway, Purpose
FROM HDATA.Person.Trip
Here are the first five rows in the result set:
LevelId Country DaysAway Purpose
------- -------------- -------- ----------------------------
1 {united_states} 14 {vacation}
2 {switzerland} 7 {sport}
1 {india} 27 {vacation,visiting_relatives}
1 {india} 27 {vacation,visiting_relatives}
1 {india} 27 {vacation,visiting_relatives}
11. Aggregation at a lower level
You can aggregate data at the lower level. The following example, returns a frequency table for all cases at the Person level:
SELECT Gender, COUNT(Gender) As 'Count'
FROM HDATA.Person
GROUP BY Gender
Here is the result set:
Gender Count
-------- -----
{male} 12
{female} 13
This aggregates all of the cases at the Person level, whereas when you perform the aggregation in a sub-select statement, the Person-level records are aggregated for each household. For example, here is a similar query in a sub-select:
SELECT Household,
Person.(Gender, COUNT(Gender) As 'Count' GROUP BY Gender)
FROM HDATA
Here are the results for the first three households:
+ Household Gender Count
--------- -------- -----
- 1 {male} 1
- 2 {female} 2
{male} 2
- 3 {female} 1
{male} 1
See also