Advanced SQL queries

SELECT
GROUPBY.COL[index] [[ AS ] column_alias ]
[, BASE(column_name) [ [ AS ] column_alias ]
[, expression [ [ AS ] column_alias ]... ] ]
FROM
table
[GROUP BY
expression ON expression]
[ HAVING
expression ]
[WITH
([BASESUMMARYROW|EvaluateEmptyIterations])]
[ ORDER BY
{ expression } [ ASC | DESC ]
[,{ expression } [ ASC | DESC ]... ] ]

Create a base column to show the total number of cases in the row. Generally, the base includes every case for which the value is not NULL. However, if the IsInBase attribute has been set to True, the case is included in the base regardless of the value. Conversely, if the IsInBase attribute has been set to False, the case is excluded from the base regardless of the value.

When used with BaseSummaryRow, specifies the BaseSummaryRow option, which creates a base row. In nested crosstabulations, it creates a base row for each nested group. When used with EvaluateEmptyIterations, specifies that the categorical grid, which is being upleveld, should return all iterations including empty iterations. If the table selected is not an upleveled categorical gird, this option has no effect.

These examples use the Museum example Quanvert database, which is based on a survey of visitors to a museum. You can run these queries using the DM Query sample tool, which is supplied with the UNICOM Intelligence Developer Documentation Library. The examples were run using the connection option to return category names rather than category values. You select this option on the Advanced tab in the Data Link Properties dialog box. For more information, see Running the example queries in DM Query.

SELECT desc_leave AS Description,
SUM(gender = {male}) AS Male,
SUM(gender = {female}) AS Female
FROM vdata
WHERE desc_leave IS NOT NULL
GROUP BY desc_leave

Description Male Female
------------------------------------ ------ --------
{not answered} 154 120
{place_to_learn} 12 8
{mention_of_things_on_view} 3 0
{something_for_everyone} 0 1

SELECT groupby.col[0] AS Description,
SUM(gender = {male}) AS Male,
SUM(gender = {female}) AS Female
FROM vdata
WHERE desc_leave IS NOT NULL
GROUP BY desc_leave ON desc_leave.DefinedCategories()

Description Male Female
--------------------------- ------ --------
{variation_of_museum_name} 0 0
{something_for_everyone} 0 1
{mention_of_things_on_view} 3 0
{comment_on_layoutdesign} 0 0
{relate_specimens_in_theoretical_way} 0 0
{comments_about_building} 0 0
{requires_lot_of_time} 0 0
{place_to_bring_childrenfamily} 0 0
{general_appreciative_comment} 0 0
{place_to_learn} 12 8
{other} 0 0
{not answered} 154 120

SELECT groupby.col[0] AS Description,
SUM(gender = {male}) AS Male,
SUM(gender = {female}) AS Female
FROM vdata
WHERE desc_leave IS NOT NULL
GROUP BY desc_leave ON {place_to_learn, mention_of_things_on_view, something_for_everyone

Description Male Female
--------------------------- ------ --------
{place_to_learn} 12 8
{mention_of_things_on_view} 3 0
{something_for_everyone} 0 1

SELECT groupby.col[0] AS Age,
BASE(gender) AS 'Base',
SUM(gender = {male}) AS Male,
SUM(gender = {female}) AS Female
FROM vdata
GROUP BY age ON age.DefinedCategories()

Age Base Male Female
--------------- ------ ------ -------
{e1116_years} 38 23 15
{e1720_years} 82 50 32
{e2124_years} 95 51 44
{e2534_years} 192 108 84
{e3544_years} 91 49 42
{e4554_years} 55 32 23
{e5564_years} 33 16 17
{e65_years} 16 10 6

SELECT museums AS Museums,
SUM(gender = {male}) AS Male,
SUM(gender = {female}) AS Female
FROM vdata
WHERE museums IS NOT NULL
GROUP BY MUSEUMS

Standard SQL does not generally produce sensible crosstabulation results when the rows are formed from the categories of a multiple response variable, because a separate row is produced for each combination of responses. Here are the first five rows in the result set:

Museums Male Female
-------------------------------------------------------------------- ------- ----------
{national_museum_of_science} 146 82
{museum_of_design,national_art_gallery} 0 1
{national_museum_of_science,archeological_museum,national_art_ga...} 0 1
{archeological_museum} 7 1
{museum_of_design,archeological_museum,national_art_gallery,nort...} 1 1

SELECT groupby.col[0] AS Museums,
SUM(gender = {male}) as Male,
SUM(gender = {female}) AS Female
FROM vdata
WHERE museums IS NOT NULL
GROUP BY museums ON museums.DefinedCategories()

Museums Male Female
------------------------------------- ------- ----------
{national_museum_of_science} 209 124
{museum_of_design} 49 43
{institute_of_textiles_and_fashion} 31 16
{archeological_museum} 18 8
{national_art_gallery} 7 12
{northern_gallery} 11 10
{other} 16 21
{not_answered} 0 0

SELECT groupby.col[0] AS Museums,
SUM(gender = {male}) AS Male,
SUM(gender = {female}) AS Female
FROM vdata
WHERE museums IS NOT NULL
GROUP BY museums ON museums.DefinedCategories()
WITH (BaseSummaryRow)

Museums Male Female
----------------------------------- ------ ----------
255 171
{national_museum_of_science} 209 124
{museum_of_design} 49 43
{institute_of_textiles_and_fashion} 31 16
{archeological_museum} 18 8
{national_art_gallery} 7 12
{northern_gallery} 11 10
{other} 16 21
{not_answered} 0 0

This example shows how to create a crosstabulation using the categories of a multiple response variable as the columns.

SELECT groupby.col[0] AS Gender,
BASE(museums) AS 'Base',
SUM(museums >= {national_museum_of_science}) AS Science,
SUM(museums >= {museum_of_design}) AS Design,
SUM(museums >= {institute_of_textiles_and_fashion}) AS Textiles,
SUM(museums >= {archeological_museum}) AS Archeology,
SUM(museums >= {national_art_gallery}) AS National,
SUM(museums >= {northern_gallery}) AS Northern,
SUM(museums >= {other}) AS Other,
SUM(museums >= {not_answered}) AS NA
FROM vdata
GROUP BY gender ON gender.definedcategories()

Gender Base Science Design Textiles Archeology National Northern Other NA
------ ---- ------- ------ -------- ---------- -------- -------- ----- --
{male} 255 209 49 31 18 7 11 16 0

{female} 171 124 43 16 8 12 10 21 0

This is an example of using the UNICOM Intelligence Data Model SQL extensions to create a nested crosstabulation.

SELECT groupby.col[0] AS Museums,
groupby.col[1] AS Gender,
SUM(age = {e1116_years}) AS '11-16',
SUM(age = {e1720_years}) AS '17-20',
SUM(age = {e2124_years}) AS '21-24',
SUM(age = {e2534_years}) AS '25-34',
SUM(age = {e3544_years}) AS '35-44',
SUM(age = {e4554_years}) AS '45-55',
SUM(age = {e5564_years}) AS '55-64',
SUM(age = {e65_years}) AS '65+'
FROM vdata
WHERE museums IS NOT NULL
GROUP BY museums ON museums.DefinedCategories(),
gender ON gender.DefinedCategories()

Museums Gender 11-16 16-20 21-24 25-34 35-44 45-54 55-64 65+
---------------------------------- -------- ----- ----- ----- ----- ----- ----- ----- ---
{national_museum_of_science} {male} 17 35 31 67 27 20 9 3
{national_museum_of_science} {female} 10 12 19 40 24 10 7 2
{museum_of_design} {male} 1 10 6 11 8 8 4 1
{museum_of_design} {female} 5 4 8 10 7 4 4 1
{institute_of_textiles_and_fashion} {male} 4 6 4 8 4 2 2 1
{institute_of_textiles_and_fashion} {female} 0 1 6 4 1 1 3 0
{archeological_museum} {male} 3 5 3 2 2 2 1 0
{archeological_museum} {female} 0 0 1 5 2 0 0 0
{national_art_gallery} {male} 1 2 2 1 1 0 0 0
{national_art_gallery} {female} 0 3 3 3 1 2 0 0
{northern_gallery} {male} 1 1 6 1 1 0 1 0
{northern_gallery} {female} 1 1 1 2 1 3 1 0
{other} {male} 1 4 4 3 4 0 0 0
{other} {female} 0 5 4 6 2 1 2 1
{not_answered} {male} 0 0 0 0 0 0 0 0
{not_answered} {female} 0 0 0 0 0 0 0

Museums Gender 11-16 16-20 21-24 25-34 35-44 45-54 55-64 65+
---------------------------------- -------- ----- ----- ----- ----- ----- ----- ----- ---
32 60 72 134 59 38 23 8
{male} 21 43 40 79 33 24 11 4
{female} 11 17 32 55 26 14 12 4
{national_museum_of_science} 27 47 50 107 51 30 16 5
{national_museum_of_science} {female} 10 12 19 40 24 10 7 2
{national_museum_of_science} {male} 17 35 31 67 27 20 9 3
{museum_of_design} 6 14 14 21 15 12 8 2
{museum_of_design} {male} 1 10 6 11 8 8 4 1
{museum_of_design} {female} 5 4 8 10 7 4 4 1
{institute_of_textiles_and_fashion} 4 7 10 12 5 3 5 1
{institute_of_textiles_and_fashion} {male} 4 6 4 8 4 2 2 1
{institute_of_textiles_and_fashion} {female} 0 1 6 4 1 1 3 0
{archeological_museum} 3 5 4 7 4 2 1 0
{archeological_museum} {male} 3 5 3 2 2 2 1 0
{archeological_museum} {female} 0 0 1 5 2 0 0 0
{national_art_gallery} 1 5 5 4 2 2 0 0
{national_art_gallery} {male} 1 2 2 1 1 0 0 0
{national_art_gallery} {female} 0 3 3 3 1 2 0 0
{northern_gallery} 2 2 7 3 2 3 2 0
{northern_gallery} {male} 1 1 6 1 1 0 1 0
{northern_gallery} {female} 1 1 1 2 1 3 1 0
{other} 1 9 8 9 6 1 2 1
{other} {male} 1 4 4 3 4 0 0 0
{other} {female} 0 5 4 6 2 1 2 1
{not_answered} 0 0 0 0 0 0 0 0
{not_answered} {male} 0 0 0 0 0 0 0 0
{not_answered} {female} 0 0 0 0 0 0 0 0

SELECT groupby.col[0] AS Age,
SUM(gender = {male}) / COUNT(gender) * 100 As Male,
SUM(gender = {female}) / COUNT(gender) * 100 As Female
FROM vdata
GROUP BY age ON age.definedcategories()
WITH (BaseSummaryRow)

Age Male Female
--------------- ------------------ ------------------
56.312292358803987 43.687707641196013
{e1116_years} 60.526315789473685 39.473684210526315
{e1720_years} 60.975609756097562 39.024390243902438
{e2124_years} 53.684210526315788 46.315789473684212
{e2534_years} 56.25 43.75
{e3544_years} 53.846153846153847 46.153846153846153
{e4554_years} 58.18181818181818 41.818181818181813
{e5564_years} 48.484848484848484 51.515151515151516
{e65_years} 62.5 37.5

Note The UNICOM Intelligence Data Model does not support column percentages directly. However, you can create column percentages by post-processing the recordset. Similarly, if you want to define the number of decimal places, you need to post-process the results. If you have the InterviewReporter Accelerator Kit, you can see an example of both of these types of post-processing in the InterviewReporter source code.

This example uses the standard SQL aggregate functions of AVG, MIN, MAX, and SUM to calculate summary statistics of the visits numeric variable. The WHERE clause is used to filter out respondents for whom the variable holds a negative value, which in this Quanvert database indicates that they were not asked the question on which the visits variable is based.

SELECT groupby.col[0] AS Gender,
AVG(visits) AS Average,
MIN(visits) AS Minimum,
MAX(visits) AS Maximum,
SUM(visits) AS 'Sum'
FROM vdata
WHERE visits > 0
GROUP BY gender ON gender.DefinedCategories()
WITH (BaseSummaryRow)

Gender Average Minimum Maximum Sum
------ ------------------ --------- --------- ------
6.9868852459016395 1 99 2131
{male} 7.3954802259887007 1 50 1309
{female} 6.421875 1 99 822

SELECT age AS Age,
AVG(visits) AS 'Average number of visits'
FROM vdata
WHERE visits > 0
GROUP BY age
HAVING AVG(visits) > 5
ORDER BY age

Age Average number of visits
--------- ------------------------
{e1116_years} 5.9666666666666668
{e2124_years} 5.6511627906976747
{e2534_years} 5.3043478260869561
{e3544-years} 8.3409090909090917
{e4554_years} 6.9375
{e5564_years} 20.9375
{e65_years} 18.125

In market research surveys, respondents are frequently asked to choose a rating on a predefined scale for a number of products in a list. This type of data is usually best presented in a grid table that shows the rating categories on one axis and the product list on the other. This example shows how you can create a grid table using a single query.

In the Museum survey, the rating question asked respondents to give a rating to each of the galleries. The responses for each gallery are stored in a separate variable. These variables have names in the form rating[{gallery}].Column, where gallery is the name of the gallery.

SELECT groupby.col[0] As Rating,
SUM(rating[{Dinosaurs}].Column >= groupby.col[0] AND rating[{Dinosaurs}].Column IS NOT NULL) As Dinosaurs,
SUM(rating[{Conservation}].Column >= groupby.col[0] AND rating[{Conservation}].Column IS NOT NULL) As Conservation,
SUM(rating[{Insects}].Column >= groupby.col[0] AND rating[{Insects}].Column IS NOT NULL) As Insects,
SUM(rating[{Whales}].Column >= groupby.col[0] AND rating[{Conservation}].Column IS NOT NULL) As Whales,
SUM(rating[{Botany}].Column >= groupby.col[0] AND rating[{Botany}].Column IS NOT NULL) As Botany
FROM vdata
GROUP BY rating[{Dinosaurs}].Column.DefinedCategories()
ON rating[{Dinosaurs}].Column.DefinedCategories() with (BaseSummaryRow)

Rating Dinosaurs Conservation Insects Whales Botany
------ --------- ------------ ------- ------ ------
298 298 298 298 298
{not_at_all_interested_1} 10 3 13 2 11
{not_particularly_interested_2} 1 2 2 0 3
{no_opinion_3} 27 7 18 14 13
{slightly_interested_4} 113 30 27 28 14
{very_interested_5} 107 14 19 38 14

This example is similar to the crosstabulation shown in Example 5 above, except that it is weighted using the genbalance weighting variable. The genbalance variable is a numeric (Double) variable that has already been set up to weight the data to reflect an equal balance between the genders.

SELECT groupby.col[0] AS Museums,
SUM((gender = {male}) * genbalance) AS Male,
SUM((gender = {female}) * genbalance) AS Female
FROM vdata
WHERE museums IS NOT NULL
GROUP BY museums ON museums.DefinedCategories()
WITH (BaseSummaryRow)

Museums Male Female
----------------------------------- ------------------ ------------------
226.41594260931012 195.70721983909615
{national_museum_of_science} 185.57228237390515 141.91634654998788
{museum_of_design} 43.507377207279198 49.212926626205522
{institute_of_textiles_and_fashion} 27.525075376033779 18.311786651611364
{archeological_museum} 15.982301831245421 9.15589332580568
{national_art_gallery} 6.2153396010398847 13.733839988708519
{northern_gallery} 9.7669622302055341 11.4448666572571
{other} 14.206490516662596 24.034219980239921
{not_answered} 0 0

This example shows the grid table from Example 12 weighted using the genbalance weighting variable.

SELECT groupby.col[0] As Rating,
SUM((rating[{Dinosaurs}].Column >= groupby.col[0] AND rating[{Dinosaurs}].Column IS NOT NULL) * genbalance) As Dinosaurs,
SUM((rating[{Conservation}].Column >= groupby.col[0] AND rating[{Conservation}].Column IS NOT NULL) * genbalance) As Conservation,
SUM((rating[{Insects}].Column >= groupby.col[0] AND rating[{Insects}].Column IS NOT NULL) * genbalance) As Insects,
SUM((rating[{Whales}].Column >= groupby.col[0] AND rating[{Conservation}].Column IS NOT NULL) * genbalance) As Whales,
SUM((rating[{Botany}].Column >= groupby.col[0] AND rating[{Botany}].Column IS NOT NULL) * genbalance) As Botany
FROM vdata
GROUP BY rating[{Dinosaurs}].Column.DefinedCategories()
ON rating[{Dinosaurs}].Column.DefinedCategories() with (BaseSummaryRow)

Rating Dinosaurs Conservation Insects Whales Botany
------------------------------- ------------------ ------------------ ------------------ ------------------ ------------------
297.69483596086508 297.69483596086508 297.69483596086508 297.69483596086508 297.69483596086508
{not_at_all_interested_1} 10.161961615085611 3.4334599971771302 13.852002620697039 2.0323923230171221 11.30644828081132
{not_particularly_interested_2} 1.14448666572571 1.775811314582824 1.775811314582824 0 2.920297980308534
{no_opinion_3} 26.795843839645411 6.471920609474183 18.5481119155884 14.483327269554154 13.082259595394145
{slightly_interested_4} 111.36632263660434 30.229303836822545 27.309005856514009 27.427168488502527 14.226746261119855
{very_interested_5} 107.06521272659306 13.200422227382665 18.15311253070832 39.128616154193921 13.970165252685558

This example is designed for data collected using UNICOM Intelligence Interviewer, which sets the DataCollection.Status DataCollection.FinishTime system variables. These system variables are not present in a Quanvert database, and so the following query does not run on the Museum sample Quanvert database. However, you could run it on the The Short Drinks sample.

SELECT groupby.col[0] AS FinishTime,
SUM(DataCollection.Status >= {completed}) AS Completed,
SUM(DataCollection.Status >= {timedout}) AS Timedout
FROM VDATA
WHERE DataCollection.FinishTime > 0
GROUP BY Left(CText(DataCollection.FinishTime),Find(CText(DataCollection.FinishTime),' ') )
ORDER BY DataCollection.FinishTime