Aggregate functions
The Case Data Model provides support for a number of aggregate functions. You can use these functions in SQL queries to summarize information from multiple rows.
Aggregate functions
AVG(expr)
Returns the average of the values defined in the expression. This function can be used with numeric data only.
BASE(expr)
Returns the total number of cases included in the expression. Generally, the base includes every case for which the value is not NULL. However, if the IsInBase attribute has been set to True, the case is included in the base regardless of the value. Conversely, if the IsInBase attribute has been set to False, the case is excluded from the base regardless of the value.
COUNT(expr)
Returns a count of the cases in a column.
DISTINCT(expr)
Returns unique values.
MIN(expr)
Returns the lowest value defined in the expression.
MAX(expr)
Returns the highest value defined in the expression.
STDEV(expr)
Returns the standard deviation of the values defined in the expression. This function can be used with numeric data only. The standard deviation is a measure of dispersion around the mean. In a normal distribution, 68% of cases fall within one standard deviation of the mean and 95% of cases fall within 2 standard deviations. For example, if the mean age is 45 with a standard deviation of 10, then 95% of the cases would be between 25 and 65 in a normal distribution.
SUM(expr)
When used with numeric data, returns the sum of the values. When used with categorical data, it returns the union of the categorical values.
Examples
The first four examples use the Museum example Quanvert database, which is based on a survey of visitors to a museum and the fourth example uses the
The Short Drinks sample database. You can run these queries using the
DM Query sample tool, which is supplied with the UNICOM Intelligence Developer Documentation Library. The examples were run using the connection option to return category names rather than category values. You select this option on the Advanced tab in the Data Link Properties dialog box. For more information, see
Running the example queries in DM Query.
This example uses a subset of the aggregate functions with the visits numeric variable.
SELECT AVG(visits) AS Average,
BASE(visits) AS 'Base',
COUNT(visits) AS 'Count',
SUM(visits) AS 'Sum',
MIN(visits) AS Minimum,
MAX(visits) AS Maximum,
STDEV(visits) AS 'Standard deviation'
FROM vdata
WHERE visits > 0
Here is the result set shown in DM Query:
When no GROUP BY clause is used, the query returns one row. If you use a GROUP BY clause, the results are returned for each group. For example, you can add the GROUP BY clause to the query to group on the Male and Female categories of the gender variable:
SELECT gender AS Gender,
AVG(visits) AS Average,
BASE(visits) AS 'Base',
COUNT(visits) AS 'Count',
SUM(visits) AS 'Sum',
MIN(visits) AS Minimum,
MAX(visits) AS Maximum,
STDEV(visits) AS 'Standard deviation'
FROM vdata
WHERE visits > 0
GROUP BY gender
Here is the result set:
When the function names are used with the AS keyword as column headings, they must be enclosed in quotation marks because they are reserved words.
The next example demonstrates the differences between using SUM with categorical and numeric data.
SELECT SUM(gender) AS Gender,
SUM(gender = {male}) AS Male,
SUM(gender = {female}) AS Female
FROM vdata
GROUP BY gender
Here is the result set:
The first column returns a categorical value because it is used with a categorical variable. One category is returned for each row because gender is a single response variable. The other two columns return the sum of the values returned by the expressions gender = {male} and gender = {female} respectively. These expressions return a numeric value of 1 when True and a numeric value of 0 when False and so the function returns the number of cases for which the expression is True.
The next example demonstrates the DISTINCT function.
SELECT count(distinct visits)
FROM vdata
The next example demonstrates using the AVG function and the
DataCollection.StartTime and
DataCollection.FinishTime System variables to return the average length of time in seconds it took respondents to complete an UNICOM Intelligence Interviewer questionnaire.
SELECT AVG(CDouble(DataCollection.FinishTime - DataCollection.StartTime) * 86400)
FROM vdata
WHERE DataCollection.Status = {Completed}
Here is the result set:
The UNICOM Intelligence Data Model floating point representation of a date is based on days and so a difference of 1.0 would be one day, 0.25 would be six hours, and so on. Multiplying by 24 * 60 * 60 = 86400 converts the time difference from days to seconds.
See also