Data Model > Frequently asked questions > I want to use the Data Model to access data, but I am used to SPSS-style data. Can you give me an overview?
 
I want to use the Data Model to access data, but I am used to SPSS-style data. Can you give me an overview?
For example, when I look at SPSS Statistics .sav file data using the Data Model and a standard tool, the data is not presented as I would expect: numeric categorical variables are shown as strings and when several categorical variables are part of a multiple response set they are "collapsed" into one variable.
Most tools don't understand the concept of multiple response variables, and to a certain extent this is true for SPSS Statistics. (This is why SPSS Statistics has multiple response sets instead of multiple response variables.) One of the main features of the Data Model is that it enables existing tools (like SPSS Statistics and Excel) to read Market research data, which typically includes categorical questions, some of which are multiple response. The Data Model has a "native" variable type of Categorical. This is a fairly unique concept, and during the development of the Data Model, a decision had to be made about how to return data for this variable type when it is queried raw using a standard tool. The decision was made to return raw categorical queries as a string for various reasons, the main being that it's not possible to return a multiple response variable as a single numeric value.
For example, in the query "SELECT Newspapers FROM VDATA" (where Newspapers is a multiple response categorical variable), you might get "{Tribune, NYT, USAToday}" returned. Of course this isn't that useful to SPSS Statistics (and many other tools as well). However, this is where the extended SQL expressions that are part of the Data Model and the extensible UNICOM Intelligence Function Library come into play. UNICOM Intelligence has developed a unique SQL engine and enhanced the SQL operators so that when applied to categorical variables they behave as perhaps those who originally defined SQL might have wanted them to behave if SQL had included the concept of categorical variables. In addition, the functions in the UNICOM Intelligence Function Library, which you can call within SQL queries, enable you to extract important information from categorical variables and other data types.
For detailed information on the above, see Expression evaluation, UNICOM Intelligence Function Library and Adding function libraries.
It is easiest to understand these features by running some example queries and looking at the results. You can use DM Query (which comes with the UNICOM Intelligence Developer Documentation Library) to "follow along". These example queries use the Data Collection Data (.ddf) File version of the Museum sample data, which also comes with the DDL. This file format was created as an easy-to-use data storage format. You could try similar queries (changing variable and category names as appropriate) using a .sav file or a Quanvert database or any other data format for which you have a DSC. To set up DM Query to run the queries using the Museum sample data, see How to run the example queries in DM Query using the museum sample.
Now that you're running DM Query and have opened the Museum sample data, you can enter some SQL queries on the data. But first we need to understand the variable names and categories. Later in this topic there is an example showing how you can easily retrieve the variable names and category information from the Metadata Model (MDM) object model, but for now we'll just use the museums variable. Here are the details of the museums variable:
 
Name
Data type/value
Label (analysis context)
Variable
museums
Categorical (multiple response)
Museums and galleries visited or plans to visit
Category 1
National_Museum_of_Science
25
National Museum of Science
Category 2
Museum_of_Design
26
Museum of Design
Category 3
Institute_of_Textiles_and_Fashion
27
Institute of Textiles and Fashion
Category 4
Archeological_Museum
28
Archeological Museum
Category 5
National_Art_Gallery
29
National Art Gallery
Category 6
Northern_Gallery
30
Northern Gallery
Category 7
Other
18
Other
Category 8
Not_answered
11
Not answered
In addition, we'll use the serial numeric variable so that we can reference individual cases for discussion purposes.
To execute an SQL query, enter it into the text box in DM Query and press Enter.
Let's start with a simple query:
SELECT serial, museums
FROM VDATA
Note Although you enter the queries on one line in DM Query, they are presented here on more than one line to make them easier to read.
The query returns two columns. Here are the results for the first five cases:
serial museums
1 {National_Museum_of_Science}
2 {National_Museum_of_Science}
3 {Museum_of_Design,National_Art_Gallery}
4 {National_Museum_of_Science}
5 {National_Museum_of_Science,Archeological_Museum,National_Art_Gallery}
serial=3 is the first interesting case, because it shows that this really is a multiple response categorical variable: the respondent answered that they had either visited or planned to visit both the Museum of Design and the National Art Gallery.
But at this point you are probably saying to yourself, "having this returned as the text string {Museum_of_Design,National_Art_Gallery} isn't that useful!" OK, OK, just hold on. Let's try another query:
SELECT serial, museums,
museums>={national_museum_of_science} AS MuseumOfScience
FROM VDATA
The query returns three columns. Here are the results for the first five cases:
serial museums MuseumOfScience
1 {National_Museum_of_Science} True
2 {National_Museum_of_Science} True
3 {Museum_of_Design,National_Art_Gallery} False
4 {National_Museum_of_Science} True
5 {National_Museum_of_Science,Archeological_Museum,National_A... True
Now we're getting somewhere. You can see that MuseumOfScience is set to "True" any time the respondent selected the National Museum of Science category, regardless of whether he or she also selected any other category.
And just to show you that you can return the same information for all of the categories in one query, try:
SELECT serial, museums,
museums>={national_museum_of_science} AS MuseumOfScience,
museums>={museum_of_design} AS MuseumOfDesign
FROM VDATA
The query returns four columns. Here are the results for the first five cases:
serial museums MuseumOfScience MuseumOfDesign
1 {National_Museum_of_Science} True False
2 {National_Museum_of_Science} True False
3 {Museum_of_Design,National_Art_Gallery} False True
4 {National_Museum_of_Science} True False
5 {National_Museum_of_Science,Archeological_Museum,National_A... True False
Now you can see that the Data Model can return categorical variables as more than just strings. This query demonstrates how the Data Model uses standard comparison operators (in this case ">=") to act in a special way on variables whose data type is Categorical. For information about the operators (logical, comparison, and so on) and details of how they operate on categorical variables in Expression evaluation. For example, here is an excerpt from Comparison operators on how ">=" works with categorical variables:
"The >= operator returns True if the category list on the left side is a proper superset of, or is equal to, the category list on the right side."
However, you might not want the query to return "True" or "False". You might want it to return "1" or "0". Let's just jump right into another query:
SELECT serial, museums,
clong(museums>={national_museum_of_science}) AS MuseumOfScience
FROM VDATA
The query returns three columns. Here are the results for the first five cases:
serial museums MuseumOfScience
1 {National_Museum_of_Science} 1
2 {National_Museum_of_Science} 1
3 {Museum_of_Design,National_Art_Gallery} 0
4 {National_Museum_of_Science} 1
5 {National_Museum_of_Science,Archeological_Museum,National_A... 1
This query demonstrates the power of the extensible UNICOM Intelligence Function Library that comes with the Data Model. By extensible we mean that it's easy, really easy, for you to add your own functions, which can be called directly from within these SQL queries.
In this example we're using a standard function supplied with the Data Model, the CLong type conversion function. See UNICOM Intelligence Function Library for details of all of the functions that come with the Data Model. The CLong function simply converts each Boolean True into a "1" and each Boolean False into a "0".
This MuseumOfScience variable is what SPSS Statistics would require if you wanted to explode the multiple response categorical variable museums into several SPSS Statistics dichotomous variables. You can explode all of the categorical variables in a single query.
SELECT serial, museums,
CLong(museums>={national_museum_of_science}) AS MuseumOfScience,
CLong(museums>={museum_of_design}) AS MuseumOfDesign
FROM VDATA
The query returns four columns. Here are the results for the first five cases:
serial museums MuseumOfScience MuseumOfDesign
1 {National_Museum_of_Science} 1 0
2 {National_Museum_of_Science} 1 0
3 {Museum_of_Design,National_Art_Gallery} 0 1
4 {National_Museum_of_Science} 1 0
5 {National_Museum_of_Science,Archeological_Museum,National_A... 1 0
OK, but what if you want to represent the data as a multiple category set and not a multiple dichotomy set? Again, the UNICOM Intelligence Function Library comes to the rescue:
SELECT serial, museums,
museums.GetAnswer(0) AS Answered1st,
museums.GetAnswer(1) AS Answered2nd,
museums.GetAnswer(2) AS Answered3rd
FROM VDATA
The query returns five columns. Here are the results for the first five cases:
serial museums Answered1st Answered2nd Answered3rd
1 {National_Museum_of_Science} {National_Museum_of_Science} {} {}
2 {National_Museum_of_Science} {National_Museum_of_Science} {} {}
3 {Museum_of_Design,National_Art_Gallery} {Museum_of_Design} {National_Art_Gallery} {}
4 {National_Museum_of_Science} {National_Museum_of_Science} {} {}
5 {National_Museum_of_Science,Archeological_Museum,National_A... {National_Museum_of_Science} {Archeological_Museum} {National_Art_Gallery}
This query uses the GetAnswer function. You can use the function as a traditional function, such as GetAnswer(museums, 0), or you can use the more object-oriented dotted notation, such as museums.GetAnswer(0) (you may also want to look at the Mid function for other query ideas). GetAnswer(0) returns the first category in a category list. This really is order-dependent: not the order in which the categories were asked, but the order in which the categories were answered. This information can be important when analyzing, for example, unprompted awareness questions.
There are many more interesting queries. You can do everything from finding out how many answers a given respondent gave (using the Len function) all the way to performing crosstabulations, including crosstabulations on multiple response categorical variables, all using SQL queries. For further examples, see Basic SQL queries and Advanced SQL queries.
OK, that was the nurture part. The nature part has to do with the MDM, the metadata definition. However, this is so easy it's trivial. For example, to display all of the variable names, instantiate an MDM object and use code similar to the following (written in Visual Basic):
For Each VariableInstance in MDMDocument.VariableInstances
Debug.Print VariableInstance.FullName
Next
To print each category for a given variable:
For Each Element In Variable.Categories
Debug.Print Element.Name
Next
To find out a variable's data type, look at the VariableInstance.DataType property, and so on.
To find out more about the metadata, try MDM Explorer, which is a tool that comes with the UNICOM Intelligence Developer Documentation Library (the Visual Basic source code is also included). If you use MDM Explorer to open the same metadata file (museum.mdd) that we used for the example queries above, you can examine the museums variable from a metadata perspective. By default, museum.mdd is in the [INSTALL_FOLDER]\IBM\SPSS\DataCollection\7\DDL\Data\Data Collection File\ folder.
MDM Explorer also lets you open a .sav file, Quanvert database, or any data format for which a read-enabled Metadata Data Source Connector (MDSC) exists. What's nice about the Data Model is that your application doesn't have to change because the Data Model represents the metadata information in all of these file storage formats in exactly the same way, using the same object model.
You use the Open mDSC command on the File menu to open an .sav file or Quanvert database in MDM Explorer.
See also
UNICOM Intelligence Data Model
Market research data
Working with the Metadata Model
Working with the Metadata Model: Tutorial
Working with the Case Data Model
Architecture of the UNICOM Intelligence Data Model
Available DSCs
Frequently asked questions