Desktop User Guides > Professional > Data management scripting > Data Management Script (DMS) file > Filtering data in a DMS file
 
Filtering data in a DMS file
Filters enable you to perform a data transformation on a subset of the data in a data source. For example, when you transfer data you might often want to transfer some, but not all, of the data. This topic describes some typical scenarios.
Metadata filter
When you export data to a particular format for analysis (such as IBM SPSS Statistics .sav or Quantum .dat), you might want to exclude from the export all of the variables that will not be useful to the analysis that you are planning. You might also want to aggregate existing variables to a new variable (helpful when performing analysis). You can achieve this by defining a metadata filter that defines the variables to include, or aggregate, in the export.
In the DMS file, you define a metadata filter by specifying in the select query defined for the input data source the variables that you want to include or aggregate.
Since version 5.6, the Data Management Object Model (DMOM) checks for duplicate select variables in query strings. As a result, the following example results in the error “Duplicate select variable: Age” when transformation starts:
SelectQuery = "SELECT Person.(Age, Age, Name) FROM HDATA"
1. Filtering metadata via a VDATA query
The following InputDataSource section defines a metadata filter that specifies seven variables that will be included in the data transformation:
InputDataSource(myInputDataSource, "My input data source")
  ConnectionString = ...
  SelectQuery = "SELECT age, gender, education, _
      interest, rating, remember, signs FROM VDATA"
End InputDataSource
2. Filtering metadata from one level via an HDATA query
The following InputDataSource section defines a metadata filter that specifies three variables, from the Person level, that will be included in the data transformation:
InputDataSource(myInputDataSource, "My input data source")
  ConnectionString = ...
  SelectQuery = " SELECT Name, Age, Trip FROM HDATA.Person"
End InputDataSource
3. Filtering metadata from different levels via an HDATA query
The following InputDataSource section defines a metadata filter that specifies one variable from the parent level, one variable from current level, and one variable from the children level, that will be included in the data transformation:
InputDataSource(myInputDataSource, "My input data source")
  ConnectionString = ...
  SelectQuery = "SELECT ^.address, Name, _
      Trip.(Country) FROM HDATA.Person”
End InputDataSource
4. Creating new variables using an expression or aggregation
The following InputDataSource section defines a metadata filter that specifies one aggregation and one expression that will create new variables in the data transformation:
InputDataSource(myInputDataSource, "My input data source")
  ConnectionString = ...
  SelectQuery = "SELECT sum(visits+visits12) as 'total visits', _
      name+address as person FROM HDATA"
End InputDataSource
5. Selecting a grid slice via a VDATA or HDATA query
The following InputDataSource section defines a metadata filter that specifies two grid slices in the data transformation:
InputDataSource(myInputDataSource, "My input data source")
  ConnectionString = ...
  SelectQuery = "SELECT Person[3].tvdays[channel_1].column as
      NewGridSlice1, Person[1].Age,
      Person[3].tvdays[channel_1].column FROM HDATA"
End InputDataSource
The first grid slice will be changed to a normal field after transformation (it has been renamed). The second grid slice will remain at its HDATA level structure after transformation (it has not been renamed).
6. Renaming with a metadata filter
Using metadata filtering, you can rename a variable, expression, aggregation, or gird slice.
InputDataSource(myInputDataSource, "My input data source")
  ConnectionString = ...
  SelectQuery = "SELECT ^.Pets, _
      ^.address as HomeAddress, _
      Name as ‘Person Name’, _
      Age+Weight as ‘PersonWeight’, _
      SUM(Age+Gender) FROM HDATA.Person"
End InputDataSource
The resulting output metadata is:
Household_Pets  (Auto renamed)
HomeAddress;
Person_Name     (Auto renamed)
PersonWeight
SUM_Age_Gender  (Auto renamed)
The system will automatically rename if any of the following conditions are met:
The user specified a new name, but it does not meet variable naming conventions (contains blank spaces, symbols, and so on). For example, Name as ‘Person Name’ would be automatically renamed because of the space between Person and Name.
The Select query contains a down-lev’ed variable, but the user did not explicitly rename the variable. For example, ^.Pets would be automatically renamed to <filename>_Pets (Household_Pets) because it was down-lev’ed one layer from the top level.
As another example, ^.^.Age would be automatically renamed to <filename>_<parent level name>_age (household_person_age) because it was down-leveled two layers from the top level and the person level.
The Select query contains an expression or aggregation, but does not explicitly rename the expression\aggregation. For example, SUM(Age+Gender) would be automatically renamed to SUM_Age_Gender_ based on metadata creation naming conventions.
Tips
Use the UNICOM Intelligence Professional Metadata Viewer to view the names of the variables. For more information, see Using the metadata viewer.
When you have a large number of variables to specify or they have long names, you might find it easier to set up the metadata filter in WinDMSRun. For more information, see WinDMSRun window: Variable Selection.
See also
Cleaning data in an ongoing study
Case data filter
Data Management Script (DMS) file
Cleaning data in an ongoing study
When cleaning data in an ongoing study, you might want to set up a case data filter to include only cases that have not been cleaned before. One way of doing this is to use a global SQL variable: see GlobalSQLVariables section. In the cleaning script, you can also specify that some cases are to be excluded; for example, because they contain questionable responses. In the DMS file, define this type of case data filter by using the dmgrJob.DropCurrentCase method in the OnNextCase Event section. Here is an example:
Event(OnNextCase)
If age < 0 Then dmgrJob.DropCurrentCase()
.
.
.
End Event
You can also use an update query in the OutputDataSource section to remove case data records from the output data source. For example, the following snippet shows how you could use update queries in two OutputDataSource sections to split the case data between two output data sources--one of which stores the clean data and the other the dirty data.
OutputDataSource(Clean, "My clean data")
ConnectionString = "Provider=mrOleDB.Provider.2; _
Data Source=mrDataFileDsc; _
Location=C:\Program
Files\IBM\SPSS\DataCollection\7\DDL\Output\CleanData.ddf"
MetaDataOutputName = "C:\Program
Files\IBM\SPSS\DataCollection\7\DDL\Output\CleanData.mdd"
UpdateQuery="DELETE FROM vdata _
WHERE DataCleaning.Status = {NeedsReview}"
End OutputDataSource

OutputDataSource(Dirty, "My dirty data")
ConnectionString = "Provider=mrOleDB.Provider.2; _
Data Source=mrDataFileDsc; _
Location=C:\Program Files\IBM\SPSS\DataCollection\7\DDL\Output\DirtyData.ddf"
MetaDataOutputName = "C:\Program Files\IBM\SPSS\DataCollection\7\DDL\Output\DirtyData.mdd"
UpdateQuery="DELETE FROM vdata _
WHERE NOT DataCleaning.Status.ContainsAny({NeedsReview})"
End OutputDataSource
This example is in the SplitIntoDirtyAndClean.dms sample. See Sample DMS files.
See also
Case data filter
Filtering data in a DMS file
Data Management Script (DMS) file
Case data filter
In a large study, you might want to export the case data for each region separately, so that it can be analyzed separately. You can achieve this by defining a case data filter that selects the cases for which the region variable has a specified value.
In the DMS file, you define a case data filter of this type using a WHERE clause in the select query defined for the input data source. For example, this InputDataSource section defines a case data filter (shown in red) that selects cases for which the South category is selected for the region variable:
InputDataSource(myInputDataSource, "My input data source")
  ConnectionString = ...
  SelectQuery = "SELECT * FROM vdata WHERE region = {South}"
End InputDataSource
For more information on defining case data filters using the WHERE clause, see Basic SQL queries.
Tips
You can use the UNICOM Intelligence Professional Metadata Viewer to view the variable and category names. For more information, see Using the metadata viewer.
If you are working with UNICOM Intelligence Interviewer - Server data and want to filter out test, active, and timed out records, or select records based on the date they were collected, you might find it easier to set up the case data filter in WinDMSRun. For more information, see WinDMSRun window: Case Selection.
When cleaning data in an ongoing study, you might want to set up a case data filter to include only cases that have not been cleaned before. One way of doing this is to use a global SQL variable. For more information, see GlobalSQLVariables section. In your cleaning script, you might also want to specify that some cases are to be excluded; for example, because they contain questionable responses. In the DMS file, you can define this type of case data filter using the dmgrJob.DropCurrentCase method in the OnNextCase Event section. Here is an example:
Event(OnNextCase)
  If age < 0 Then dmgrJob.DropCurrentCase() .
  .
  .
End Event
You can also use an update query in the OutputDataSource section to remove case data records from the output data source. For example, the following snippet shows how you could use update queries in two OutputDataSource sections to split the case data between two output data sources: one of which stores the clean data and the other the dirty data.
OutputDataSource(Clean, "My clean data")
ConnectionString = "Provider=mrOleDB.Provider.2; _
Data Source=mrDataFileDsc; _
Location=C:\Program Files\IBM\SPSS\DataCollection\7\DDL\Output\CleanData.ddf"
MetaDataOutputName = "C:\Program Files\IBM\SPSS\DataCollection\7\DDL\Output\CleanData.mdd"
UpdateQuery="DELETE FROM vdata _
WHERE DataCleaning.Status = {NeedsReview}"
End OutputDataSource

OutputDataSource(Dirty, "My dirty data")
ConnectionString = "Provider=mrOleDB.Provider.2; _
Data Source=mrDataFileDsc; _
Location=C:\Program Files\IBM\SPSS\DataCollection\7\DDL\Output\DirtyData.ddf"
MetaDataOutputName = "C:\Program Files\IBM\SPSS\DataCollection\7\DDL\Output\DirtyData.mdd"
UpdateQuery="DELETE FROM vdata _
WHERE NOT DataCleaning.Status.ContainsAny({NeedsReview})"
End OutputDataSource
This example is in the SplitIntoDirtyAndClean.dms sample. For more information, see Sample DMS files.
See
Data Management Script (DMS) file