Professional > Data management scripting > Data Management Script (DMS) file > Filtering data in a DMS file
 
Filtering data in a DMS file
Filters enable you to perform a data transformation on a subset of the data in a data source. For example, when you transfer data you may often want to transfer some, but not all, of the data. This topic describes some typical scenarios.
Metadata filter
When you export data to a particular format for analysis (such as IBM SPSS Statistics .sav or Quantum .dat), you may want to exclude from the export all of the variables that will not be useful to the analysis that you are planning. You may also want to aggregate existing variables to a new variable (helpful when performing analysis). You can achieve this by defining a metadata filter that defines the variables to include, or aggregate, in the export.
In the DMS file, you define a metadata filter by specifying in the select query defined for the input data source the variables that you want to include or aggregate.
Note Since version 5.6, the Data Management Object Model (DMOM) checks for duplicate select variables in query strings. As a result, the following example results in the error “Duplicate select variable: Age” when transformation starts:
SelectQuery = "SELECT Person.(Age, Age, Name) FROM HDATA"
1. Filtering metadata via a VDATA query
The following InputDataSource section defines a metadata filter that specifies seven variables that will be included in the data transformation:
InputDataSource(myInputDataSource, "My input data source")
  ConnectionString = ...
  SelectQuery = "SELECT age, gender, education, _
      interest, rating, remember, signs FROM VDATA"
End InputDataSource
2. Filtering metadata from one level via an HDATA query
The following InputDataSource section defines a metadata filter that specifies three variables, from the Person level, that will be included in the data transformation:
InputDataSource(myInputDataSource, "My input data source")
  ConnectionString = ...
  SelectQuery = " SELECT Name, Age, Trip FROM HDATA.Person"
End InputDataSource
3. Filtering metadata from different levels via an HDATA query
The following InputDataSource section defines a metadata filter that specifies one variable from the parent level, one variable from current level, and one variable from the children level, that will be included in the data transformation:
InputDataSource(myInputDataSource, "My input data source")
  ConnectionString = ...
  SelectQuery = "SELECT ^.address, Name, _
      Trip.(Country) FROM HDATA.Person”
End InputDataSource
4. Creating new variables using an expression or aggregation
The following InputDataSource section defines a metadata filter that specifies one aggregation and one expression that will create new variables in the data transformation:
InputDataSource(myInputDataSource, "My input data source")
  ConnectionString = ...
  SelectQuery = "SELECT sum(visits+visits12) as 'total visits', _
      name+address as person FROM HDATA"
End InputDataSource
5. Selecting a grid slice via a VDATA or HDATA query
The following InputDataSource section defines a metadata filter that specifies two grid slices in the data transformation:
InputDataSource(myInputDataSource, "My input data source")
  ConnectionString = ...
  SelectQuery = "SELECT Person[3].tvdays[channel_1].column as
      NewGridSlice1, Person[1].Age,
      Person[3].tvdays[channel_1].column FROM HDATA"
End InputDataSource
The first grid slice will be changed to a normal field after transformation (it has been renamed). The second grid slice will remain at its HDATA level structure after transformation (it has not been renamed).
6. Renaming with a metadata filter
Using metadata filtering, you can rename a variable, expression, aggregation, or gird slice.
InputDataSource(myInputDataSource, "My input data source")
  ConnectionString = ...
  SelectQuery = "SELECT ^.Pets, _
      ^.address as HomeAddress, _
      Name as ‘Person Name’, _
      Age+Weight as ‘PersonWeight’, _
      SUM(Age+Gender) FROM HDATA.Person"
End InputDataSource
The resulting output metadata is:
Household_Pets  (Auto renamed)
HomeAddress;
Person_Name      (Auto renamed)
PersonWeight
SUM_Age_Gender  (Auto renamed)
The system will automatically rename if any of the following conditions are met:
The user specified a new name, but it does not meet variable naming conventions (contains blank spaces, symbols, and so on). For example, Name as ‘Person Name’ would be automatically renamed because of the space between Person and Name.
The Select query contains a down-lev’ed variable, but the user did not explicitly rename the variable. For example, ^.Pets would be automatically renamed to <filename>_Pets (Household_Pets) because it was down-lev’ed one layer from the top level.
As another example, ^.^.Age would be automatically renamed to <filename>_<parent level name>_age (household_person_age) because it was down-leveled two layers from the top level and the person level.
The Select query contains an expression or aggregation, but does not explicitly rename the expression\aggregation. For example, SUM(Age+Gender) would be automatically renamed to SUM_Age_Gender_ based on metadata creation naming conventions.
Tips
Use the UNICOM Intelligence Professional Metadata Viewer to view the names of the variables. For more information, see Using the metadata viewer.
When you have a large number of variables to specify or they have long names, you may find it easier to set up the metadata filter in WinDMSRun. For more information, see WinDMSRun window: Variable Selection.
See also
Cleaning data in an ongoing study
Case data filter
Data Management Script (DMS) file