Filtering data in a DMS file

Filters enable you to perform a data transformation on a subset of the data in a data source. For example, when you transfer data you might often want to transfer some, but not all, of the data. This topic describes some typical scenarios.

When you export data to a particular format for analysis (such as IBM SPSS Statistics .sav or Quantum .dat), you might want to exclude from the export all of the variables that will not be useful to the analysis that you are planning. You might also want to aggregate existing variables to a new variable (helpful when performing analysis). You can achieve this by defining a metadata filter that defines the variables to include, or aggregate, in the export.

In the DMS file, you define a metadata filter by specifying in the select query defined for the input data source the variables that you want to include or aggregate.

Since version 5.6, the Data Management Object Model (DMOM) checks for duplicate select variables in query strings. As a result, the following example results in the error “Duplicate select variable: Age” when transformation starts:

The following InputDataSource section defines a metadata filter that specifies seven variables that will be included in the data transformation:

InputDataSource(myInputDataSource, "My input data source")
  ConnectionString = ...
  SelectQuery = "SELECT age, gender, education, _
      interest, rating, remember, signs FROM VDATA"
End InputDataSource

The following InputDataSource section defines a metadata filter that specifies three variables, from the Person level, that will be included in the data transformation:

InputDataSource(myInputDataSource, "My input data source")
ConnectionString = ...
SelectQuery = " SELECT Name, Age, Trip FROM HDATA.Person"
End InputDataSource

The following InputDataSource section defines a metadata filter that specifies one variable from the parent level, one variable from current level, and one variable from the children level, that will be included in the data transformation:

InputDataSource(myInputDataSource, "My input data source")
  ConnectionString = ...
  SelectQuery = "SELECT ^.address, Name, _
      Trip.(Country) FROM HDATA.Person”
End InputDataSource

The following InputDataSource section defines a metadata filter that specifies one aggregation and one expression that will create new variables in the data transformation:

InputDataSource(myInputDataSource, "My input data source")
  ConnectionString = ...
  SelectQuery = "SELECT sum(visits+visits12) as 'total visits', _
      name+address as person FROM HDATA"
End InputDataSource

The following InputDataSource section defines a metadata filter that specifies two grid slices in the data transformation:

InputDataSource(myInputDataSource, "My input data source")
  ConnectionString = ...
  SelectQuery = "SELECT Person[3].tvdays[channel_1].column as
      NewGridSlice1, Person[1].Age,
      Person[3].tvdays[channel_1].column FROM HDATA"
End InputDataSource

The first grid slice will be changed to a normal field after transformation (it has been renamed). The second grid slice will remain at its HDATA level structure after transformation (it has not been renamed).

Using metadata filtering, you can rename a variable, expression, aggregation, or gird slice.

InputDataSource(myInputDataSource, "My input data source")
  ConnectionString = ...
  SelectQuery = "SELECT ^.Pets, _
      ^.address as HomeAddress, _
      Name as ‘Person Name’, _
      Age+Weight as ‘PersonWeight’, _
      SUM(Age+Gender) FROM HDATA.Person"
End InputDataSource

The resulting output metadata is:

Household_Pets  (Auto renamed)
HomeAddress;
Person_Name     (Auto renamed)
PersonWeight
SUM_Age_Gender  (Auto renamed)

The system will automatically rename if any of the following conditions are met:

▪The user specified a new name, but it does not meet variable naming conventions (contains blank spaces, symbols, and so on). For example, Name as ‘Person Name’ would be automatically renamed because of the space between Person and Name.

▪The Select query contains a down-lev’ed variable, but the user did not explicitly rename the variable. For example, ^.Pets would be automatically renamed to <filename>_Pets (Household_Pets) because it was down-lev’ed one layer from the top level.

As another example, ^.^.Age would be automatically renamed to <filename>_<parent level name>_age (household_person_age) because it was down-leveled two layers from the top level and the person level.

▪The Select query contains an expression or aggregation, but does not explicitly rename the expression\aggregation. For example, SUM(Age+Gender) would be automatically renamed to SUM_Age_Gender_ based on metadata creation naming conventions.

▪Use the UNICOM Intelligence Professional Metadata Viewer to view the names of the variables. For more information, see Using the metadata viewer.

▪When you have a large number of variables to specify or they have long names, you might find it easier to set up the metadata filter in WinDMSRun. For more information, see WinDMSRun window: Variable Selection.