Professional > Data management scripting > Data cleaning > Data cleaning examples > Example 6: Advanced cleaning example
 
Example 6: Advanced cleaning example
This example provides an example of using advanced mrScriptBasic features to create a general cleaning script for all of the questions in a survey. It builds on the first example in Example 3: More on cleaning single response data, but handles questions of all types.
The most important code in this example is in the CleanQuestion Sub procedure, which determines the question type and then takes the appropriate action. This subroutine starts with the following line:
If (Q.QuestionType <> QuestionTypes.qtSimple) Then
This tests whether the question is a complex question like a block or a grid, and if it is, performs another For Each loop on the Questions object that corresponds to the questions nested within the block or grid. It is important that you always include a test of this type when you are using a For Each construction with the Questions collection or you risk getting an error.
All of the subroutines take the dmgrJob object as the first argument. This is useful because the dmgrJob object gives you access to many other objects that you need, such as the GlobalVariables collection. Also, this example provides an example of calling the Err.Raise method.
Event(OnJobStart)
Dim fso, out
Set fso = CreateObject("Scripting.FileSystemObject")
Set out = fso.CreateTextFile("C:\Program Files\IBM\SPSS\DataCollection\7\DDL\Output\AdvancedCleaning.txt", True, True)
dmgrGlobal.Add("Rep")
Set dmgrGlobal.Rep = out
dmgrGlobal.Rep.WriteLine("Advanced cleaning script started successfully")
End Event

' =======================================================================

Event(OnNextCase)

CleanAllQuestions(dmgrJob, Respondent.Serial)

' -------------------------------------------------------------------

Sub CleanAllQuestions(dmgrJob, RSerial)
Dim Question
For Each Question in dmgrJob.Questions
Select Case Question.QuestionFullName
Case "Respondent", "DataCollection", "DataCleaning"
' Ignore system variables
Case Else
CleanQuestion(dmgrJob, Question, RSerial)
End Select
Next
End Sub

' --------------------------------------------------------------------

Sub CleanQuestion(dmgrJob, Q, RSerial)
If (Q.QuestionType <> QuestionTypes.qtSimple) Then
' The Question is a complex question (collection)
Dim Question
For Each Question in Q
CleanQuestion(dmgrJob, Question, RSerial)
Next
Else
' For a simple question, figure out the type and call the relevant procedure.
Select Case Q.Response.DataType
Case mr.None
' No response
CheckNone(dmgrJob, Q, RSerial)
Case mr.Double, mr.Long
' Numeric response
CheckNumeric(dmgrJob, Q, RSerial)
Case mr.Text
' Text response
CheckText(dmgrJob, Q, RSerial)
Case mr.Categorical
' Categorical response
CheckCategorical(dmgrJob, Q, RSerial)
Case mr.Date
' Date response
CheckDate(dmgrJob, Q, RSerial)
Case mr.Boolean
' Boolean response
Case Else
Err.Raise(1, , "Unexpected response type encountered (type=" _
+ CText(Q.Response.DataType) + ")")
End Select
End If
End Sub

' --------------------------------------------------------------------

' Check 'unanswered' responses

Sub CheckNone(dmgrJob, Q, RSerial)
'! Insert checks for unanswered variables here ...

For example, you could add the following lines:

If Q.Validation.MinValue > 0 Or Q.Validation.MaxValue < 0 Then
MarkInError(dmgrJob, Q, "should not be blank", RSerial)
End If

However, these lines do not take into account that in some surveys
some respondents are routed around some of the questions. This is
true in the Museum survey, and so these lines are not used in this
example. !'
End Sub

' -------------------------------------------------------------------

' Check numeric responses

Sub CheckNumeric(dmgrJob, Q, RSerial)
' Insert checks for numeric variables here ...
If Q.Response.Value IS NOT NULL Then
If Q.Response.Value < Q.Validation.MinValue Or _
Q.Response.Value > Q.Validation.MaxValue Then
MarkInError(dmgrJob, Q, "is <" + CText(Q.Validation.MinValue) + _
" or >" + CText(Q.Validation.MaxValue), RSerial)
End If
Else
CheckNone(dmgrJob, Q, RSerial)
End If
End Sub

' --------------------------------------------------------------------

' Check text responses

Sub CheckText(dmgrJob, Q, RSerial)
' Insert checks for text variables here ...
End Sub

' --------------------------------------------------------------------------

' Check categorical responses

Sub CheckCategorical(dmgrJob, Q, RSerial)
Dim C
' Insert checks for categorical variables here ...

' Multiple responses given
If Q.AnswerCount() > 1 Then
For Each C in Q.Categories
If BitAnd(C.Attributes, CategoryAttributes.caExclusive) And Q >= C Then
MarkInError(dmgrJob, Q, "- response """ + C.FullName + """ is not exclusive", RSerial)
End If
Next
End If
If Q.AnswerCount() > Q.Validation.MaxValue Then
MarkInError(dmgrJob, Q, "has >" + CText(Q.Validation.MaxValue) + " responses", RSerial)
End If
End Sub

' --------------------------------------------------------------------

Sub CheckDate(dmgrJob, Q, RSerial)
' Insert checks for date variables here ...
End Sub

' --------------------------------------------------------------------

' Mark the current case in error

Sub MarkInError(dmgrJob, Q, msg, RSerial)
Dim txt
txt = CText(RSerial) + ": " _
+ msg + " (" + Q.QuestionFullName + ")"
dmgrJob.GlobalVariables.Rep.WriteLine(txt)
End Sub

End Event

' =========================================================================

InputDataSource(Input)
ConnectionString = "Provider=mrOleDB.Provider.2; _
Data Source=mrDataFileDsc; _
Location=C:\Program Files\IBM\SPSS\DataCollection\7\DDL\Data\Data Collection File\museum.ddf; _
Initial Catalog=C:\Program Files\IBM\SPSS\DataCollection\7\DDL\Data\Data Collection File\museum.mdd"
End InputDataSource

' =========================================================================

OutputDataSource(Output)
ConnectionString = "Provider=mrOleDB.Provider.2; _
Data Source=mrDataFileDsc; _
Location=C:\Program Files\IBM\SPSS\DataCollection\7\DDL\Output\AdvancedCleaning.ddf"
MetaDataOutputName = "C:\Program Files\IBM\SPSS\DataCollection\7\DDL\Output\AdvancedCleaning.mdd"
End OutputDataSource

' ========================================================================
Note This example is provided as a sample DMS file (called AdvancedCleaning.dms). For more information, see Sample DMS files.
See also
Data cleaning examples