Forced editing (forced cleaning)
This section describes how to combine the statements that you already know in order to clean your data. It does not introduce any new keywords.
A record that generates too many error messages, or that is clearly incorrect, can be removed, as noted. For example, to remove the record whose serial number is 2004, write:
if (c(101,104)=$2004$) reject; return
This rejects the record from the rest of the edit and from the tabulation section.
Place this statement at the beginning of the edit to avoid unnecessary editing of a useless record.
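The effect of rejecting early can be modeled outside Quantum. The following Python sketch is an illustration only, not Quantum code: it assumes 150-column records held as strings with the serial number in columns 101 to 104, and the helper names (field, make_record, run_edit) are hypothetical.

```python
def field(record, start, end):
    """Return columns start..end (1-based, inclusive), like c(start,end)."""
    return record[start - 1:end]


def make_record(serial):
    """Build a hypothetical 150-column record with the serial in columns 101-104."""
    return " " * 100 + serial + " " * 46


def run_edit(record, edited):
    """Model of an edit that starts with the reject test.

    Returns False when the record is rejected, True otherwise.
    """
    if field(record, 101, 104) == "2004":
        return False  # models "reject; return": skip the rest of the edit
    # Stand-in for the remaining edit statements.
    edited.append(field(record, 101, 104))
    return True


records = [make_record(s) for s in ("2003", "2004", "2005")]
edited = []
# Only records that were not rejected go on to tabulation.
tabulated = [r for r in records if run_edit(r, edited)]
```

In this model, record 2004 never reaches the later statements and is absent from the tabulated list, which is why the reject test is worth placing first.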
Columns within a record can be removed by blanking them out or setting them to a common reject code, often a minus or ampersand.
For example:
if(c125n'12') c125'&'; c(126,145)=$ $
All records in which c125 contains neither a 1 nor a 2 will have the contents of that column replaced with an ampersand, and whatever is in c(126,145) blanked out. As a real-life example, suppose a 1 in c125 means that the respondent visited the market, and a 2 in that column means they did not. Information about purchases made at the market is stored in c(126,145). If column 125 contains neither a 1 nor a 2, you cannot clearly establish whether or not the respondent visited the market, so you set c125 to a special code and blank out any information about purchases.
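To see what this statement does to a record, here is a small Python model. It is a sketch under assumptions, not Quantum code: each column is treated as a single character (no multicoded columns), records are 150-character strings, and set_cols and force_clean are hypothetical helper names.

```python
def set_cols(record, start, end, value):
    """Overwrite columns start..end (1-based, inclusive) with value."""
    if len(value) != end - start + 1:
        raise ValueError("value must fill the field exactly")
    return record[:start - 1] + value + record[end:]


def force_clean(record):
    """Model of: if(c125n'12') c125'&'; c(126,145)=$ $"""
    if record[124] not in ("1", "2"):                  # column 125 holds neither 1 nor 2
        record = set_cols(record, 125, 125, "&")       # common reject code
        record = set_cols(record, 126, 145, " " * 20)  # blank the purchase data
    return record


# Column 125 is '3' (invalid) in one record and '1' (valid) in the other.
invalid = "x" * 124 + "3" + "A" * 20 + "x" * 5
valid = "x" * 124 + "1" + "A" * 20 + "x" * 5

cleaned = force_clean(invalid)
untouched = force_clean(valid)
```

The invalid record comes back with an ampersand in column 125 and blanks in columns 126 to 145, while the valid record is returned unchanged.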
Inserting correct data is generally more difficult than removing invalid data, because you often do not know what the correct data is. However, if you do know, you can correct the data record by record, or make the same correction for any record which is incorrect. For example:
if(c(101,104)=$2222$) c112'2'; c(113,114)=$ $
corrects the record whose serial number is 2222 by setting a 2 into c112 and blanking out c(113,114).
If you do not know what the correct data is, you can replace the incorrect code or codes with a valid code chosen at random. For example:
if (c(101,104)=$3625$) c145=rpunch('1/5')
replaces whatever was in column 145 with one of the codes 1 through 5 for the record whose serial number is 3625.
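The random replacement can be modeled in the same way. This Python sketch is an illustration only: rpunch_model is a hypothetical stand-in for rpunch that draws uniformly from the listed codes, which is an assumption about how the codes are chosen, and the record layout (150 columns, serial number in columns 101 to 104) is also assumed.

```python
import random


def rpunch_model(codes, rng):
    """Hypothetical stand-in for rpunch: pick one of the given codes.

    Uniform selection is an assumption made for this sketch.
    """
    return rng.choice(codes)


def fix_record(record, rng):
    """Model of: if (c(101,104)=$3625$) c145=rpunch('1/5')"""
    if record[100:104] == "3625":
        # Replace column 145 (index 144) with a random code from 1 to 5.
        record = record[:144] + rpunch_model("12345", rng) + record[145:]
    return record


rng = random.Random(0)  # seeded so that repeated runs behave the same way

# Column 145 holds an invalid '9' in both records; only serial 3625 is corrected.
bad = " " * 100 + "3625" + " " * 40 + "9" + " " * 5
other = " " * 100 + "3626" + " " * 40 + "9" + " " * 5

fixed = fix_record(bad, rng)
unchanged = fix_record(other, rng)
```

Only column 145 of record 3625 changes. Every other column, and every other record, is left as it was.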
Note: When correcting data on a record-by-record basis, it is more convenient to use the methods outlined below.
See also
Data correction