identifyMissing: A checkFunction for identifying miscoded missing values.


View source: R/identifyMissing.R

Description

A checkFunction to be called from check that identifies values that appear to be miscoded missing values.

Usage

identifyMissing(v, nMax = 10, ...)

Arguments

v

A variable to check.

nMax

The maximum number of problematic values to report. Default is 10. Set to Inf to include all problematic values in the output message, or to 0 for no output.

...

Not in use.

Details

identifyMissing tries to identify common choices of missing values outside of the R standard (NA). These include special words (NaN and Inf, in any letter case), one or more -9/9's (e.g. 999, "99", -9, "-99"), one or more -8/8's (e.g. -8, 888, -8888), Stata style missing values (commencing with ".") and other character strings ("", " ", "-", and "NA" miscoded as character).

If the variable is numeric/integer, or a character/factor variable consisting only of numbers, and it has more than 11 different values, the numeric miscoded missing values (999, 888, -99, -8 etc.) are only recognized as miscoded missing if they are the maximum or minimum, respectively, and the distance between this maximum/minimum and the second largest/smallest value is greater than one.
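The maximum/minimum rule for numeric variables can be illustrated with a small base R sketch. This is not the package's implementation: flagExtremeCodes is a hypothetical helper written for this illustration. It assumes that positive codes must be the maximum and negative codes the minimum (one reading of "respectively" above), and it covers only numeric input with more than 11 distinct values.

```r
# Hypothetical helper, NOT part of the package: a simplified version of the
# extreme-value rule described in Details, for numeric input only.
flagExtremeCodes <- function(v) {
  vals <- sort(unique(v[!is.na(v)]))
  n <- length(vals)
  # The rule only applies when there are more than 11 distinct values
  if (n <= 11) return(numeric(0))

  # Candidate codes: numbers written solely with 9s or solely with 8s,
  # with an optional leading minus sign (99, 888, -9, -8888, ...)
  isCode <- function(x) {
    grepl("^-?(9+|8+)$", format(x, scientific = FALSE, trim = TRUE))
  }

  flagged <- numeric(0)
  hi <- vals[n]
  lo <- vals[1]
  # Assumption: a positive code must be the maximum, with a gap greater
  # than one to the second largest value
  if (hi > 0 && isCode(hi) && hi - vals[n - 1] > 1) flagged <- c(flagged, hi)
  # Assumption: a negative code must be the minimum, with a gap greater
  # than one to the second smallest value
  if (lo < 0 && isCode(lo) && vals[2] - lo > 1) flagged <- c(flagged, lo)
  flagged
}

v1 <- c(1:15, 99)
v2 <- c(v1, 98)
v3 <- c(-999, v2, 9999)
flagExtremeCodes(v1)  # 99: the maximum, far from the second largest value (15)
flagExtremeCodes(v2)  # numeric(0): 99 is only one above 98
flagExtremeCodes(v3)  # 9999 and -999: both extremes are isolated codes
```

The vectors are the same as in the Examples section. The comments describe what this sketch returns, not what identifyMissing itself reports.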

Value

A checkResult with three entries: $problem (a logical indicating whether miscoded missing values were found), $message (a message describing which values in v were suspected to be miscoded missing values), and $problemValues (the problematic values in their original format). Note that only unique problematic values are listed and that they are presented in alphabetical order.
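As a minimal sketch of how the three entries might be inspected, the snippet below calls identifyMissing on one of the vectors from the Examples section. It assumes the function is available from an installed package named cleanR (the name is taken from this page's footer) and skips the call if it is not.

```r
# Assumption: identifyMissing is provided by an installed package named cleanR.
# If the package is not installed, res stays NULL and nothing is printed.
res <- NULL
if (requireNamespace("cleanR", quietly = TRUE)) {
  res <- cleanR::identifyMissing(c(1:15, 99))
  print(res$problem)        # logical: were miscoded missing values found?
  print(res$message)        # message describing the suspected values
  print(res$problemValues)  # the unique problematic values
}
```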

See Also

check, allCheckFunctions, checkFunction, checkResult

Examples

##data(testData)
##testData$miscodedMissingVar
##identifyMissing(testData$miscodedMissingVar)

#Identify miscoded numeric missing values
v1 <- c(1:15, 99)
v2 <- c(v1, 98)
v3 <- c(-999, v2, 9999)
identifyMissing(v1)
identifyMissing(v2)
identifyMissing(v3)
identifyMissing(factor(v3))

ekstroem/cleanR documentation built on Jan. 31, 2022, 8:58 a.m.