identifyLoners: A checkFunction for identifying sparsely represented values...


View source: R/identifyLoners.R


A checkFunction, intended to be called from check, that identifies values occurring fewer than 6 times in factor, (haven_)labelled, or character variables (that is, loners).


identifyLoners(v, nMax = 10)



v: A character, (haven_)labelled, or factor variable to check.


nMax: The maximum number of problematic values to report. Default is 10. Set to Inf to include all problematic values in the output message, or to 0 for no output.


For character, (haven_)labelled, and factor variables, identify values that have only a very low number of observations, as these categories might be problematic when conducting an analysis. Unused factor levels are not considered "loners". "Loners" are defined as values with 5 or fewer observations, reflecting the commonly used rule of thumb for performing chi-squared tests.
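The rule described above can be sketched in base R. This is only an illustration of the counting logic, not the package's implementation: the helper name findLonersSketch and its threshold argument are hypothetical, and handling of (haven_)labelled variables is left out.

```r
# Hypothetical helper (not part of dataMaid): flag values that occur
# between 1 and `threshold` times. Unused factor levels have a count of 0
# and are therefore not reported.
findLonersSketch <- function(v, threshold = 5) {
  counts <- table(v)            # table() ignores NA by default
  n <- as.vector(counts)        # plain integer counts
  loners <- names(counts)[n > 0 & n <= threshold]
  sort(loners)                  # alphabetical order
}

x <- c(rep(c("a", "b", "c"), 10), "d", "d")
findLonersSketch(x)
# "d" occurs twice, so it is the only value flagged

# "e" is an unused level (count 0) and is not treated as a loner
f <- factor(x, levels = c("a", "b", "c", "d", "e"))
findLonersSketch(f)
```

Both calls flag only "d", since "a", "b" and "c" each occur 10 times.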


A checkResult with three entries: $problem (a logical indicating whether loners were found), $message (a message describing which values in v were loners) and $problemValues (the problematic values in their original format). Note that only unique problematic values are listed and they are presented in alphabetical order.
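As a small sketch of how the returned entries might be inspected, assuming the dataMaid package is installed (the code is skipped otherwise, and the variable names x and res are only illustrative):

```r
x <- c(rep(c("a", "b", "c"), 10), "d", "d")
res <- NULL

# Only run the check if dataMaid is available
if (requireNamespace("dataMaid", quietly = TRUE)) {
  res <- dataMaid::identifyLoners(x, nMax = Inf)  # report all loners
  print(res$problem)        # logical: were any loners found?
  print(res$message)        # text describing the loners
  print(res$problemValues)  # the loner values themselves
}
```

Here "d" occurs only twice, so it would be expected to appear among the problem values.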

See Also

check, allCheckFunctions, checkFunction, checkResult


identifyLoners(c(rep(c("a", "b", "c"), 10), "d", "d"))

dataMaid documentation built on Oct. 8, 2021, 9:08 a.m.