assignMissing: Set missing values

View source: R/variableKey.R

assignMissing    R Documentation

Set missing values

Description

Replaces the values of x that are designated in missings with NA. The missings specification has to be written carefully, because the permitted syntax depends on the type of variable being processed.

Usage

assignMissing(x, missings = NULL, sep = ";")

Arguments

x

A variable

missings

A character vector of semicolon-separated values, ranges, and/or inequalities. For strings and factors, only an enumeration of values (or factor levels) to be excluded is allowed. For numeric variables (integers or floating point variables), one can specify one-sided intervals (inequalities) and double-sided intervals, as well as particular values to be marked as missing. Particular values and ranges can be combined in one string, for example "1;2;3;(8,10);[22,24];> 99;< 2". A double-sided interval is written in the usual mathematical way, where square brackets indicate closed intervals and parentheses indicate open intervals.

  1. "(a,b)" is an open interval: values of x greater than a and smaller than b will be set to NA.

  2. "[a,b]" is a closed interval, one which includes the endpoints, so values with a <= x <= b will be set to NA.

  3. "(a,b]" and "[a,b)" are half-open intervals and are also accepted.

  4. "< a" means all values smaller than a will be set to NA.

  5. "<= a" means values smaller than or equal to a will be set to NA.

  6. "> a" and ">= a" have comparable interpretations.

  7. "8;9;10" marks specific values by enumeration. Be aware, however, that this is useful only for integer variables. As demonstrated in the examples, for floating point numbers one must specify intervals.

  8. For factors and character variables, the argument missings can be written either as "lo;med;hi" or "c('lo','med','hi')".
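
The bracket conventions in items 1 to 3 can be illustrated with a minimal base R sketch. The helper flagInterval below is hypothetical, written only for this illustration, and is not the parser that kutils uses. It handles a single double-sided interval and returns TRUE for the values that would be marked as missing.

```r
## Hypothetical helper (not part of kutils): flag the values of x that
## fall inside one double-sided interval such as "(4,12)" or "[4,12]".
flagInterval <- function(x, spec) {
    spec <- gsub("[[:space:]]", "", spec)            # drop whitespace
    left <- substr(spec, 1, 1)                       # "(" or "["
    right <- substr(spec, nchar(spec), nchar(spec))  # ")" or "]"
    inner <- substr(spec, 2, nchar(spec) - 1)        # text between brackets
    bounds <- as.numeric(strsplit(inner, ",", fixed = TRUE)[[1]])
    ## A square bracket includes the endpoint, a parenthesis excludes it
    lower <- if (left == "[") x >= bounds[1] else x > bounds[1]
    upper <- if (right == "]") x <= bounds[2] else x < bounds[2]
    lower & upper
}

x <- seq.int(-2L, 22L, by = 2L)
x[flagInterval(x, "[4,12]")]   # closed: endpoints 4 and 12 are flagged
x[flagInterval(x, "(4,12)")]   # open: endpoints 4 and 12 are not flagged
```

With the closed interval the flagged values are 4, 6, 8, 10 and 12. With the open interval only 6, 8 and 10 are flagged.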

sep

A separator symbol, ";" (semicolon) by default
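
How assignMissing splits the missings string internally is not described on this page. As an illustration only, the base R sketch below shows one way a specification string can be split on a literal separator. The function name splitSpec is hypothetical. Using fixed = TRUE matters for separators such as "|", which would otherwise be read as a regular expression operator.

```r
## Illustration only (not kutils code): split a missings string on a
## literal separator and trim surrounding whitespace from each piece.
splitSpec <- function(missings, sep = ";") {
    ## fixed = TRUE treats sep as plain text rather than a regex
    trimws(strsplit(missings, sep, fixed = TRUE)[[1]])
}

splitSpec("1;2;3;(8,10);[22,24];> 99;< 2")   # seven separate pieces
splitSpec("low|med", sep = "|")              # "low" "med"
```

The comma inside an interval such as "(8,10)" is not affected, because only the separator is used for splitting.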

Details

Version 0.95 of kutils introduced a new style for specification of missing values.

Value

A cleaned column in which R's NA symbol replaces the values that should be missing.

Author(s)

Paul Johnson <pauljohn@ku.edu>

Examples

## 1.  Integers.
x <- seq.int(-2L, 22L, by = 2L)
## Exclude scores 8, 10, 18
assignMissing(x, "8;10;18")
## Specify range, 4 to 12 inclusive
missings <- "[4,12]"
assignMissing(x, missings)
## Not inclusive
assignMissing(x,  "(4,12)")
## Set missing for any value smaller than 7
assignMissing(x, "< 7")
assignMissing(x, "<= 8")
assignMissing(x, "> 11")
assignMissing(x, "< -1;2;4;(7, 9);> 20")


## 2. strings
x <- c("low", "low", "med", "high")
missings <- "low;high"
assignMissing(x, missings)
missings <- "med;doesnot exist"
assignMissing(x, missings)
## Test alternate separator
assignMissing(x, "low|med", sep = "|")

## 3. factors (same as strings, really)
x <- factor(c("low", "low", "med", "high"), levels = c("low", "med", "high"))
missings <- "low;high"
assignMissing(x, missings)
## Previous same as
missings <- c("low", "high")
assignMissing(x, missings)

missings <- c("med", "doesnot exist")
assignMissing(x, missings)
## ordered factor:
x <- ordered(c("low", "low", "med", "high"), levels = c("low", "med", "high"))
missings <- c("low", "high")
assignMissing(x, missings)

## 4. Real-valued variable
set.seed(234234)
x <- rnorm(10)
x
missings <- "< 0"
assignMissing(x, missings)
missings <- "> -0.2"
assignMissing(x, missings)
## values above 0.1 and below 0.7 are missing
missings <- "(0.1,0.7)"
assignMissing(x, missings)
## Note that in floating point numbers, it is probably
## futile to specify specific values for missings. Even if we
## type out values to 7 decimals, nothing gets excluded
assignMissing(x, "-0.4879708;0.1435791")
## Can mark a range, however
assignMissing(x, "(-0.487971,-0.487970);(0.14357, 0.14358)")
x
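
The reason enumerated values fail for floating point data is that printed digits are a rounded view of the stored number, so a typed value rarely equals the stored one exactly. A small base R illustration, independent of kutils and using an arbitrary value y chosen for this sketch:

```r
## Printed digits are a rounded view of the stored double
y <- 1/3
print(y, digits = 7)           # displays 0.3333333
y == 0.3333333                 # FALSE: the stored value has more digits
## A narrow interval around the printed value does capture it
y > 0.333333 & y < 0.333334    # TRUE
```

This is why the last example above uses narrow open intervals rather than typed-out values.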

kutils documentation built on Sept. 17, 2023, 5:06 p.m.