factor_nosort: Fast Factor Generation In icd: Tools for Working with ICD-9 and ICD-10 Codes, and Finding Comorbidities

Description

This function generates factors more quickly, using `Rcpp` and a hashed matching algorithm rather than `fastmatch`. The speed increase with `fastmatch` for ICD-9 codes was about 33%.
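The core idea can be sketched in base R. Each element is hash-matched against the levels to get its integer code, and the `levels` and `class` attributes are then attached directly instead of calling `factor()`, which sorts. This is a minimal sketch, not the package's implementation. The name `factor_nosort_sketch` is hypothetical, it uses base `match()` rather than `Rcpp` or `fastmatch`, and it assumes the default levels are the non-missing unique values of `x` in order of first appearance.

```r
# Hypothetical sketch of fast factor construction without sorting.
# Assumption: default levels are the unique non-missing values of x,
# kept in order of first appearance (no sort() call).
factor_nosort_sketch <- function(x, levels) {
  if (missing(levels)) {
    levels <- unique(x)
    # is.na() is TRUE for both NA and NaN, so neither becomes a level
    levels <- levels[!is.na(levels)]
  }
  # match() gives the position of each element of x in levels
  # (NA if absent); these positions are the factor's integer codes
  codes <- match(x, levels)
  # attach the attributes directly rather than going through factor()
  structure(codes, levels = as.character(levels), class = "factor")
}

f <- factor_nosort_sketch(c("z", "a", "123", "a"))
levels(f)      # "z" "a" "123": first-appearance order, not sorted
as.integer(f)  # 1 2 3 2

n <- factor_nosort_sketch(c(1, NaN, 2))
as.integer(n)  # 1 NA 2: NaN has no level, so its code is NA
```

Skipping the sort is where the saving comes from: matching is roughly linear in the length of `x`, while sorting the unique values adds work that is unnecessary when their order is not meaningful.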

Usage

```r
factor_nosort(x, levels, labels = levels, exclude = NA)
```

Arguments

`x`: An object of atomic type `integer`, `numeric`, `character` or `logical`.

`levels`: An optional character vector of levels. It is coerced to the same type as `x`. By default, the levels are computed as `sort(unique.default(x))`.

`labels`: A set of labels used to rename the levels, if desired.
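The `levels` and `labels` semantics described above can be walked through step by step in base R. This is an illustration under assumptions, not `icd` code: it coerces `levels` with `as.vector(..., mode = typeof(x))`, which is one plausible way to match the type of `x`, and it applies `labels` by substituting them for the level strings; the package may implement either step differently.

```r
# Illustrative walk-through of the levels/labels arguments (not icd code).
x <- c(3L, 1L, 3L, 2L)

# levels supplied as character, then coerced to the type of x (integer)
lev <- c("1", "3")
lev <- as.vector(lev, mode = typeof(x))
typeof(lev)      # "integer"

# integer codes via matching; 2L is not among the levels, so its code is NA
codes <- match(x, lev)
codes            # 2 1 2 NA

# labels rename the levels without changing the codes
lab <- c("one", "three")
f <- structure(codes, levels = lab, class = "factor")
as.character(f)  # "three" "one" "three" NA
```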

Details

When `x` is numeric, `NaN` values are converted to `NA`. This function was extracted from https://github.com/kevinushey/Kmisc.git

These features of base R's `factor()` are missing: `exclude = NA`, `ordered = is.ordered(x)` and `nmax = NA`.
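For comparison, the base R `factor()` behaviours listed above are easy to see directly. This snippet uses only base R to show what a caller gives up; it does not call `factor_nosort`, and `nmax`, an upper bound on the number of levels, is not shown.

```r
# Base R factor() behaviours that the Details list as missing.
x <- c("b", NA, "a")

# exclude = NA (the default) drops NA from the levels;
# exclude = NULL keeps NA as a level, sorted last
levels(factor(x))                   # "a" "b"
levels(factor(x, exclude = NULL))   # "a" "b" NA

# ordered = is.ordered(x): re-factoring an ordered factor keeps it ordered
o <- factor(c("lo", "hi"), levels = c("lo", "hi"), ordered = TRUE)
is.ordered(factor(o))               # TRUE
```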

I don't think there is any requirement for factor levels to be sorted in advance, especially not for ICD-9 codes, where simple alphanumeric sorting will likely give the wrong order.
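A small base R illustration of the sorting problem: in the ICD-9-CM classification, numeric codes are followed by the supplementary V codes and then the E codes, but a plain string sort puts E before V. The three codes below are arbitrary examples chosen for illustration.

```r
# ICD-9 codes listed in classification order: numeric, then V, then E
codes <- c("401", "V10", "E900")

# string sorting puts digits first and "E" before "V"
sorted <- sort(codes)
sorted     # "401" "E900" "V10"

# supplying levels explicitly keeps them in the order given
f <- factor(codes, levels = codes)
levels(f)  # "401" "V10" "E900"
```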

Author(s)

Kevin Ushey, adapted by Jack Wasey

Examples

```r
x <- c("z", "a", "123")
icd:::factor_nosort(x)
# should return a factor without modification
x <- as.factor(x)
identical(icd:::factor_nosort(x), x)
# unless the levels change:
icd:::factor_nosort(x, levels = c("a", "z"))
# existing factor levels aren't re-ordered without also moving elements
f <- factor(c("a", "b", "b", "c"))
g <- icd:::factor_nosort(f, levels = c("a", "c", "b"))
stopifnot(g[4] == "c")
```
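The last example relies on the same rule that base R's `factor()` follows: changing the order of the levels re-maps the integer codes but leaves the values unchanged. The following check uses only base R, so it runs without `icd` installed.

```r
f <- factor(c("a", "b", "b", "c"))
g <- factor(f, levels = c("a", "c", "b"))

# the values are unchanged ...
identical(as.character(g), as.character(f))  # TRUE
# ... but the underlying integer codes are re-mapped
as.integer(f)  # 1 2 2 3
as.integer(g)  # 1 3 3 2
```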

icd documentation built on May 9, 2018, 9:04 a.m.