factor_: Fast Factor Generation


View source: R/util.R

Description

This function generates factors more quickly than base R's factor by leveraging fastmatch::fmatch. The speed increase for ICD-9 codes is about 33%.
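As a rough illustration of the matching idea, a factor can be assembled by hand from integer codes and a levels attribute. The sketch below is not the package's implementation: sketch_factor is a hypothetical name, and base::match is used so the example runs without extra packages. fastmatch::fmatch accepts the same x and table arguments, so it could presumably be swapped in where fastmatch is installed.

```r
# Hypothetical helper for illustration only; not the factor_ source code.
sketch_factor <- function(x, levels = sort(unique.default(x))) {
  # Integer position of each value of x within levels (NA if absent).
  codes <- match(x, levels)
  # A factor is an integer vector carrying "levels" and "class" attributes.
  structure(codes, levels = as.character(levels), class = "factor")
}

x <- c("4280", "V3000", "25000", "4280")
f <- sketch_factor(x)
print(f)
print(levels(f))
```

On this small input the result should carry the same levels and integer codes as factor(x); any speed difference only becomes relevant for large vectors.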

Usage


Arguments

x

An object of atomic type integer, numeric, character or logical.

levels

An optional character vector of levels, coerced to the same type as x. By default, the levels are computed as sort(unique.default(x)).

labels

A set of labels used to rename the levels, if desired.

na.last

If TRUE and there are missing values, the last level is set as NA; otherwise, they are removed.
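The effect described for na.last resembles what base R offers through addNA. The following sketch uses only base R functions and does not call factor_ itself, so it should be read as an analogy under that assumption rather than as a description of factor_'s internals.

```r
x <- c("a", NA, "b", "a")

# Default base behaviour: missing values are not given a level.
f_drop <- factor(x)

# addNA() appends NA as the last level when missing values are present.
f_keep <- addNA(factor(x), ifany = TRUE)

print(levels(f_drop))
print(levels(f_keep))
```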

Details

For numeric input, NaN values are converted to NA. Extracted from https://github.com/kevinushey/Kmisc.git
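To make the NaN handling concrete, the sketch below reproduces the described outcome with base R only: NaN is replaced by NA before the factor is built, so it never becomes a level of its own. This mimics the documented result and is not taken from the factor_ source.

```r
x <- c(1.5, NaN, 2, NA)

# Replace NaN with NA up front, as the documentation says factor_ does
# for numeric input.
x_clean <- x
x_clean[is.nan(x_clean)] <- NA

f <- factor(x_clean)
print(levels(f))
print(is.na(f))
```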

These features of base R's factor are missing: exclude = NA, ordered = is.ordered(x), nmax = NA

I don't think there is any requirement for factor levels to be sorted in advance, especially not for ICD-9 codes where a simple alphanumeric sorting will likely be completely wrong.
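A small base R sketch of the sorting concern: plain string sorting places "100" before "99", which is unlikely to be the intended order for numeric-looking codes. Passing levels = unique(x) keeps the levels in order of first appearance instead. The code strings here are made up for illustration, and the example uses base factor rather than factor_nosort.

```r
codes <- c("99", "100", "V10", "99")

# Character sorting compares strings, not numbers.
sorted_levels <- sort(unique(codes))
print(sorted_levels)

# Unsorted alternative: levels follow the order of first appearance.
f <- factor(codes, levels = unique(codes))
print(levels(f))
```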

Author(s)

Kevin Ushey, adapted by Jack Wasey

Examples

## Not run: 
pts <- icd9:::randomUnorderedPatients(1e7)
u <- unique.default(pts$icd9)
# this shows that stringr (which uses stringi) sort takes 50% longer than
# built-in R sort.
microbenchmark::microbenchmark(sort(u), stringr::str_sort(u))

# this shows that factor_ is about 50% faster than factor for
# big vectors of strings

# without sorting, it is much faster still:
microbenchmark::microbenchmark(factor(pts$icd9),
                               factor_(pts$icd9),
                               factor_nosort(pts$icd9),
                               times=25)

## End(Not run)

jackwasey/icd9 documentation built on May 18, 2019, 7:57 a.m.