factor_nosort: Fast Factor Generation


Description

This function generates factors more quickly than base R's `factor`, without leveraging `fastmatch`. With `fastmatch`, the speed increase for ICD-9 codes was about 33%. It could likely be faster still using `Rcpp` and a hashed matching algorithm.

Usage

```
factor_nosort(x, levels = NULL, labels = levels)
```

Arguments

 `x` An object of atomic type `integer`, `numeric`, `character` or `logical`.

 `levels` An optional character vector of levels. It is coerced to the same type as `x`. By default, the levels are computed as `sort(unique.default(x))`.

 `labels` A set of labels used to rename the levels, if desired.

 `na.last` If `TRUE` and there are missing values, the last level is set as `NA`; otherwise, they are removed.

Details

`NaN`s are converted to `NA` when used on numeric values. Extracted from https://github.com/kevinushey/Kmisc.git

These features of base R's `factor` are missing: ```exclude = NA, ordered = is.ordered(x), nmax = NA```

I don't think there is any requirement for factor levels to be sorted in advance, especially not for ICD-9 codes where a simple alphanumeric sorting will likely be completely wrong.
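To illustrate the point about level order, here is a minimal sketch using only base R. It is not the implementation of `factor_nosort`; it only shows that supplying `levels` explicitly avoids the sort, and that the resulting factor still decodes to the same values. The example codes are made up for illustration.

```r
# Base R only: an illustration of unsorted levels, not the icd implementation.
codes <- c("V10.1", "250.00", "E800", "250.00", "V10.1")  # hypothetical codes

# Default behaviour: levels are sorted (order is locale-dependent for strings).
f_sorted <- factor(codes)

# Supplying levels explicitly skips the sort, so levels keep the order in
# which each value first appears.
f_unsorted <- factor(codes, levels = unique(codes))

levels(f_unsorted)
# [1] "V10.1"  "250.00" "E800"

# The level order differs, but both factors decode to the same values.
stopifnot(identical(as.character(f_sorted), as.character(f_unsorted)))
```

The integer codes stored inside the two factors may differ, so code that relies on `as.integer()` of a factor should not assume sorted levels.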

Author(s)

Kevin Ushey, adapted by Jack Wasey

Examples

```
## Not run: 
pts <- icd:::random_unordered_patients(1e7)
u <- unique.default(pts$code)
# this shows that stringr (which uses stringi) sort takes 50% longer than
# built-in R sort.
microbenchmark::microbenchmark(sort(u), stringr::str_sort(u))

# this shows that factor_ is about 50% faster than factor for
# big vectors of strings

# without sorting is much faster:
microbenchmark::microbenchmark(factor(pts$code),
                               # factor_(pts$code),
                               factor_nosort(pts$code),
                               times = 25)

## End(Not run)
```
