lookup: Hash Table/Dictionary Lookup 'lookup' - 'data.table' based...

View source: R/lookup.R

lookup    R Documentation

Hash Table/Dictionary Lookup

Description

lookup - data.table based hash table useful for large vector lookups.

%l% - A binary operator version of lookup for when key.match is a data.frame or named list.

%l+% - A binary operator version of lookup for when key.match is a data.frame or named list and missing is assumed to be NULL.

%lc% - A binary operator version of lookup for when key.match is a data.frame or named list and all arguments are converted to character.

%lc+% - A binary operator version of lookup for when key.match is a data.frame or named list, missing is assumed to be NULL, and all arguments are converted to character.
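The four operators can be read as thin wrappers around lookup with some arguments fixed in advance. A minimal base R sketch of that relationship follows. Note that lookup_sketch, to_chr and the %ls...% operators are hypothetical stand-ins written for illustration; they only handle a two column data.frame key and are not the package's data.table based implementation.

```r
# Hypothetical stand-in for lookup() with a two column data.frame key (base R only).
lookup_sketch <- function(terms, key.match, missing = NA) {
    idx <- match(terms, key.match[[1]])     # row of the key matching each term
    out <- key.match[[2]][idx]              # reassigned value, NA where unmatched
    unmatched <- is.na(idx)
    if (is.null(missing)) {
        out[unmatched] <- terms[unmatched]  # missing = NULL: keep the original term
    } else {
        out[unmatched] <- missing           # otherwise fill with the missing value
    }
    out
}

# Hypothetical helper: convert every column of the key to character.
to_chr <- function(key.match) {
    key.match[] <- lapply(key.match, as.character)
    key.match
}

# Hypothetical analogues of %l%, %l+%, %lc% and %lc+%.
`%ls%`   <- function(terms, key.match) lookup_sketch(terms, key.match)
`%ls+%`  <- function(terms, key.match) lookup_sketch(terms, key.match, missing = NULL)
`%lsc%`  <- function(terms, key.match) lookup_sketch(as.character(terms), to_chr(key.match))
`%lsc+%` <- function(terms, key.match) {
    lookup_sketch(as.character(terms), to_chr(key.match), missing = NULL)
}

key <- data.frame(x = 1:4, y = 11:14)
1:5 %ls% key      # the unmatched term 5 becomes NA
1:5 %ls+% key     # the unmatched term 5 is retained
1:5 %lsc+% key    # as above, with terms and key converted to character
```

The "+" variants differ from their counterparts only in passing missing = NULL, and the "c" variants only in the character conversion applied before the lookup.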

Usage

lookup(terms, key.match, key.reassign = NULL, missing = NA)

## S3 method for class 'list'
lookup(terms, key.match, key.reassign = NULL, missing = NA)

## S3 method for class 'data.frame'
lookup(terms, key.match, key.reassign = NULL, missing = NA)

## S3 method for class 'matrix'
lookup(terms, key.match, key.reassign = NULL, missing = NA)

## S3 method for class 'numeric'
lookup(terms, key.match, key.reassign, missing = NA)

## S3 method for class 'factor'
lookup(terms, key.match, key.reassign, missing = NA)

## S3 method for class 'character'
lookup(terms, key.match, key.reassign, missing = NA)

terms %l% key.match

terms %l+% key.match

terms %lc% key.match

terms %lc+% key.match

Arguments

terms

A vector of terms to undergo a lookup.

key.match

Takes one of the following: (1) a two column data.frame of a match key and reassignment column, (2) a named list of vectors, or (3) a single vector match key. If a data.frame or named list is supplied, key.reassign is not needed.

key.reassign

A single reassignment vector, supplied when key.match is not a two column data.frame or named list.

missing

Value to assign to terms that do not match key.match. If set to NULL, the original values in terms corresponding to the missing elements are retained.
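The three accepted forms of key.match can all be reduced to one pair of vectors: a match key and a reassignment vector. The base R sketch below illustrates that reduction. as_key_pair is a hypothetical helper, not a qdapTools function, and the sketch assumes that the names of a named list are the reassigned values while the list elements are the terms they replace.

```r
# Hypothetical helper: reduce any accepted key.match form to a match/reassign pair.
as_key_pair <- function(key.match, key.reassign = NULL) {
    if (is.data.frame(key.match)) {
        # (1) two column data.frame: column 1 is the match key, column 2 the reassignment
        list(match = key.match[[1]], reassign = key.match[[2]])
    } else if (is.list(key.match)) {
        # (2) named list: each name is repeated once per element it covers
        list(match = unlist(key.match, use.names = FALSE),
             reassign = rep(names(key.match), lengths(key.match)))
    } else {
        # (3) single vector match key: key.reassign has to be supplied separately
        list(match = key.match, reassign = key.reassign)
    }
}

codes <- list(A = c(1, 2, 4), B = c(3, 5), C = 7, D = c(6, 8:10))
pair <- as_key_pair(codes)

# Look up 1:10 against the flattened key with base R's match()
pair$reassign[match(1:10, pair$match)]
```

Once the key is in this paired form, a single matching step covers all three input types, which is why key.reassign is only needed for the single vector case.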

Value

Outputs a new vector with reassigned values.

See Also

setDT, hash

Examples

## Supply a data.frame to key.match

lookup(1:5, data.frame(1:4, 11:14))

## Retain original values for missing 
lookup(1:5, data.frame(1:4, 11:14), missing=NULL) 

lookup(LETTERS[1:5], data.frame(LETTERS[1:5], 100:104))
lookup(LETTERS[1:5], factor(LETTERS[1:5]), 100:104)

## Supply a named list of vectors to key.match

codes <- list(
    A = c(1, 2, 4), 
    B = c(3, 5),
    C = 7,
    D = c(6, 8:10)
)

lookup(1:10, codes)

## Supply a single vector to key.match and key.reassign

lookup(mtcars$carb, sort(unique(mtcars$carb)),        
    c("one", "two", "three", "four", "six", "eight")) 
    
lookup(mtcars$carb, sort(unique(mtcars$carb)),        
    seq(10, 60, by=10))
  
## %l%, a binary operator version of lookup
1:5 %l% data.frame(1:4, 11:14)
1:10 %l% codes

1:12 %l% codes
1:12 %l+% codes
  
(key <- data.frame(a=1:3, b=factor(paste0("l", 1:3))))
1:3 %l% key

## Larger Examples
key <- data.frame(x=1:2, y=c("A", "B"))
big.vec <- sample(1:2, 3000000, TRUE)
out <- lookup(big.vec, key)
out[1:20]

## A big vector to recode; more variation
## means a bigger dictionary
recode_me <- sample(1:(length(LETTERS)*10), 10000000, TRUE)

## Time it
tic <- Sys.time()  

output <- recode_me %l% split(1:(length(LETTERS)*10), LETTERS)
difftime(Sys.time(), tic)

## view it
sample(output, 100)

qdapTools documentation built on May 31, 2023, 7:01 p.m.