match.tbl: Match Table


View source: R/helperMisc.R

Description

Find (via potentially fuzzy matching) values in a table given a lookup reference

Usage

## S3 method for class 'tbl'
match(ref, tbl.ref, tbl.val, exact = FALSE)

Arguments

ref

a vector of reference values to look up

tbl.ref

reference values in the table to be matched to ref

tbl.val

values to be retrieved from the table when found in association with tbl.ref

exact

logical, default FALSE; if TRUE, matches of ref to tbl.ref needn't be exact

Details

When exact=FALSE, match.tbl performs a match, with the added utility of returning a value in the table (rather than simply the index of the matches). When exact=TRUE, functions in cull are called to reformat ref and search for a match. If no match is found still, then fuzzy matching is performed via agrep on versions of ref that have and have not been formatted via cull.
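
The "value rather than index" idea can be pictured with base R alone. This is a minimal sketch, not the package's implementation; the vectors are made up for illustration, and a plain data.frame stands in for the data.table that match.tbl returns.

```r
# Hypothetical lookup table: reference names and the values tied to them
tbl.ref <- c("cats", "dogs", "elephant")
tbl.val <- c(10, 20, 30)
ref <- c("dogs", "squirrel", "cats")

# match() gives positions in tbl.ref (NA where nothing matches)
idx <- match(ref, tbl.ref)

# Indexing tbl.val by those positions returns the values instead
val <- tbl.val[idx]

print(data.frame(ref = ref, val = val, tbl.row = idx))
```

Unmatched lookups stay in the output as NA, so the result keeps one row per element of ref.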

The values of val.src indicate the amount of fuzziness involved in the match:

m1 an exact match
m2 exact match after cull
m3 fuzzy match performed on ref
m4 fuzzy match performed on cull(ref)

Fuzzy matching is performed with agrep, with arguments ignore.case=TRUE and max.distance=0.25.
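
The tiers m1 to m4 can be sketched roughly as follows. This is an illustration under assumptions, not the code in R/helperMisc.R: simple_cull, fuzzy_one and tiered_match are hypothetical names, simple_cull is only a stand-in for cull (whose real behaviour is not described here), and the result is a plain data.frame. Only the agrep arguments are taken from the text above.

```r
# Hypothetical stand-in for cull(): trim and collapse whitespace.
# The real cull() may reformat ref quite differently.
simple_cull <- function(x) trimws(gsub("\\s+", " ", x))

# Row of the first fuzzy hit in `table` for one string, or NA if none
fuzzy_one <- function(x, table) {
  hit <- agrep(x, table, ignore.case = TRUE, max.distance = 0.25)
  if (length(hit) > 0) hit[1] else NA_integer_
}

tiered_match <- function(ref, tbl.ref, tbl.val) {
  culled <- simple_cull(ref)
  # Candidate rows per tier, ordered from least to most fuzzy
  tiers <- list(
    m1 = match(ref, tbl.ref),
    m2 = match(culled, tbl.ref),
    m3 = vapply(ref, fuzzy_one, integer(1), table = tbl.ref, USE.NAMES = FALSE),
    m4 = vapply(culled, fuzzy_one, integer(1), table = tbl.ref, USE.NAMES = FALSE)
  )
  row <- rep(NA_integer_, length(ref))
  src <- rep(NA_character_, length(ref))
  for (label in names(tiers)) {
    # Fill only entries that no earlier (stricter) tier has matched
    new <- is.na(row) & !is.na(tiers[[label]])
    row[new] <- tiers[[label]][new]
    src[new] <- label
  }
  data.frame(ref = ref, val = tbl.val[row], val.src = src, tbl.row = row,
             stringsAsFactors = FALSE)
}

tbl.ref <- c("cats", "dogs", "elephant", "Gadus morhua")
tbl.val <- c(10, 20, 30, 40)
ref <- c("dogs", "  dogs ", "elehpant", "squirrel")
res <- tiered_match(ref, tbl.ref, tbl.val)
print(res)
```

For simplicity the sketch computes every tier up front and then keeps the strictest hit per element. A real implementation would more likely run the fuzzier tiers only on the elements still unmatched.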

Value

A data.table with 4 columns:

[,1]  ref      class(ref)  the reference value
[,2]  val      class(val)  the value from the table (usually this is the desired output)
[,3]  val.src  character   the source of the match
[,4]  tbl.row  integer     row of the match in the table

Warning

I am suspicious that the values returned in tbl.ref may be inaccurate. However, this quality, and the function in general, has not been thoroughly tested. Use cases have given desirable results, although I think that the fuzzy matching can be a bit too fuzzy (finding matches where there shouldn't be any). Be aware.
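
One possible way to screen for over-fuzzy matches, offered as a sketch rather than anything the package does, is to compute the whole-string edit distance between each lookup value and the table entry it was matched to. The pairs below are hypothetical, and any cutoff on the resulting distances is the user's choice.

```r
# Hypothetical pairs: a lookup value and the table entry it was matched to
ref     <- c("elehpant", "gaus", "dogs")
matched <- c("elephant", "Gadus morhua", "dogs")

# Generalized Levenshtein distance for each pair, ignoring case
d <- mapply(function(a, b) adist(a, b, ignore.case = TRUE)[1, 1],
            ref, matched, USE.NAMES = FALSE)

# Scale by the longer string: 0 means identical, larger means less alike
rel <- d / pmax(nchar(ref), nchar(matched))

print(data.frame(ref, matched, dist = d, rel.dist = round(rel, 2)))
```

Because agrep looks for an approximate match of the pattern anywhere inside the target, a short lookup value can match a much longer table entry. A large whole-string distance flags such rows for manual review.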

Examples

library(data.table)
tbl <- data.table(animal=c(
	"cats",
	"dogs",
	"elephant",
	"giraffe",
	"monkey",
	"person",
	"Gadus morhua",
	"Paralichthys dentatus",
	"Pomatomus saltatrix",
	"Amphiprioninae"
), a=1:10, b=10:1)
ref <- c(
	"GADUS MORHUA",
	"Amphiprion (the computer)",
	"elehpant",
	"dogs",
	"squirrel",
	"gaus"
)
tbl.ref <- tbl[,animal]
match.tbl(ref, tbl.ref, tbl[,animal]) # return what was matched to
match.tbl(ref, tbl.ref, tbl[,a]) # return another column

rBatt/trawlData documentation built on May 26, 2019, 7:45 p.m.