View source: R/match.data.frame.R
For each row of x[, by.x], find the best matching row of y[, by.y], with the best match defined by grep. and split.

grep. and split must either be missing or have the same length as by.x and by.y. If grep.[i] and split[i] are NA, do a complete match of x[, by.x[i]] and y[, by.y[i]]. Otherwise, for each row j, look for a match for strsplit(x[j, by.x[i]], split[i])[[1]][1] among strsplit(y[, by.y[i]], split[i]).

See details.
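The partial-match rule above can be sketched in base R. This is only an illustration under stated assumptions: the values of x_given, y_given and split_i are hypothetical, and the exact comparison with %in% stands in for whatever grep.[i] selects in the real function (for example grep or agrep).

```r
# Hypothetical values, used only to illustrate the rule described above
x_given <- "Mike R."                        # plays the role of x[j, by.x[i]]
y_given <- c("John", "Mike", "T. Albert")   # plays the role of y[, by.y[i]]
split_i <- " "                              # assumed value of split[i]

# First token of the x value: strsplit(x[j, by.x[i]], split[i])[[1]][1]
x_first <- strsplit(x_given, split_i)[[1]][1]

# Tokens of every y value: strsplit(y[, by.y[i]], split[i])
y_tokens <- strsplit(y_given, split_i)

# Rows of y whose tokens contain the first token of x.
# Assumption: exact comparison; the package may use grep or agrep here.
candidates <- which(vapply(y_tokens,
                           function(tok) x_first %in% tok,
                           logical(1)))
```

Here only one row of y_given contains the token, so candidates holds a single index; with several candidates, further columns would be needed to narrow the match.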
match.data.frame(x, y, by, by.x=by, by.y=by, grep., split, sep=':')
x, y | data.frames
by, by.x, by.y | names of columns of
grep. | a character vector of the type of match for each element of
Alternatives are NOTE: These alternatives are not examined if a unique match is found between x[, by.x[is.na(grep.) & is.na(split)]] and the corresponding columns of
split | A character vector of
sep | a
1. Check by.x, by.y, grep. and split. If ((missing(by.x) | missing(by.y)) && missing(by)), set by <- names(x).
2. Compute fullMatch <- (is.na(grep.) & is.na(split)). Create keyfx and keyfy by pasting columns of x[, by.x[fullMatch]] and y[, by.y[fullMatch]]. Also create x. and y. = strsplit of x[, by.x[!fullMatch]].
3. Iterate over rows of x looking for the best match. This includes an inner loop over columns of x[, by.x[!fullMatch]], stopping on the first unique match. Return (-1) if no unique match is found.
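Step 2 can be sketched in base R as follows. This is a minimal illustration, not the package source: the two data frames are hypothetical, and the use of do.call(paste, ...) and match() is an assumption about how the keys might be built and compared.

```r
# Hypothetical inputs in which every column requires a complete match
x <- data.frame(state = c("AL", "MI"), surname = c("Rogers", "Smith"),
                stringsAsFactors = FALSE)
y <- data.frame(state = c("MI", "AL"), surname = c("Smith", "Rogers"),
                stringsAsFactors = FALSE)
by.x <- by.y <- c("state", "surname")
grep. <- c(NA, NA)
split <- c(NA, NA)
sep <- ":"

# Columns that require a complete match
fullMatch <- is.na(grep.) & is.na(split)

# Paste the full-match columns of each data frame into one key per row
keyfx <- do.call(paste, c(x[by.x[fullMatch]], sep = sep))
keyfy <- do.call(paste, c(y[by.y[fullMatch]], sep = sep))

# Assumption: compare keys with match() to get candidate rows of y
idx <- match(keyfx, keyfy)
```

Pasting the columns into a single key reduces a multi-column comparison to one vector lookup; sep keeps values from different columns from running together.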
An integer vector of length nrow(x) containing the index of the best matching row of y, or NA if no adequate match was found.
Spencer Graves
strsplit, is.na, grep, agrep, match, row.match, join, match_df, classify
newdata <- data.frame(state=c("AL", "MI","NY"),
                      surname=c("Rogers", "Rogers", "Smith"),
                      givenName=c("Mike R.", "Mike K.", "Al"),
                      stringsAsFactors=FALSE)
reference <- data.frame(state=c("NY", "NY", "MI", "AL", "NY", "MI"),
                        surname=c("Smith", "Rogers", "Rogers (MI)",
                                  "Rogers (AL)", "Smith", 'Jones'),
                        givenName=c("John", "Mike", "Mike", "Mike",
                                    "T. Albert", 'Al Thomas'),
                        stringsAsFactors=FALSE)
newInRef <- match.data.frame(newdata, reference,
                             grep.=c(NA, 'agrep', 'agrep'))
all.equal(newInRef, c(4, 3, 5))
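The returned index can be used to pull the matched rows of the reference table. The sketch below does not call match.data.frame; it assumes the index c(4, 3, 5) that the example above checks for, and the variable name matched is hypothetical.

```r
newdata <- data.frame(state = c("AL", "MI", "NY"),
                      surname = c("Rogers", "Rogers", "Smith"),
                      givenName = c("Mike R.", "Mike K.", "Al"),
                      stringsAsFactors = FALSE)
reference <- data.frame(state = c("NY", "NY", "MI", "AL", "NY", "MI"),
                        surname = c("Smith", "Rogers", "Rogers (MI)",
                                    "Rogers (AL)", "Smith", "Jones"),
                        givenName = c("John", "Mike", "Mike", "Mike",
                                      "T. Albert", "Al Thomas"),
                        stringsAsFactors = FALSE)

# Assumed result of match.data.frame, taken from the all.equal check above
newInRef <- c(4, 3, 5)

# One row of reference per row of newdata
matched <- reference[newInRef, ]
```

An NA entry in the index would produce a row of NA values at that position, so unmatched rows of newdata remain visible in the result.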