View source: R/match.data.frame.R
| match.data.frame | R Documentation |
Identify the row of y best matching each row of x
For each row of x[, by.x], find the best matching row of y[, by.y],
with the best match defined by grep. and split.

grep. and split must either be missing or have the same length as
by.x and by.y. If grep.[i] and split[i] are NA, do a complete match
of x[, by.x[i]] and y[, by.y[i]]. Otherwise, for each row j, look for
a match for strsplit(x[j, by.x[i]], split[i])[[1]][1] among
strsplit(y[, by.y[i]], split[i]). See details.
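The first-token comparison described above can be illustrated with base R alone. This is a minimal sketch of the idea, not the function's actual code: the vectors xName and yName and the space separator are made up for illustration, and reading "among" as "one of the tokens" is an assumption.

```r
# Hypothetical values, not taken from the package
xName <- "Mike R."
yName <- c("John", "Mike", "T. Albert", "Al Thomas")

# First token of the x value after splitting on a space
xFirst <- strsplit(xName, " ")[[1]][1]

# Tokens of every candidate y value (a list with one element per row of y)
yTokens <- strsplit(yName, " ")

# Rows of y that have xFirst among their tokens (assumed interpretation)
hits <- which(vapply(yTokens, function(tok) xFirst %in% tok, logical(1)))
hits
```

If hits has exactly one element, that row is a unique match for this column.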
match.data.frame(x, y, by, by.x=by, by.y=by,
grep., split, sep=':')
| x, y | data.frames |
| by, by.x, by.y | names of columns of |
| grep. | a character vector of the type of match for each element of Alternatives are NOTE: These alternatives are not examined if a unique match is found between |
| split | A character vector of |
| sep | a |
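The example on this page passes 'agrep' as a value of grep., and grep and agrep are both listed as related functions. As a rough illustration of how exact-pattern and approximate matching differ in base R (this does not show how match.data.frame calls them internally, and the misspelled pattern "Rodgers" is made up for the sketch), consider:

```r
# Surnames from the reference data used in the example on this page
refSurname <- c("Smith", "Rogers", "Rogers (MI)", "Rogers (AL)",
                "Smith", "Jones")

# grep with fixed = TRUE: elements containing the literal pattern
exactHits <- grep("Rogers", refSurname, fixed = TRUE)

# agrep: elements containing an approximate match to the pattern,
# so a pattern with one extra letter can still find "Rogers"
fuzzyHits <- agrep("Rodgers", refSurname)

exactHits
fuzzyHits
```

Both calls return integer positions within refSurname, which is the same kind of index that match.data.frame is documented to return for rows of y.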
1. Check by.x, by.y, grep. and split.
If ((missing(by.x) | missing(by.y)) && missing(by)), set by <- names(x).
2. fullMatch <- (is.na(grep.) & is.na(split)). Create keyfx and
keyfy by pasting columns of x[, by.x[fullMatch]] and
y[, by.y[fullMatch]]. Also create x. and y. = strsplit of
x[, by.x[!fullMatch]].
3. Iterate over rows of x looking for the best match. This includes
an inner loop over columns of x[, by.x[!fullMatch]], stopping on the
first unique match. Return (-1) if no unique match is found.
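Step 2 can be sketched in base R as follows. This is an assumption-laden illustration rather than the package's code: the small data frames are hypothetical, and match() is used here as a stand-in for the lookup (it returns the first matching key and does not check uniqueness).

```r
# Hypothetical inputs for illustration
x <- data.frame(state = c("AL", "MI", "NY"),
                surname = c("Rogers", "Rogers", "Smith"),
                stringsAsFactors = FALSE)
y <- data.frame(state = c("NY", "MI", "AL"),
                surname = c("Smith", "Rogers", "Rogers"),
                stringsAsFactors = FALSE)
by.x <- by.y <- c("state", "surname")
grep. <- c(NA, NA)   # NA in both grep. and split means a complete match
split <- c(NA, NA)
sep <- ":"

# Columns that require a complete match
fullMatch <- is.na(grep.) & is.na(split)

# Paste those columns into one key per row
keyfx <- do.call(paste, c(as.list(x[by.x[fullMatch]]), list(sep = sep)))
keyfy <- do.call(paste, c(as.list(y[by.y[fullMatch]]), list(sep = sep)))

# Position in y of each key of x; NA where no key matches
idx <- match(keyfx, keyfy)
idx
```

Pasting the columns with a separator turns a multi-column comparison into a single character comparison, which is presumably why the sep argument exists.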
an integer vector of length nrow(x) containing the index of the best
matching row of y, or NA if no adequate match was found.
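An index vector of this kind can be used directly to pull the matching rows of y. The sketch below uses a hand-written index idx and a hypothetical data frame y rather than an actual call to match.data.frame:

```r
# Hypothetical index of the kind described above; NA marks "no match"
idx <- c(2L, NA, 1L)
y <- data.frame(id = c("a", "b"), value = c(10, 20),
                stringsAsFactors = FALSE)

# One row per element of idx; rows are all NA where idx is NA
matched <- y[idx, , drop = FALSE]

# Logical flag for the rows of x that found a match
found <- !is.na(idx)

matched$value
found
```

Keeping the NA rows preserves the alignment with the rows of x, so the result can be combined with x column by column.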
Spencer Graves
strsplit, is.na
grep, agrep
match, row.match,
join, match_df
classify
newdata <- data.frame(state=c("AL", "MI","NY"),
                      surname=c("Rogers", "Rogers", "Smith"),
                      givenName=c("Mike R.", "Mike K.", "Al"),
                      stringsAsFactors=FALSE)
reference <- data.frame(state=c("NY", "NY", "MI", "AL", "NY", "MI"),
                        surname=c("Smith", "Rogers", "Rogers (MI)",
                                  "Rogers (AL)", "Smith", 'Jones'),
                        givenName=c("John", "Mike", "Mike", "Mike",
                                    "T. Albert", 'Al Thomas'),
                        stringsAsFactors=FALSE)
newInRef <- match.data.frame(newdata, reference,
                             grep.=c(NA, 'agrep', 'agrep'))
all.equal(newInRef, c(4, 3, 5))