Returns a lookup table or list of the positions of ALL matches of its first
argument in its second and vice versa. Similar to match, though
that function only returns the first match.
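To make the contrast with match concrete, here is a minimal base-R sketch. It does not call matches itself, and the vectors x and y are made-up toy data; it only shows the difference between "first match" and "all matches" that the description refers to.

```r
# Hypothetical toy data, not taken from the package documentation
x <- c("a", "b", "c")
y <- c("b", "a", "b", "a", "a")

# match() reports only the first position of each element of x in y
first_only <- match(x, y)

# Collecting every position instead, one integer vector per element of x
# (an element with no match yields an empty vector)
all_matches <- lapply(x, function(v) which(y %in% v))

print(first_only)
print(all_matches)
```

The lapply approach is the same slow baseline the examples time against; it is shown here only to illustrate the shape of an all-matches result.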
x | vector. The values to be matched. Long vectors are not currently supported.
y | vector. The values to be matched. Long vectors are not currently supported.
all.x | logical; if
all.y | logical; if
list | logical. If
indexes | logical. Whether to return the indices of the matches or the actual values.
nomatch | the value to be returned in the case when no match is found. If not provided and
This behavior can be imitated by using joins to create lookup tables, but
matches is simpler and faster: usually faster than the best joins in
other packages and thousands of times faster than the built-in
merge.
all.x/all.y correspond to the four types of database joins in the
following way:
left join: all.x=TRUE, all.y=FALSE
right join: all.x=FALSE, all.y=TRUE
inner join: all.x=FALSE, all.y=FALSE
full outer join: all.x=TRUE, all.y=TRUE
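The four combinations can be tried out with base R's merge, which uses the same all.x/all.y flags. This is only a sketch on made-up tables (the names left, right, lx and ry are hypothetical), and it assumes matches follows the same join semantics as described above.

```r
# Hypothetical toy tables sharing a "key" column
left  <- data.frame(key = c(1, 2, 3),    lx = c("a", "b", "c"))
right <- data.frame(key = c(2, 3, 3, 4), ry = c("w", "x", "y", "z"))

# Left join: every key from left, matched or not
left_join  <- merge(left, right, by = "key", all.x = TRUE,  all.y = FALSE)
# Right join: every key from right, matched or not
right_join <- merge(left, right, by = "key", all.x = FALSE, all.y = TRUE)
# Inner join: only keys present in both
inner_join <- merge(left, right, by = "key", all.x = FALSE, all.y = FALSE)
# Full outer join: every key from either side
full_join  <- merge(left, right, by = "key", all.x = TRUE,  all.y = TRUE)

# Row counts differ because unmatched keys are kept or dropped
print(c(left = nrow(left_join), right = nrow(right_join),
        inner = nrow(inner_join), full = nrow(full_join)))
```

Key 3 appears twice in right, so it contributes two rows to every join, which is the "all matches" behaviour in table form.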
Note that NA values will match other NA values.
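Base R's match follows the same convention, which gives a quick way to see what "NA matches NA" means in practice. The sketch below uses only base functions on made-up vectors and is an illustration of the convention, not a call to matches.

```r
# Hypothetical toy data containing missing values
x <- c(1, NA)
y <- c(NA, 2, NA)

# match() treats NA as equal to NA: the NA in x is found at position 1 of y,
# while the value 1 has no match
na_match <- match(x, y)

# A plain equality test never matches NA, so it finds nothing
na_equal <- which(y == NA)

# All positions of NA in y, i.e. what an all-matches lookup would report
na_all <- which(is.na(y))

print(na_match)
print(na_equal)
print(na_all)
```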
one<-as.integer(1:10000)
two<-as.integer(sample(1:10000,1e3,TRUE))
system.time(a<-lapply(one, function (x) which(two %in% x)))
system.time(b<-matches(one,two,all.y=FALSE,list=TRUE))
#Only retain items from one with a match in two
b<-matches(one,two,all.x=FALSE,all.y=FALSE,list=TRUE)
length(b)==length(unique(two))
one<-round(runif(1e3),3)
two<-round(runif(1e3),3)
system.time(a<-lapply(one, function (x) which(two %in% x)))
system.time(b<-matches(one,two,all.y=FALSE,list=TRUE))
one<-as.character(1:1e5)
two<-as.character(sample(1:1e5,1e5,TRUE))
system.time(b<-matches(one,two,list=FALSE))
system.time(c<-merge(data.frame(key=one),data.frame(key=two),all=TRUE))
## Not run:
one<-as.integer(1:1000000)
two<-as.integer(sample(1:1000000,1e5,TRUE))
system.time(b<-matches(one,two,indexes=FALSE))
if(requireNamespace("dplyr",quietly=TRUE))
  system.time(c<-dplyr::full_join(data.frame(key=one),data.frame(key=two)))
if(require(data.table,quietly=TRUE))
  system.time(d<-merge(data.table(data.frame(key=one))
                       ,data.table(data.frame(key=two))
                       ,by='key',all=TRUE,allow.cartesian=TRUE))
one<-as.character(1:1000000)
two<-as.character(sample(1:1000000,1e5,TRUE))
system.time(a<-merge(one,two)) #Times out
system.time(b<-matches(one,two,indexes=FALSE))
if(requireNamespace("dplyr",quietly=TRUE))
  system.time(c<-dplyr::full_join(data.frame(key=one),data.frame(key=two)))
if(require(data.table,quietly=TRUE))
{
  system.time(d<-merge(data.table(data.frame(key=one))
                       ,data.table(data.frame(key=two))
                       ,by='key',all=TRUE,allow.cartesian=TRUE))
  identical(b[,1],as.character(d$key))
}
## End(Not run)