| ctapply | R Documentation |
ctapply is a fast replacement for tapply that assumes
contiguous input, i.e. equal values in the index are never separated
by any other values. This avoids an expensive split step, since
both the value and the index chunks can be created on the fly. This
can make it orders of magnitude faster than the classical
lapply(split(), ...) implementation.
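The idea of walking contiguous runs instead of splitting can be sketched in plain R. This is only an illustration under stated assumptions: contiguous_apply is a hypothetical helper written for this example, not the package's implementation, and it makes no attempt at matching ctapply's speed.

```r
# Hypothetical helper: a pure-R emulation of a contiguous tapply.
# It only illustrates the idea; it is NOT the package's implementation.
contiguous_apply <- function(X, INDEX, FUN, ...) {
  stopifnot(length(X) == length(INDEX))
  runs   <- rle(as.character(INDEX))   # lengths of consecutive equal values
  ends   <- cumsum(runs$lengths)       # last position of each run
  starts <- ends - runs$lengths + 1L   # first position of each run
  res <- lapply(seq_along(starts),
                function(k) FUN(X[starts[k]:ends[k]], ...))
  names(res) <- runs$values
  unlist(res)
}

x   <- c(1, 2, 3, 4, 5, 6)
idx <- c("a", "a", "b", "b", "b", "c")
res <- contiguous_apply(x, idx, sum)
```

Each run is handled as soon as its boundaries are known, so no table of groups has to be built first. That is the property the contiguity assumption buys.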
ctapply(X, INDEX, FUN, ..., MERGE=c)
X |
an atomic object, typically a vector |
INDEX |
numeric or character vector of the same length as X |
FUN |
the function to be applied |
... |
additional arguments to FUN |
MERGE |
function to merge the resulting vector or |
Note that ctapply supports either integer, real or character
vectors as indices (note that factors are integer vectors and thus
supported; you do not need to convert character vectors). Unlike
tapply it does not take a list of factors - if you want to use
a cross-product of factors, create the product first, e.g. using
paste(i1, i2, i3, sep='\01') or multiplication - whichever
method is convenient for the input types.
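A minimal sketch of building such a combined index with paste follows. The vectors i1, i2 and v are made-up illustration data, and the call to ctapply is guarded on the assumption that an installed fastmatch exports it.

```r
# Two hypothetical grouping vectors, already arranged in contiguous blocks
i1 <- c("x", "x", "x", "y", "y", "y")
i2 <- c(1L, 1L, 2L, 2L, 2L, 3L)
v  <- c(10, 20, 30, 40, 50, 60)

# One combined key per element; the '\01' separator is unlikely to
# occur in real data, so distinct pairs cannot collide
key <- paste(i1, i2, sep = "\01")

# Number of contiguous groups in the combined key
n_groups <- length(rle(key)$lengths)

# Assumption: ctapply is exported by the installed fastmatch version
has_ctapply <- requireNamespace("fastmatch", quietly = TRUE) &&
  "ctapply" %in% getNamespaceExports("fastmatch")
if (has_ctapply) {
  res <- fastmatch::ctapply(v, key, sum)
}
```

The combined key is contiguous here only because both inputs change together in blocks; pasting does not reorder anything.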
ctapply requires the INDEX to be contiguous. One (slow) way
to achieve that is to use sort or order,
but in typical use cases it is applied to already structured data
that is sharded and thus does not need to be sorted.
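The contiguity requirement can be checked, and enforced with order, roughly as follows. is_contiguous is a hypothetical helper introduced for this sketch; it is not part of any package.

```r
# Hypothetical helper: TRUE if no value reappears after a different value
is_contiguous <- function(index) {
  anyDuplicated(rle(as.character(index))$values) == 0L
}

idx <- c("b", "a", "b", "c", "a")
x   <- c(1, 2, 3, 4, 5)
before <- is_contiguous(idx)           # "b" and "a" are interrupted

# Slow but general fix: reorder values and index together
o          <- order(idx)
idx_sorted <- idx[o]
x_sorted   <- x[o]
after <- is_contiguous(idx_sorted)

# Sharded data is contiguous without being sorted
sharded    <- c("b", "b", "c", "a", "a")
sharded_ok <- is_contiguous(sharded)
```

Note that the last vector passes the check although it is not in sorted order, which is the case the text describes.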
ctapply also accepts a matrix as X, in which case it
is split row-wise based on INDEX. The number of rows must match
the length of INDEX. Note that the indexed matrices behave as
if drop=FALSE was used and currently dimnames are only
honored if rownames are present.
If the output is multi-dimensional, you probably want to use
MERGE=rbind or MERGE=cbind instead of the default.
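The difference between the merge functions can be seen on per-group results that have more than one element. The sketch below emulates the merge step in base R with do.call, on the assumption that MERGE is simply called on the per-group results; the guarded ctapply call additionally assumes that the installed fastmatch exports the function.

```r
x   <- c(1, 5, 2, 8, 3)
idx <- c("a", "a", "b", "b", "b")

# Per-group results with two elements each (min and max)
per_group <- lapply(split(x, idx), range)

merged_c     <- do.call(c, per_group)      # flat vector, group structure lost
merged_rbind <- do.call(rbind, per_group)  # one row per group

# Assumption: ctapply is exported by the installed fastmatch version
has_ctapply <- requireNamespace("fastmatch", quietly = TRUE) &&
  "ctapply" %in% getNamespaceExports("fastmatch")
if (has_ctapply) {
  print(fastmatch::ctapply(x, idx, range, MERGE = rbind))
}
```

With the default merge the minima and maxima of all groups end up interleaved in a single vector, whereas rbind keeps one row per group.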
This function has been moved to the fastmatch package!
Simon Urbanek
tapply
# contiguous names = LETTERS with ~350k values each
l <- rep(LETTERS, rnorm(length(LETTERS), 350000, 10000))
# random values
i <- rnorm(length(l))
system.time(rt <- tapply(i, l, sum))
system.time(rc <- ctapply(i, l, sum))
## tapply always returns an array so compare the same structure
identical(rt, as.array(rc))
## ctapply() also works on matrices (unlike tapply)
m <- matrix(c("A","A","B","B","B","C","A","B","C","D","E","F","","X","X","Y","Y","Z"),,3)
ctapply(m, m[,1], identity, MERGE=list)
ctapply(m, m[,1], identity, MERGE=rbind)
m2 <- m[,-1]
rownames(m2) <- m[,1]
colnames(m2) <- c("V1","V2")
ctapply(m2, rownames(m2), identity, MERGE=list)
ctapply(m2, rownames(m2), identity, MERGE=rbind)