Description
Similar in function to dcast, but produces a sparse Matrix as its output. Sparse matrices are well suited to this application because such outputs are often very wide and mostly empty. Conceptually, the operation is similar to a pivot.
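The sketch below gives a rough idea of what such a sparse pivot produces. It builds one directly with Matrix::sparseMatrix() on a small made-up table. This is not how dMcast is implemented. It assumes only that the Matrix package is installed. It also relies on one documented behaviour: sparseMatrix() adds together the values of repeated (i, j) pairs, which amounts to a 'sum' aggregation.

```r
library(Matrix)

# Made-up long-format data: one row per purchase; some (customer, sku) pairs repeat
purchases <- data.frame(
  customer = factor(c("a", "a", "b", "c", "c", "c")),
  sku      = factor(c("x", "x", "y", "x", "z", "z")),
  amount   = c(1, 2, 5, 3, 4, 6)
)

# Rows come from the grouping variable, columns from the casting variable.
# Repeated (i, j) pairs are summed by sparseMatrix(), e.g. a/x = 1 + 2 = 3.
pivot <- sparseMatrix(
  i = as.integer(purchases$customer),
  j = as.integer(purchases$sku),
  x = purchases$amount,
  dims = c(nlevels(purchases$customer), nlevels(purchases$sku)),
  dimnames = list(levels(purchases$customer), levels(purchases$sku))
)
print(pivot)
```

Only the four non-zero cells are stored. The empty customer/sku combinations cost nothing.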
Arguments
data: a data frame.
formula: casting formula; see the details below for specifics.
fun.aggregate: name of the aggregation function. Defaults to 'sum'.
value.var: name of the column that stores the numeric values to be aggregated.
as.factors: if TRUE, treat all columns as factors, including numerics.
factor.nas: if TRUE, treat NAs in factors as a new level. Otherwise, rows with NAs receive zeroes in all columns for that factor.
drop.unused.levels: should factors have unused levels dropped? Defaults to TRUE, in contrast to model.matrix.
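The sketch below mimics the two factor.nas behaviours on a single factor. It uses a hypothetical helper, one_hot(), which is not a function from any package, and it assumes the Matrix package is installed.

- addNA() turns NA into its own level, and therefore its own column.
- Leaving NA rows out of the (i, j) triplets leaves those rows all-zero.

```r
library(Matrix)

# Hypothetical helper (illustration only): sparse one-hot encoding of a factor.
# na_as_level = TRUE  -> NA becomes an extra level/column
# na_as_level = FALSE -> rows with NA get no entries, i.e. zeroes in every column
one_hot <- function(f, na_as_level) {
  if (na_as_level) f <- addNA(f, ifany = TRUE)
  keep <- which(!is.na(f))  # after addNA(), NA-level rows are no longer is.na()
  sparseMatrix(
    i = keep,
    j = as.integer(f[keep]),
    x = 1,  # length-one x is recycled to every (i, j) pair
    dims = c(length(f), nlevels(f)),
    dimnames = list(NULL, levels(f))
  )
}

f <- factor(c("lo", NA, "hi", "lo"))
with_na    <- one_hot(f, na_as_level = TRUE)   # columns: hi, lo, <NA>
without_na <- one_hot(f, na_as_level = FALSE)  # columns: hi, lo; row 2 all zero
print(with_na)
print(without_na)
```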
Details
Casting formulas differ slightly from those in dcast and follow the conventions of model.matrix; see formula for details. Briefly, the left-hand side of the ~ defines the grouping criteria, i.e. the rows of the result. It can be either a single variable or a group of variables linked using :. The right-hand side specifies what the columns will be. Unlike in dcast, the + operator appends the values of each variable as additional columns, which is useful for tasks such as one-hot encoding. Using : instead combines the variables into interaction columns.
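The sketch below contrasts + and : on the right-hand side, using the warpbreaks data. In that data, wool has 2 levels and tension has 3. It relies on a hypothetical indicator_matrix() helper built on Matrix::sparseMatrix() and base interaction(). It only approximates the column layout described above, and dMcast's actual column names and ordering may differ.

```r
library(Matrix)

# Hypothetical helper (illustration only): one indicator column per level.
# Assumes the factor contains no NAs.
indicator_matrix <- function(f) {
  f <- droplevels(f)
  sparseMatrix(i = seq_along(f), j = as.integer(f), x = 1,
               dims = c(length(f), nlevels(f)),
               dimnames = list(NULL, levels(f)))
}

# '+' appends: one block of columns per variable, 2 + 3 = 5 columns
appended <- cbind(indicator_matrix(warpbreaks$wool),
                  indicator_matrix(warpbreaks$tension))

# ':' combines: one column per observed wool/tension pair, 2 * 3 = 6 columns
combined <- indicator_matrix(interaction(warpbreaks$wool, warpbreaks$tension,
                                         sep = ":", drop = TRUE))

dim(appended)  # 54 5
dim(combined)  # 54 6
colnames(combined)
```

In the appended form each row has one non-zero entry per variable. In the combined form each row has exactly one non-zero entry. The same interaction() construction would also give one row per observed combination for a grouping such as month:day on the left-hand side.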
Value
A sparse Matrix.
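A sparse return value matters for wide casts, and the sketch below shows why. It compares the memory footprint of the same mostly-empty matrix stored in sparse and in dense form. The dimensions and number of filled cells are arbitrary assumptions chosen for illustration, and the Matrix package is assumed to be installed.

```r
library(Matrix)

set.seed(1)
n_rows <- 1000
n_cols <- 5000
n_filled <- 2000  # arbitrary: only a tiny fraction of the 5 million cells

# Random positions; any repeated (i, j) pairs are summed by sparseMatrix()
i <- sample(n_rows, n_filled, replace = TRUE)
j <- sample(n_cols, n_filled, replace = TRUE)
sparse <- sparseMatrix(i = i, j = j, x = runif(n_filled), dims = c(n_rows, n_cols))
dense  <- as.matrix(sparse)  # stores every cell, zero or not

print(object.size(sparse), units = "Kb")
print(object.size(dense), units = "Mb")  # 5e6 doubles, roughly 38 Mb
```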
Examples
# Classic air quality example
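# A minimal melt(): stack every non-id column into (variable, value) rows
melt <- function(data, idColumns) {
  cols <- setdiff(colnames(data), idColumns)
  results <- lapply(cols, function(x)
    cbind(data[, idColumns, drop = FALSE],
          variable = factor(x, levels = cols),  # factor, since R >= 4.0 no longer converts strings
          value = as.numeric(data[, x])))
  do.call(rbind, results)  # return the stacked data frame explicitly
}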
names(airquality) <- tolower(names(airquality))
aqm <- melt(airquality, idColumns=c("month", "day"))
# One row per (month, day) combination
dMcast(aqm, month:day ~ variable, fun.aggregate = 'mean', value.var = 'value')
# One row per month
dMcast(aqm, month ~ variable, fun.aggregate = 'mean', value.var = 'value')
# One-hot encoding
# Numeric columns are preserved as they are
dMcast(warpbreaks, ~ .)
# Numeric columns are pivoted (treated as factors) as well
dMcast(warpbreaks, ~ ., as.factors = TRUE)
## Not run:
library(data.table)  # provides as.data.table() and dcast.data.table()
orders <- data.frame(orderNum = as.factor(sample(1e6, 1e7, TRUE)),
                     sku = as.factor(sample(1e3, 1e7, TRUE)),
                     customer = as.factor(sample(1e4, 1e7, TRUE)),
                     state = as.factor(sample(letters, 1e7, TRUE)),
                     amount = runif(1e7))
# For simple aggregations that produce small tables, dcast.data.table (and
# reshape2::dcast) will be faster
system.time(a <- dcast.data.table(as.data.table(orders), sku ~ state, sum,
                                  value.var = 'amount')) # 0.5 seconds
system.time(b <- reshape2::dcast(orders, sku ~ state, sum,
                                 value.var = 'amount')) # 2.61 seconds
system.time(c <- dMcast(orders, sku ~ state,
                        value.var = 'amount')) # 8.66 seconds
# However, this situation changes as the result set becomes larger
system.time(a <- dcast.data.table(as.data.table(orders), customer ~ sku, sum,
                                  value.var = 'amount')) # 4.4 seconds
system.time(b <- reshape2::dcast(orders, customer ~ sku, sum,
                                 value.var = 'amount')) # 34.7 seconds
system.time(c <- dMcast(orders, customer ~ sku,
                        value.var = 'amount')) # 14.55 seconds
# More complicated casts:
system.time(a <- dcast.data.table(as.data.table(orders), customer ~ sku + state, sum,
                                  value.var = 'amount')) # 16.96 seconds, object size = 2084 Mb
system.time(b <- reshape2::dcast(orders, customer ~ sku + state, sum,
                                 value.var = 'amount')) # Does not return
system.time(c <- dMcast(orders, customer ~ sku:state,
                        value.var = 'amount')) # 21.53 seconds, object size = 116.1 Mb
system.time(a <- dcast.data.table(as.data.table(orders), orderNum ~ sku, sum,
                                  value.var = 'amount')) # Does not return
system.time(c <- dMcast(orders, orderNum ~ sku,
                        value.var = 'amount')) # 24.83 seconds, object size = 175 Mb
system.time(c <- dMcast(orders, sku:state ~ customer,
                        value.var = 'amount')) # 17.97 seconds, object size = 175 Mb
## End(Not run)