Description Usage Arguments Details Value Note See Also Examples
A dcast.data.table
is a much faster version of reshape2:::dcast
, but for data.table. More importantly, it's able to handle bigger data quite efficiently without compromising speed. This is still under development, meaning it's stable, but not all features are complete yet. Once complete, we plan to make it an S3 generic by making changes to reshape2:::dcast
. Then by loading both data.table
and reshape2
, one could use dcast
on data.table
as one would on a data.frame
.
If you think of a particular feature that might be useful, then file a feature request (FR) at the datatable projects page (link at the bottom).
1 2 3 4 5 |
data |
A molten |
formula |
A formula of the form LHS ~ RHS to cast, see details. |
fun.aggregate |
Should the data be aggregated before casting? If the formula doesn't identify single observation for each cell, then aggregation defaults to |
... |
Any other arguments that maybe passed to the aggregating function. |
margins |
Not implemented yet. Should take variable names to compute margins on. A value of |
subset |
Specified if casting should be done on subset of the data. Ex: subset = .(col1 <= 5) or subset = .(variable != "January"). |
fill |
Value to fill missing cells with. If |
drop |
|
value.var |
Name of the column whose values will be filled to cast. Function 'guess()' tries to, well, guess this column automatically, if none is provided. |
verbose |
Not used yet. Maybe dropped in the future or used to provide information messages onto the console. |
The cast formula takes the form LHS ~ RHS
, ex: var1 + var2 ~ var3
. The order of entries in the formula is essential. There are two special variables: .
and ...
. Their functionality is identical to that of reshape2:::dcast
.
dcast.data.table
also allows value.var
columns of type list
.
When the combination of variables in formula
doesn't identify a unique value in a cell, fun.aggregate
will have to be used. The aggregating function should take a vector as input and return a single value (or a list of length one) as output. In cases where value.var
is a list, the function should be able to handle a list input and provide a single value or list of length one as output.
If the formula's LHS contains the same column more than once, ex: dcast.data.table(DT, x+x~ y)
, then the answer will have duplicate names. In those cases, the duplicate names are renamed using make.unique
so that the key can be set without issues.
Names for columns that are being cast are generated in the same order (separated by a _
) from the (unique) values in each column mentioned in the formula RHS.
The only feature that's not implemented from reshape2:::dcast
*yet* is the argument margins
.
A keyed data.table
that has been cast. The key columns are equal to the variables in the formula
LHS in the same order.
Currently unimplemented features from reshape2:::dcast
:
At the moment, only formula interface is supported. That the argument formula
can accept a list of quoted expressions is yet to be implemented.
The argument margins
is yet to be implemented.
New features and the ones likely to be added:
Implemented: dcast.data.table
allows value.var
column to be of type list
.
To do: The fun.aggregate
could accept more than one function. It could take a list of functions, all of which satisfy the criteria that they take a vector as input and provide a single value as output. This is very useful for scenarios where one needs to calculate multiple summary statistics on a data set and cast simultaneously.
melt.data.table
, https://r-forge.r-project.org/projects/datatable/
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 | require(data.table)
require(reshape2)
names(ChickWeight) <- tolower(names(ChickWeight))
DT <- melt(as.data.table(ChickWeight), id=2:4) # calls melt.data.table
# no S3 method yet, have to use "dcast.data.table"
dcast.data.table(DT, time ~ variable, fun=mean)
dcast.data.table(DT, diet ~ variable, fun=mean)
dcast.data.table(DT, diet+chick ~ time, drop=FALSE)
dcast.data.table(DT, diet+chick ~ time, drop=FALSE, fill=0)
# using subset
dcast.data.table(DT, chick ~ time, fun=mean, subset=.(time < 10 & chick < 20))
## Not run:
# on big data
set.seed(45)
DT <- data.table(aa=sample(1e4, 1e6, TRUE),
bb=sample(1e3, 1e6, TRUE),
cc = sample(letters, 1e6, TRUE), dd=runif(1e6))
system.time(dcast.data.table(DT, aa ~ cc, fun=sum)) # 0.31 seconds
system.time(dcast.data.table(DT, bb ~ cc, fun=mean)) # 0.11 seconds
# reshape2:::dcast takes 192.1 seconds
system.time(dcast.data.table(DT, aa + bb ~ cc, fun=sum)) # 3.1 seconds
## End(Not run)
|
Loading required package: reshape2
Attaching package: 'reshape2'
The following objects are masked from 'package:data.table':
dcast, melt
time weight
1: 0 41.06000
2: 2 49.22000
3: 4 59.95918
4: 6 74.30612
5: 8 91.24490
6: 10 107.83673
7: 12 129.24490
8: 14 143.81250
9: 16 168.08511
10: 18 190.19149
11: 20 209.71739
12: 21 218.68889
diet weight
1: 1 102.6455
2: 2 122.6167
3: 3 142.9500
4: 4 135.2627
diet chick 0 2 4 6 8 10 12 14 16 18 20 21
1: 1 18 39 35 NA NA NA NA NA NA NA NA NA NA
2: 1 16 41 45 49 51 57 51 54 NA NA NA NA NA
3: 1 15 41 49 56 64 68 68 67 68 NA NA NA NA
4: 1 13 41 48 53 60 65 67 71 70 71 81 91 96
5: 1 9 42 51 59 68 85 96 90 92 93 100 100 98
---
196: 4 49 40 53 64 85 108 128 152 166 184 203 233 237
197: 4 46 40 52 62 82 101 120 144 156 173 210 231 238
198: 4 50 41 54 67 84 105 122 155 175 205 234 264 264
199: 4 42 42 49 63 84 103 126 160 174 204 234 269 281
200: 4 48 39 50 62 80 104 125 154 170 222 261 303 322
diet chick 0 2 4 6 8 10 12 14 16 18 20 21
1: 1 18 39 35 0 0 0 0 0 0 0 0 0 0
2: 1 16 41 45 49 51 57 51 54 0 0 0 0 0
3: 1 15 41 49 56 64 68 68 67 68 0 0 0 0
4: 1 13 41 48 53 60 65 67 71 70 71 81 91 96
5: 1 9 42 51 59 68 85 96 90 92 93 100 100 98
---
196: 4 49 40 53 64 85 108 128 152 166 184 203 233 237
197: 4 46 40 52 62 82 101 120 144 156 173 210 231 238
198: 4 50 41 54 67 84 105 122 155 175 205 234 264 264
199: 4 42 42 49 63 84 103 126 160 174 204 234 269 281
200: 4 48 39 50 62 80 104 125 154 170 222 261 303 322
chick 0 2 4 6 8
1: 18 39 35 NaN NaN NaN
2: 16 41 45 49 51 57
3: 15 41 49 56 64 68
4: 13 41 48 53 60 65
5: 9 42 51 59 68 85
Using 'dd' as value column. Use 'value.var' to override
user system elapsed
0.326 0.008 0.334
Using 'dd' as value column. Use 'value.var' to override
user system elapsed
0.146 0.000 0.145
Using 'dd' as value column. Use 'value.var' to override
user system elapsed
1.931 1.186 5.169
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.