```r
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```
Some rdf files are too large to be handled by `rdf_to_rwtbl2()`, or by `rdf_aggregate()` or `rw_scen_aggregate()`, which both rely on `rdf_to_rwtbl2()`. `bigrdf_to_rwtbl()` works similarly to `rdf_to_rwtbl2()`, creating the same 'long' data frame, but it does not store the data in memory. Instead, `bigrdf_to_rwtbl()` returns a connection to the parquet file, but not the data itself.

This might take ~5 minutes to run. It will process 20 traces at a time, and will report out as it starts each chunk of 20 traces.
```r
library(RWDataPlyr)
library(dplyr)
library(stringr)

rdf_path <- "//manoa.colorado.edu/bor/Shared/P26/Dec202024_NA_CCS_FA_Final_Runs/2016Dems,CRMMS_Trace12,ICSon,SuperEnsembleV3,CCS.9087.mdl,CCS.9047.rls/AZWU.rdf"

zz <- bigrdf_to_rwtbl(rdf_path, scenario = 'CCS', n_trace_per_chunk = 20)
zz
```
Then, you can work with the data in a typical dplyr pipeline and move it into memory once it is smaller. Use `collect()` to move it into memory.
```r
# get the annual depletion requested for each user
df <- zz |>
  filter(str_ends(ObjectSlot, '\\.Depletion Requested')) |>
  group_by(Scenario, ObjectSlot, Year) |>
  summarise(Value = sum(Value)) |>
  collect()
```
Now `df` is a reasonable size (~0.1 MB) to keep in memory and use as you normally would.
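As a quick check (not part of the original example), you can preview the collected tibble and confirm its size in memory with base R's `object.size()`:

```r
# preview the collected data and report its in-memory size
head(df)
format(object.size(df), units = "MB")
```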
You can bring the entire table into memory with `collect(zz)`, but this will take ~14 GB of free memory.

The `n_trace_per_chunk` argument controls how many traces are parsed on each call to the C++ code (the same parsing code used by `rdf_to_rwtbl2()`).
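If parsing is slow or memory becomes an issue, the chunk size can be adjusted when creating the connection. The sketch below simply re-uses the arguments from the earlier chunk with a larger `n_trace_per_chunk`; the speed/memory trade-off noted in the comment is an assumption, not a benchmarked result.

```r
# re-create the connection, parsing 50 traces per chunk instead of 20;
# larger chunks presumably use more memory while each chunk is parsed
zz <- bigrdf_to_rwtbl(rdf_path, scenario = 'CCS', n_trace_per_chunk = 50)
```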
Use `bigrdf_move(df, 'path/to/move/to')` to move the data to a new location, and then use `arrow::open_dataset('path/to/move/to')` to reconnect to these data in the future.
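A rough sketch of that workflow, assuming the placeholder path above and the `bigrdf_move()` call exactly as written here:

```r
# move the parquet data to a more permanent location (placeholder path)
bigrdf_move(df, 'path/to/move/to')

# in a later session, reconnect to the saved data without reading it all
# into memory
zz2 <- arrow::open_dataset('path/to/move/to')
```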
`rdf_to_rwtbl2()` is faster than `bigrdf_to_rwtbl()`, so for smaller rdf files that is probably still the preferred method for getting the data into R. If this approach seems to help/work, then `rdf_aggregate()` and `rw_scen_aggregate()` will be enhanced to also be able to work with 'big' rdfs.
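For comparison, a minimal in-memory read of a small rdf with `rdf_to_rwtbl2()` might look like the following; the `KeySlots.rdf` path assumes the sample data that ships with RWDataPlyr, and the scenario label is arbitrary:

```r
# read a small, bundled rdf directly into memory with rdf_to_rwtbl2()
small_rdf <- system.file(
  "extdata/Scenario/ISM1988_2014,2007Dems,IG,Most",
  "KeySlots.rdf",
  package = "RWDataPlyr"
)
small_tbl <- rdf_to_rwtbl2(small_rdf, scenario = "Most")
head(small_tbl)
```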