
Problem

Solution

Example

bigrdf_to_rwtbl() returns a connection to the parquet file rather than the data itself. The conversion might take around 5 minutes to run. It processes 20 traces at a time and reports progress as it starts each chunk of 20 traces.

library(RWDataPlyr)
library(dplyr)
library(stringr)
rdf_path <- "//manoa.colorado.edu/bor/Shared/P26/Dec202024_NA_CCS_FA_Final_Runs/2016Dems,CRMMS_Trace12,ICSon,SuperEnsembleV3,CCS.9087.mdl,CCS.9047.rls/AZWU.rdf"

zz <- bigrdf_to_rwtbl(rdf_path, scenario = 'CCS', n_trace_per_chunk = 20)
zz
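
Because zz is only a connection to the parquet file, operations on it stay lazy until you explicitly pull results back. As a minimal sketch (assuming zz supports standard dplyr verbs lazily, as the pipeline below also relies on), you can list the slots in the file without reading every value into memory:

# list the unique object/slot names without loading the full table
zz |>
  distinct(ObjectSlot) |>
  collect()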

Then you can work with the data in a typical dplyr pipeline and call collect() to move it into memory once it has been reduced to a manageable size.

# get the annual depletion requested for each user
df <- zz |>
  # keep only the "Depletion Requested" slots
  filter(str_ends(ObjectSlot, '\\.Depletion Requested')) |>
  # sum the values for each scenario, slot, and year
  group_by(Scenario, ObjectSlot, Year) |>
  summarise(Value = sum(Value)) |>
  # pull the (now much smaller) result into memory
  collect()

Now df is a reasonable size (~0.1 MB) to keep in memory and use as you normally would.
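
Once collected, df behaves like any ordinary tibble. As an illustrative follow-on step (hypothetical, not part of the original example), you could summarize the average annual depletion requested for each user:

# average the annual totals across years for each object/slot
# (hypothetical follow-on analysis)
df |>
  group_by(ObjectSlot) |>
  summarise(avg_annual = mean(Value), .groups = "drop") |>
  arrange(desc(avg_annual))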

Details

Next Steps

If this approach proves helpful, then rdf_aggregate() and rwscen_aggregate() will be enhanced to also work with 'big' rdfs.


