dist_data: Compile a specialized data frame based on the 'rvtable'...
In leonawicz/snaputils: SNAP-Specific Utilities for Shiny Apps


Compile a specialized data frame based on the rvtable package using distribution data frames of SNAP downscaled climate data.

dist_data(data, variable, margin = NULL, seed = NULL, metric = NULL,
  year_range, rcp_min_yr, base_max_yr, all_models, baseline_model = NULL,
  composite = "Composite GCM", baseline_scenario = "Historical",
  general_scenario = "Projected", margin_drop = c(baseline_scenario,
  baseline_model), density_size = 200, margin_size = 100,
  sample_size = margin_size, limit_sample = TRUE,
  baseline_only = FALSE, progress = TRUE)

`data`	a data frame. It does not need to be an `rvtable`-class data frame in advance, but it must be coercible to one.
`variable`	character, a valid random variable. See details for currently available options.
`margin`	variable(s) to marginalize over; currently RCPs and/or climate models (see details). Defaults to `NULL` (no marginalizing).
`seed`	numeric or `NULL` (default), sets the random seed for reproducible sampling in the app.
`metric`	`NULL` or logical. If `TRUE`, output data in metric units; if `FALSE`, in US Standard units. Input data in `data` is assumed to be metric. If `NULL` (default), no conversion or climate variable-specific rounding is performed.
`year_range`	full range of years in the data set.
`rcp_min_yr`	minimum year for RCP, e.g., for CMIP5 data this is 2006.
`base_max_yr`	maximum year for the baseline historical comparison data set that sometimes accompanies GCM data (e.g., 2015 for CRU 4.0 observation-based data).
`all_models`	character vector of climate model names in the data set, including the baseline model if present.
`baseline_model`	character, name of baseline model in data set, e.g., `"CRU 4.0"`.
`composite`	character, name to use for composite climate models after marginalizing over models.
`baseline_scenario`	character, defaults to `"Historical"`.
`general_scenario`	character, defaults to `"Projected"`.
`margin_drop`	levels of variables to exclude from marginalizing operations on those variables. Defaults to the baseline scenario and baseline model.
`density_size`	numeric, sample size for density estimations. Defaults to `200`.
`margin_size`	numeric, sample size for marginalizing operations. Defaults to `100`.
`sample_size`	numeric, sample size for density estimations. Defaults to `margin_size`.
`limit_sample`	logical, see details.
`baseline_only`	logical, process only the baseline data set. Useful for climatology data.
`progress`	logical, show a progress bar in the app.
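To illustrate what `metric = FALSE` implies, here is a minimal sketch of the metric-to-US Standard conversion. It assumes precipitation (`pr`) is stored in millimeters and temperatures in degrees Celsius; the helper `to_us_standard` is hypothetical, not part of snaputils, and it omits the variable-specific rounding that `dist_data` applies.

```r
# Hypothetical helper (not part of snaputils): convert metric values to
# US Standard units for the variables dist_data() accepts.
# Assumption: pr is in millimeters; tas, tasmin, tasmax are in degrees Celsius.
to_us_standard <- function(val, variable) {
  if (variable == "pr") {
    val / 25.4              # millimeters -> inches
  } else if (variable %in% c("tas", "tasmin", "tasmax")) {
    val * 9 / 5 + 32        # degrees Celsius -> degrees Fahrenheit
  } else {
    stop("unsupported variable: ", variable)
  }
}

to_us_standard(c(0, 100), "tas")  # 32 212
to_us_standard(25.4, "pr")        # 1
```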

This is a specialized function suited to preparing reactive data frames for an app in which the upstream source data is an `rvtable`-class probability density data frame from the rvtable package. Many such data frames of SNAP data are available.

This function assumes the presence of certain data frame columns: Val, Prob, Var, RCP, Model, and Year. It inserts a Decade column. It also validates the Var column: the data frame may contain only one unique variable in its Var ID column, and that variable must currently be one of "pr", "tas", "tasmin", or "tasmax". This restriction reflects assumptions the current implementation makes about the data, based on existing realistic use cases.
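The column requirements and Var check described above can be sketched in base R as follows. The function name `check_and_add_decade` and the "2010s"-style Decade labels are illustrative assumptions; the package's actual checks and Decade format may differ.

```r
# Hypothetical sketch (not the snaputils implementation): verify required
# columns, enforce a single supported Var, and add a Decade factor from Year.
check_and_add_decade <- function(data) {
  required <- c("Val", "Prob", "Var", "RCP", "Model", "Year")
  missing_cols <- setdiff(required, names(data))
  if (length(missing_cols) > 0)
    stop("missing columns: ", paste(missing_cols, collapse = ", "))
  v <- unique(as.character(data$Var))
  if (length(v) != 1 || !(v %in% c("pr", "tas", "tasmin", "tasmax")))
    stop("Var must contain exactly one of pr, tas, tasmin, tasmax")
  # e.g., 2013 -> "2010s"; stored as a factor with chronologically ordered levels
  decade <- paste0(10 * (data$Year %/% 10), "s")
  data$Decade <- factor(decade, levels = unique(decade[order(data$Year)]))
  data
}

d <- data.frame(Val = c(1, 2), Prob = c(0.5, 0.5), Var = "tas",
                RCP = "RCP 6.0", Model = "GFDL-CM3", Year = c(2013, 2021))
levels(check_and_add_decade(d)$Decade)  # "2010s" "2020s"
```

Storing Decade as a factor with explicit levels mirrors the documented convention that all categorical variables are factors rather than character.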

A powerful feature of this function, given an appropriate rvtable data frame, is the ability to marginalize over categorical variables (and meaningfully discrete numeric variables such as year) using the margin argument. The current implementation allows marginalizing over RCPs and/or climate models.
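As a rough illustration of marginalizing over climate models, the sketch below draws an equal-size weighted sample from each model's discrete distribution (Val, Prob) and pools the draws under one composite label, passing levels listed in `margin_drop` (such as the baseline model) through untouched. This is an assumption about the general idea, not the rvtable or snaputils method: it works within a single RCP and year, returns raw samples rather than a re-estimated density, and the name `marginalize_models` is hypothetical.

```r
# Hypothetical sketch: marginalize over models within one RCP/year group by
# pooling equal-size weighted samples from each model's (Val, Prob) distribution.
marginalize_models <- function(data, margin_size = 100,
                               composite = "Composite GCM",
                               margin_drop = character(0), seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  models <- as.character(data$Model)
  # Rows for excluded levels (e.g., the baseline model) are returned unchanged
  dropped <- data[models %in% margin_drop, , drop = FALSE]
  pooled <- data[!(models %in% margin_drop), , drop = FALSE]
  # One weighted draw of margin_size values per remaining model
  draws <- lapply(split(pooled, as.character(pooled$Model)), function(d) {
    d$Val[sample.int(nrow(d), margin_size, replace = TRUE, prob = d$Prob)]
  })
  list(
    composite = data.frame(Model = composite,
                           Val = unlist(draws, use.names = FALSE)),
    dropped = dropped
  )
}

d <- data.frame(Model = rep(c("GCM A", "GCM B", "CRU 4.0"), each = 2),
                Val = 1:6, Prob = rep(0.5, 6))
res <- marginalize_models(d, margin_size = 50, margin_drop = "CRU 4.0", seed = 1)
nrow(res$composite)  # 100: 50 draws from each of the two GCMs
```

Drawing the same number of values from each model gives every model equal weight in the composite, which is one reasonable choice when no model is preferred.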

Arguments such as `variable` and `year_range` could be determined internally from `data` directly, but in the app context these values are already known in the session environment, so there is no need to repeatedly scan large data frame columns with every call to `dist_data`.
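For a sense of the repeated scans the app avoids, a hypothetical helper that derives some of these arguments directly from the data might look like this. Each entry requires a pass over a full column; `derive_args` is illustrative only and not part of snaputils.

```r
# Hypothetical sketch: derive some dist_data() arguments from the data itself.
# Each line scans an entire column, the cost the app avoids by passing values
# already held in the session.
derive_args <- function(data) {
  list(
    variable   = unique(as.character(data$Var)),
    year_range = range(data$Year),
    all_models = unique(as.character(data$Model))
  )
}

d <- data.frame(Var = "pr", Model = c("CRU 4.0", "GFDL-CM3", "GFDL-CM3"),
                Year = c(1950, 2006, 2100))
derive_args(d)$year_range  # 1950 2100
```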

Note that when marginalizing over models, baseline historical data sets are not combined with climate model outputs, and when marginalizing over RCPs, historical years from climate models are not combined with future projections. All categorical variables are factors with explicit levels, not character vectors.

If `limit_sample = TRUE` (default), the final sample size is reduced by a factor proportional to the number of unique RCP-GCM pairs. This prevents massive in-app samples when users select large amounts of data from many RCPs and models. A minimum sample size per group is still maintained regardless of how much data is requested. Detailed progress is reported for sampling from distributions and for calculating marginal distributions.
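One way to read the `limit_sample` rule is sketched below: divide the requested per-group sample size by the number of unique RCP-GCM pairs and never go below a floor. The floor of 20 and the name `limited_sample_size` are illustrative assumptions; the package's actual minimum and scaling may differ.

```r
# Hypothetical sketch of limit_sample = TRUE: scale the per-group sample size
# down by the number of unique RCP-GCM pairs, with a minimum per group.
# The default floor of 20 is an assumption, not the snaputils value.
limited_sample_size <- function(sample_size, rcp, model, min_size = 20) {
  n_pairs <- nrow(unique(data.frame(rcp, model)))
  max(min_size, ceiling(sample_size / n_pairs))
}

limited_sample_size(100, rcp = c("4.5", "8.5"), model = c("A", "B"))  # 50
limited_sample_size(100, rcp = rep(c("4.5", "6.0", "8.5"), each = 5),
                    model = rep(letters[1:5], 3))                     # 20
```

With 15 pairs, 100 / 15 rounds up to 7, so the floor takes over; this is how the per-group minimum keeps small selections from being over-thinned.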

a specialized data frame