knn_space_time_disagg: Spatial and temporal disaggregation of flow data
In rabutler-usbr/knnstdisagg: Nonparametric Space-Time Streamflow Disaggregation Using KNN

Description

knn_space_time_disagg() disaggregates annual flow data spatially and temporally (to monthly), using a spatial and monthly flow pattern selected from an "index year". The index year is selected using a k-nearest-neighbor approach (Nowak et al., 2010).

Usage

knn_space_time_disagg(
  ann_flow,
  ann_index_flow,
  mon_flow,
  start_month,
  nsim = 1,
  scale_sites = TRUE,
  index_years = NULL,
  k_weights = knn_params_default(nrow(ann_index_flow)),
  random_seed = NULL
)

Arguments

`ann_flow`	A 2-column matrix, with years in column 1, and an annual flow in column 2. This is the annual flow that needs to be disaggregated.
`ann_index_flow`	A 2-column matrix, with years in column 1, and an annual flow in column 2. This is the index gage that the flows in `ann_flow` will be compared to. After the comparison, the nearest neighbor years are selected.
`mon_flow`	Monthly natural flow. Used for spatially and temporally disaggregating the flow data (`ann_flow`) based on the index year selected from `ann_index_flow` by `knn_get_index_year()`. Each column represents a different site, and the annual flow at the index gage will be disaggregated to each of these sites at the monthly level. If there are three columns in this matrix, then the values in `ann_flow` will be disaggregated to three sites. `mon_flow` should cover the same years as `ann_index_flow`, so it should contain 12 times as many rows as `ann_index_flow`. No check is performed to confirm that the years match, since `mon_flow` is expected to be a dimensionless matrix. `mon_flow` can have named or unnamed columns. If they are named, the names are preserved in the disaggregated output. If they are unnamed, the columns are renamed S1-SN, where N is the number of columns.
`start_month`	The start month of the `mon_flow` as an integer. 1 = January, 2 = February, etc. Used to correctly label the output data.
`nsim`	Number of times to repeat the space/time disaggregation.
`scale_sites`	`TRUE` to scale all sites, or `FALSE` to scale none. Otherwise, a numeric vector specifying the site numbers (column indices) for which the index year's volume will be scaled based on the annual flow being disaggregated. The remaining sites will use the index year's values directly. See Details.
`index_years`	Optional. If specified, these index years will be used instead of selecting years based on weighted sampling via `knn_get_index_year()`.
`k_weights`	A `knn_params()` object. By default, it uses `knn_params_default()`.
`random_seed`	A single integer or `NULL`. If an integer, then it is used with `set.seed()` so reproducible results can be guaranteed.
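
Because no check is performed on the years in `mon_flow`, it can help to verify the input shapes before calling the function. The following is a minimal sketch in base R; `check_disagg_inputs()` is a hypothetical helper written for illustration and is not part of knnstdisagg. It only checks the shapes described above and mimics the documented S1-SN naming rule for unnamed columns.

```r
# Hypothetical helper (not part of knnstdisagg): basic shape checks on the
# inputs described in the Arguments section.
check_disagg_inputs <- function(ann_flow, ann_index_flow, mon_flow) {
  stopifnot(
    is.matrix(ann_flow), ncol(ann_flow) == 2,
    is.matrix(ann_index_flow), ncol(ann_index_flow) == 2,
    is.matrix(mon_flow),
    # 12 monthly rows are expected for every year in ann_index_flow
    nrow(mon_flow) == 12 * nrow(ann_index_flow)
  )
  # Mimic the documented naming rule: unnamed columns become S1..SN
  if (is.null(colnames(mon_flow))) {
    colnames(mon_flow) <- paste0("S", seq_len(ncol(mon_flow)))
  }
  mon_flow
}

# Made-up inputs: 3 years to disaggregate, 80 index years, 2 sites
ann_flow <- cbind(2000:2002, c(1400, 1567, 1325))
ann_index_flow <- cbind(1901:1980, seq(1000, 2000, length.out = 80))
mon_flow <- matrix(1, nrow = 80 * 12, ncol = 2)

checked <- check_disagg_inputs(ann_flow, ann_index_flow, mon_flow)
colnames(checked)
```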

Details

The method is described in detail in Nowak et al. (2010). The methodology disaggregates annual flow data (ann_flow) by selecting an index year from ann_index_flow using knn_get_index_year(). After the index year is selected, values from ann_flow are disaggregated spatially and temporally based on mon_flow. The spatial pattern is reflected by including different sites as columns in mon_flow, and the monthly disaggregation uses the monthly pattern in mon_flow to disaggregate the data temporally. Summability is preserved using this method if the values selected in mon_flow are scaled and if the columns (or a subset of columns) in mon_flow sum together to equal ann_index_flow.
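
The steps above can be illustrated with a short base R sketch. It does not call the package and is not its implementation: `pick_index_year()` is a hypothetical stand-in for knn_get_index_year(), and the neighborhood size (`floor(sqrt(n))`) and the 1/rank sampling weights are assumptions taken from common k-nearest-neighbor resampling practice, not confirmed by this page. All data are made up, and the index-gage flow is built as the sum of both sites so that the summability condition holds.

```r
set.seed(1)  # only so this sketch is reproducible

# Made-up data: 80 index years, 2 sites, 12 months per year
n_years <- 80
index_years <- 1901:1980
mon_flow <- cbind(
  S1 = runif(n_years * 12, min = 10, max = 30),
  S2 = runif(n_years * 12, min = 50, max = 200)
)
year_id <- rep(seq_len(n_years), each = 12)

# Index-gage annual flow = sum over both sites and all 12 months of each year
ann_total <- as.vector(tapply(rowSums(mon_flow), year_id, sum))
ann_index_flow <- cbind(index_years, ann_total)

# Hypothetical stand-in for knn_get_index_year(): pick one index year
pick_index_year <- function(flow, ann_index_flow) {
  n <- nrow(ann_index_flow)
  k <- floor(sqrt(n))                           # assumed neighborhood size
  w <- (1 / seq_len(k)) / sum(1 / seq_len(k))   # assumed 1/rank weights
  # The k index years whose annual flow is closest to the target flow
  nearest <- order(abs(ann_index_flow[, 2] - flow))[seq_len(k)]
  ann_index_flow[nearest[sample.int(k, 1, prob = w)], 1]
}

target <- 1400  # annual flow to disaggregate
yr <- pick_index_year(target, ann_index_flow)
yr_pos <- match(yr, index_years)

# Scale the index year's monthly, multi-site pattern to the target volume
rows <- which(year_id == yr_pos)
scale_factor <- target / ann_index_flow[yr_pos, 2]
disagg <- mon_flow[rows, ] * scale_factor  # 12 x 2 matrix of monthly flows

# Summability: the scaled values add back up to the target (up to rounding)
sum(disagg)
```

Because every site is scaled by the same factor and the two columns sum to the index-gage flow, the disaggregated values add back up to the annual flow being disaggregated.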

scale_sites: In some cases, it is desirable to select monthly flow directly, instead of scaling it. This can be performed by only scaling certain sites, using scale_sites. scale_sites should be a boolean or a numeric vector. If TRUE, then all sites are scaled. If FALSE, all sites' monthly values are selected directly. Otherwise, scale_sites should be a vector of the sites that should be scaled, based on their column index in mon_flow. For example, if mon_flow is a matrix with 4 columns, then setting scale_sites to c(2, 3) will scale the values in sites 2 and 3 (columns 2 and 3), while selecting flow values directly in sites 1 and 4.
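
The effect of scale_sites can be sketched in base R for a single index year. `apply_scale_sites()` is a hypothetical helper, not a function from the package, and the scaling ratio (annual flow being disaggregated divided by the index-gage flow in the index year) is an assumption based on the description above. The numbers are made up.

```r
# One index year of monthly flow for 4 sites (made-up, constant by month)
mon_index_year <- matrix(
  c(rep(10, 12), rep(20, 12), rep(30, 12), rep(40, 12)),
  nrow = 12,
  dimnames = list(month.abb, paste0("S", 1:4))
)
index_annual <- 1200   # index-gage annual flow in the selected index year
target_annual <- 1500  # annual flow being disaggregated

# Hypothetical helper: scale only the requested columns, keep the rest as-is
apply_scale_sites <- function(mon, scale_sites, ratio) {
  if (isTRUE(scale_sites)) {
    cols <- seq_len(ncol(mon))   # scale every site
  } else if (isFALSE(scale_sites)) {
    cols <- integer(0)           # select every site directly
  } else {
    cols <- scale_sites          # scale only these column indices
  }
  out <- mon
  if (length(cols) > 0) {
    out[, cols] <- mon[, cols] * ratio
  }
  out
}

ratio <- target_annual / index_annual
res <- apply_scale_sites(mon_index_year, c(2, 3), ratio)
res["Jan", ]  # S2 and S3 are scaled by 1.25; S1 and S4 are unchanged
```

Note that summability no longer holds across all four sites in this case, since the unscaled columns keep their index-year volumes.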

Value

A knnst object.

Author(s)

Ken Nowak

References

Nowak, K., Prairie, J., Rajagopalan, B., Lall, U. (2010). A nonparametric stochastic approach for multisite disaggregation of annual to daily streamflow. Water Resources Research.

Examples


# a sample of three years of flow data
flow_mat <- cbind(c(2000, 2001, 2002), c(1400, 1567, 1325))
# made up historical data to use as index years
ind_flow <- cbind(1901:1980, rnorm(80, mean = 1500, sd = 300))
# make up monthly flow for two sites
mon_flow <- cbind(
  rnorm(80 * 12, mean = 20, sd = 5),
  rnorm(80 * 12, mean = 120, sd = 45)
)
# disaggregate to both sites, scaling both columns of mon_flow
knn_space_time_disagg(
  flow_mat,
  ind_flow,
  mon_flow,
  start_month = 1,
  scale_sites = 1:2
)

rabutler-usbr/knnstdisagg documentation built on Sept. 14, 2023, 2:47 p.m.