prep_metabolism | R Documentation |
Formats the output of request_data for the stream metabolism model of
choice. Filters flagged data and imputes missing data. Acquires or estimates
additional variables if necessary. NOTE: support for modeling with BASE is
currently in development. Please use streamMetabolizer in the meantime.
prep_metabolism(
  d,
  model = "streamMetabolizer",
  type = "bayes",
  interval = NA,
  rm_flagged = list("Bad Data", "Questionable"),
  fillgaps = "interpolation",
  maxhours = 3,
  zq_curve = list(sensor_height = NULL, Z = NULL, Q = NULL, a = NULL, b = NULL,
    fit = "power", ignore_oob_Z = TRUE, plot = TRUE),
  estimate_areal_depth = FALSE,
  estimate_PAR = TRUE,
  retrieve_air_pres = FALSE,
  ...
)
d |
the output of |
model |
either 'streamMetabolizer' (the default) or 'BASE'. If 'BASE',
|
type |
either 'mle' or 'bayes'. If |
interval |
a string specifying the between-sample time interval to which the dataset should be coerced, or NA to determine it automatically. If not NA, it must be of the form '<number> <unit>', as in '15 min'. Unit can be 'min' or 'hour'. Non-integer hours are tolerated, but minutes must be specified as integers. See details. |
rm_flagged |
a list containing any of 'Interesting', 'Questionable',
and 'Bad Data'. Any data points flagged with these specified tags will be
removed (replaced with NA), and then imputed according to |
fillgaps |
a string specifying one of the imputation methods available
to |
maxhours |
the maximum number of hours of consecutive NAs to impute. |
zq_curve |
a list containing specifications for a rating curve, used to estimate discharge from level or depth. Elements of this list may include any of the following: Z (a vector of level or depth data), Q (a vector of discharge data), a (the first parameter of an existing rating curve), b (the second parameter of an existing rating curve), sensor_height (the vertical distance between streambed and sensor, in meters), fit (the form of the rating curve to predict discharge from and, if Z and Q supplied, to fit), ignore_oob_Z (if there are depth or level readings that exceed the maximum measured Z value of the rating curve, whether to replace these with NA), and plot (whether to plot the fitted curve, if applicable, as well as predicted discharge). See details for more. |
estimate_areal_depth |
logical; Metabolism models expect that input depth time series represent depth averaged over an area delineated by the width of the stream and the approximate O2 turnover distance. Set to TRUE if you'd like to estimate this average depth, or FALSE if your depth data already approximate it. For example, if your depth data represent average depth over the aforementioned area already, or average depth for a stream cross-section, you'd probably want to use FALSE. If your depth data represent only depth-at-sensor, or worse, level-at-sensor, you might be better off with TRUE, assuming you have discharge data to estimate areal depth from, or a rating curve by which to generate discharge data. |
estimate_PAR |
logical; should Photosynthetically Active Radiation (PAR) be estimated from geographic coordinates and time? Only use light data if you're confident that your light sensors accurately represent light reaching the upstream area defined by O2 turnover distance. |
retrieve_air_pres |
logical; if some AirPres_kPa values are missing, should they be retrieved from NCDC (NOAA)? Retrieval will happen automatically if air pressure data are required and entirely missing. |
... |
additional arguments passed to |
BASE and streamMetabolizer, the two metabolism modeling platforms available
via StreamPULSE, require different data input formats. Formatting also varies
depending on whether one is using a Bayesian framework or MLE. This function
supplements and rearranges the raw output of request_data to prepare it for
a desired set of model specifications.
Both BASE and streamMetabolizer require dissolved oxygen (DO) concentration,
water temperature, and light (PAR) data. If light is missing, it will
automatically be estimated based on solar angle. In addition to these
variables, streamMetabolizer requires DO % saturation and depth, and BASE
requires atmospheric pressure. If DO % saturation is missing, it will be
calculated automatically from DO concentration, water temperature, and
atmospheric pressure. In turn, atmospheric pressure estimates will be
automatically retrieved from NOAA (NCDC), if missing, for sites anywhere on
earth.
If streamMetabolizer is being used and type='bayes', discharge time series
data are also required. In the absence of such data, they can be estimated
from the relationship between discharge and depth (i.e. the vertical distance
between streambed and surface) or level (AKA stage; i.e. the vertical
distance between some arbitrary datum, such as sensor height, and surface),
via the zq_curve parameter. Here, depth or level is referred to as Z,
discharge is referred to as Q, and the relationship between them is called a
rating curve. In order to fit such a curve, one must collect, sometimes
manually, a set of data points for both Z and Q. Here we assume the user also
has time series data for Z, which can then be used to predict Q at each time
point. If the sampled Z data used to fit the curve represent level, and the Z
time series data represent depth, the sensor_height parameter can be used to
make them commensurable.
If Z is supplied, Q must be supplied, and vice-versa. Likewise with a and b.
If all four are supplied, Z and Q will be ignored. Rating curves can take
many forms. Options here include power, exponential, and linear. A common
difficulty in fitting these curves is that discharge is hard to measure
accurately in high flow conditions, yet if the curve does not account for
these conditions, high flow discharge estimates can be far off from reality,
especially if the curve's form is power or exponential. In these cases, it's
often safest to exclude high flow estimates entirely by setting
ignore_oob_Z=TRUE, which replaces Z readings above the maximum measured Z
with NA. In some cases it makes sense to model the curve with a linear fit,
though of course this too will misrepresent reality. Using fit='linear' may
also result in negative discharge estimates.
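
As a rough illustration of the rating-curve idea described above, the following base R sketch fits a power curve Q = a * Z^b to paired measurements and then predicts Q for a Z time series, setting out-of-bounds Z readings to NA. The data values, the helper name fit_power_curve, and the log-log regression approach are all assumptions made for illustration; prep_metabolism may fit and apply the curve differently.

```r
# Hypothetical paired field measurements (illustrative values only)
Z_sample = c(0.2, 0.35, 0.5, 0.8, 1.1)      # level or depth (m)
Q_sample = c(0.05, 0.18, 0.40, 1.10, 2.30)  # discharge (m^3/s)

# Hypothetical helper: fit Q = a * Z^b by linear regression on logs,
# since log(Q) = log(a) + b * log(Z)
fit_power_curve = function(Z, Q){
    m = lm(log(Q) ~ log(Z))
    list(a=unname(exp(coef(m)[1])), b=unname(coef(m)[2]))
}

curve_params = fit_power_curve(Z_sample, Q_sample)

# Z time series to predict from; readings above the largest sampled Z
# are set to NA (similar in spirit to ignore_oob_Z=TRUE)
Z_series = c(0.25, 0.6, 0.9, 1.5)
Z_series[Z_series > max(Z_sample)] = NA

# Predicted discharge; NA wherever Z was out of bounds
Q_series = curve_params$a * Z_series^curve_params$b
```

If a and b are already known from an existing curve, only the last line is needed, which mirrors the option of passing a and b instead of Z and Q.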
All single-station models assume that, where applicable, variables represent averages throughout an area delineated by the width of the stream and the approximate oxygen turnover distance. More on this and other considerations can be found by clicking the "Before modeling stream metabolism..." button on https://data.streampulse.org.
The between-sample interval is determined programmatically for each variable
within d. If the interval varies within a series, the mode is used. If it
varies across series, the longest interval is used for the whole dataset,
unless interval is specified. If the user-specified interval is a multiple of
the programmatically determined longest interval, the dataset will be quietly
coerced to the user-specified interval. This is useful for thinning extremely
long datasets in order to avoid out-of-memory errors while running models. If
intervals vary across series, the user may specify which of the available
intervals to coerce all series to. If user-specified and programmatically
determined intervals are identical, no action is taken.
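
The interval logic described above can be sketched in base R as follows. This is a minimal illustration, not the package's internal code: the helper name modal_interval and the timestamps are hypothetical.

```r
# Hypothetical helper: most common spacing (in minutes) between
# consecutive timestamps in one series
modal_interval = function(timestamps){
    diffs = as.numeric(diff(timestamps), units='mins')
    tab = table(diffs)
    as.numeric(names(tab)[which.max(tab)])
}

start = as.POSIXct('2016-06-10 00:00:00', tz='UTC')

# Illustrative series: DO sampled every 15 min with one missed sample,
# depth sampled every 30 min
DO_times = start + 60 * c(0, 15, 30, 60, 75, 90)
depth_times = start + 60 * c(0, 30, 60, 90)

# Per-series interval is the mode of the observed spacings
intervals = c(DO=modal_interval(DO_times), depth=modal_interval(depth_times))

# Across series, the longest interval is applied to the whole dataset
dataset_interval = max(intervals)
```

In this sketch the missed DO sample does not change the DO interval, because the mode ignores the single 30-minute gap, while the dataset as a whole falls back to the coarser depth spacing.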
Returns an S4 object containing a data.frame formatted for the model
specified by model and type.
Mike Vlah, vlahm13@gmail.com
Aaron Berdanier
request_data for acquiring StreamPULSE data; fit_metabolism for fitting
models.
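library(StreamPULSE)

# Query which data are available across all regions
query_available_data(region='all')

# Download data for one site and date range
streampulse_data = request_data(sitecode='NC_Eno',
    startdate='2016-06-10', enddate='2016-10-23')

# Z_data and Q_data are user-supplied numeric vectors of paired level
# (or depth) and discharge measurements; define them before running
fitdata = prep_metabolism(d=streampulse_data, type='bayes',
    model='streamMetabolizer', interval='15 min',
    rm_flagged=list('Bad Data', 'Questionable'), fillgaps='interpolation',
    zq_curve=list(sensor_height=NULL, Z=Z_data, Q=Q_data,
        fit='power', plot=TRUE), estimate_areal_depth=TRUE)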