mm_filter_valid_days: Remove entries in data

mm_filter_valid_days    R Documentation

Remove entries in data

Description

Filter out any rows of data that fall on days failing the specified tests for completeness and regularity.

Usage

mm_filter_valid_days(
  data,
  data_daily = NULL,
  day_start = 4,
  day_end = 27.99,
  day_tests = c("full_day", "even_timesteps", "complete_data", "pos_discharge"),
  required_timestep = NA,
  timestep_days = TRUE
)

Arguments

data

data.frame of instantaneous observations, to be filtered to only those points on days that pass the specified tests in mm_is_valid_day

data_daily

data.frame of daily estimates/statistics, to be filtered in accordance with the filtering of data

day_start

start time (inclusive) of a day's data in number of hours from the midnight that begins the date. For example, day_start=-1.5 indicates that data describing 2006-06-26 begin at 2006-06-25 22:30, or at the first observation time after that if day_start doesn't fall exactly on an observation time. For metabolism models working with single days of input data, it is conventional and useful to begin the day the evening before (e.g., -1.5) and to end just before the next sunrise (e.g., 30). For multiple consecutive days, it may make the most sense to start just before sunrise (e.g., 4) and to end 24 hours later. For nighttime regression, the date assigned to a chunk of data should be the date whose evening contains the data. The default is therefore 12 to 36 for metab_night, of which only the times of darkness will be used.

day_end

end time (exclusive) of a day's data in number of hours from the midnight that begins the date. For example, day_end=30 indicates that data describing 2006-06-26 end at the last observation time that occurs before 2006-06-27 06:00. See day_start for recommended start and end times.
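
The window implied by day_start and day_end can be sketched in base R. This is only an illustration of the arithmetic described above: day_window is a hypothetical helper, not a streamMetabolizer function, and the UTC time zone and 30-minute observation times are assumptions made for the example.

```r
# Hypothetical helper (not part of streamMetabolizer): compute the time window
# that day_start and day_end imply for a single date.
day_window <- function(date, day_start, day_end, tz = "UTC") {
  midnight <- as.POSIXct(paste(date, "00:00:00"), tz = tz)
  list(
    start = midnight + day_start * 3600,  # inclusive bound, hours -> seconds
    end   = midnight + day_end * 3600     # exclusive bound, hours -> seconds
  )
}

win <- day_window("2006-06-26", day_start = -1.5, day_end = 30)
format(win$start, "%Y-%m-%d %H:%M")  # "2006-06-25 22:30"
format(win$end, "%Y-%m-%d %H:%M")    # "2006-06-27 06:00"

# Assumed 30-minute observation times; keep those with start <= time < end.
times <- seq(as.POSIXct("2006-06-25 22:00:00", tz = "UTC"),
             by = "30 min", length.out = 70)
in_day <- times >= win$start & times < win$end
```

Because the end bound is exclusive, an observation at exactly 2006-06-27 06:00 would belong to the next window rather than this one.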

day_tests

list of tests to conduct to determine whether each date's worth of data is valid for modeling. The results of these tests will be combined with the result of the test implied if required_timestep is numeric, and then will be passed to model_fun as the ply_validity argument to that function.

required_timestep

NA or numeric (length 1). If numeric, the timestep length in days that a date must have to pass the validity check (to within a tolerance of 0.2% of the value of required_timestep). The result of this test will be combined with the results of the tests listed in day_tests and reported to model_fun as the ply_validity argument to that function.
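
The 0.2% tolerance can be pictured with a small base R sketch. The function timestep_ok below is hypothetical and is not the package's internal code; in particular, how the package measures a day's observed timestep is an assumption here.

```r
# Hypothetical check (not streamMetabolizer internals): does an observed
# timestep, in days, match required_timestep to within 0.2% of its value?
timestep_ok <- function(observed, required_timestep, tol = 0.002) {
  if (is.na(required_timestep)) return(TRUE)  # NA: no timestep requirement
  abs(observed - required_timestep) <= tol * required_timestep
}

timestep_ok(1/48, required_timestep = 1/48)          # exact 30-minute timestep
timestep_ok(1/48 * 1.001, required_timestep = 1/48)  # 0.1% off, within tolerance
timestep_ok(1/24, required_timestep = 1/48)          # hourly data, outside tolerance
```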

timestep_days

TRUE if you would like the mean timestep length to be calculated for each data ply and passed to model_fun as the timestep_days argument to that function. Alternatively, this may be numeric, giving a specifically expected timestep length in days; for example, a 1-hour timestep is 1/24, or about 0.0416667, days.
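
As a rough illustration of a mean timestep length expressed in days, the base R sketch below derives it from a vector of hourly timestamps. The timestamps are made up for the example, and the calculation is an assumption about what "mean timestep" means, not the package's own code.

```r
# Assumed hourly observation times for one day of data.
times <- seq(as.POSIXct("2006-06-26 04:00:00", tz = "UTC"),
             by = "1 hour", length.out = 24)

# Differences between consecutive times in seconds, averaged and converted to days.
step_secs <- diff(as.numeric(times))
mean_timestep_days <- mean(step_secs) / 86400

all.equal(mean_timestep_days, 1/24)  # TRUE
round(mean_timestep_days, 7)         # 0.0416667
```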

Value

list containing data and data_daily, each with the same structure as the corresponding input but with invalid days removed, plus a third data.frame listing the dates that were removed

Examples

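library(streamMetabolizer)
# example dataset with the 'missing middle' flaw
dat <- data_metab(res='30', num_days='10', flaws='missing middle')
datfilt <- mm_filter_valid_days(dat)
datfilt$removed  # dates that were removed by the filter
c(nrow(dat), nrow(datfilt$data))  # row counts before and after filtering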

USGS-R/streamMetabolizer documentation built on Aug. 15, 2023, 7:50 a.m.