Gapfill_em: Gap-fill using EM
In junbinzhao/FluxGapsR: Gap-filling for flux data using different methods

View source: R/Gapf_EM.R

Description

This function automatically gap-fills missing data points (marked as "NA") in a flux dataset using the expectation-maximization (EM) algorithm, with up to three reference flux time series measured in parallel. The function is based on the algorithms in the package 'mtsdi'.

Usage

Gapfill_em(
  data,
  ref1,
  ref2 = NULL,
  ref3 = NULL,
  Flux = "Flux",
  Flux1 = Flux,
  Flux2 = Flux,
  Flux3 = Flux,
  Date = "Date",
  Date_form = "ymd_hms",
  win = 5,
  interval = 10,
  ts = TRUE,
  method = "spline",
  sp_df = 10,
  fail = "ave",
  ...
)

Arguments

`data`	a data frame that includes the flux to be gap-filled (with NA indicating missing data)
`ref1`	a data frame that includes parallel-measured reference flux time series #1; it need not have the same length as the target data to be filled
`ref2`	a data frame that includes parallel-measured reference flux time series #2 (optional); it need not have the same length as the target data to be filled. Default: NULL
`ref3`	a data frame that includes parallel-measured reference flux time series #3 (optional); it need not have the same length as the target data to be filled. Default: NULL
`Flux`	a string giving the column name of the flux variable to be gap-filled. Default: "Flux"
`Flux1`	a string giving the column name of the reference time series in ref1. Default: same as Flux
`Flux2`	a string giving the column name of the reference time series in ref2. Default: same as Flux
`Flux3`	a string giving the column name of the reference time series in ref3. Default: same as Flux
`Date`	a string giving the column name of the date in data, ref1, ref2 and ref3. The column must include time information, and all data frames must use the same name for the date column. Default: "Date"
`Date_form`	a string giving the format of the date in data, ref1, ref2 and ref3: either "ymd_hms" (default), "mdy_hms" or "dmy_hms". All data frames must use the same date format.
`win`	a number giving the required length of the sampling window around each gap (total across both sides), unit: days (default: 5)
`interval`	a number giving the temporal resolution of the measurements in the dataset, unit: minutes (default: 10)
`ts`	logical. TRUE if the data are a time series. Default: TRUE
`method`	a string giving the method for univariate time series filtering: either "spline" (default), "arima", or "gam". See details in the package 'mtsdi'.
`sp_df`	an integer giving the degrees of freedom to be used for the splines (default: 10). If set to NULL, the degrees of freedom are chosen by cross-validation. See details in the package 'mtsdi'.
`fail`	a string or a number specifying what to do when the model fails to converge: 1. fill the gap with the mean value in the sampling window ("ave", default), or 2. fill the gap with whatever value is assigned here (e.g., 9999, NA, etc.)
`...`	other arguments passed to 'mnimput'
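
To make the units of `win` and `interval` concrete, and to show what a window mean looks like, here is a minimal sketch in base R. It relies only on the units and the `fail = "ave"` description given above; the helper names `points_in_window` and `window_mean` are hypothetical and are not part of FluxGapsR, and the sketch does not claim to reproduce the package's internal code.

```r
# Hypothetical helpers for illustration only (not part of FluxGapsR).

# Number of records spanned by a window of `win` days when measurements
# are taken every `interval` minutes.
points_in_window <- function(win = 5, interval = 10) {
  minutes_per_day <- 24 * 60
  win * minutes_per_day / interval
}

# Mean of the non-missing values in a window; an assumed stand-in for the
# kind of value that fail = "ave" is documented to fall back on.
window_mean <- function(x) {
  mean(x, na.rm = TRUE)
}

# With the defaults (5 days, 10 minutes) the window spans 720 records.
n_default <- points_in_window()

# Made-up flux values with two gaps.
flux_window <- c(2.1, NA, 1.9, 2.0, NA)
fallback <- window_mean(flux_window)
```

A longer `win` gives the model more records to work with around each gap, at the cost of mixing in conditions further away in time.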

Value

A data frame that includes the original data, the gap-filled data ("filled") and a "mark" column indicating whether the value in each row of "filled" is: 0. original, 1. gap-filled, or 2. failed to converge
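
The "mark" codes can be tallied to check how many gaps were filled and how many fell back on the `fail` rule. The sketch below uses a small mock data frame with made-up values in place of real output; only the column names "filled" and "mark" and the codes 0, 1 and 2 come from the description above.

```r
# Mock output with the documented columns; the values are made up.
df_filled <- data.frame(
  Flux   = c(1.2, NA, 1.5, NA, 1.1),
  filled = c(1.2, 1.3, 1.5, 1.4, 1.1),
  mark   = c(0, 1, 0, 2, 0)
)

# Attach readable labels to the documented codes and count each class.
status <- factor(
  df_filled$mark,
  levels = c(0, 1, 2),
  labels = c("original", "gap-filled", "failed to converge")
)
counts <- table(status)
print(counts)

# Row indices where the model did not converge (mark == 2).
failed_rows <- which(df_filled$mark == 2)
```

Rows listed in `failed_rows` are worth inspecting by hand, since their "filled" values come from the `fail` setting and not from the EM model.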

Examples

library(FluxGapsR)
# read the example data shipped with the package
df <- read.csv(file = system.file("extdata", "Soil_resp_example.csv", package = "FluxGapsR"), header = TRUE)
df_ref <- read.csv(file = system.file("extdata", "Soil_resp_ref_example.csv", package = "FluxGapsR"), header = TRUE)
# gap-fill the flux using one reference time series
df_filled <- Gapfill_em(data = df, ref1 = df_ref)
# visualize the gap-filled results (red) with the original data overlaid
plot(df_filled$filled, col = "red")
points(df_filled$Flux)

junbinzhao/FluxGapsR documentation built on Nov. 19, 2022, 9:17 p.m.