Gapfill_em: Gap-fill using EM

Gapfill_em    R Documentation

Gap-fill using EM

Description

This function automatically gap-fills missing data points (marked as NA) in a flux dataset using the expectation-maximization (EM) algorithm, with up to three reference flux time series measured in parallel. The function is based on the algorithms in the 'mtsdi' package.
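
To illustrate the general idea of EM-based gap-filling with a reference series, here is a minimal base-R sketch. It is not the implementation used by this function or by 'mtsdi': it assumes a single, fully observed reference series, joint normality, and no temporal filtering. The name em_fill_sketch and the simulated data are hypothetical.

```r
# Simplified EM sketch: fill NAs in y using one complete reference series x,
# assuming (x, y) are jointly normal. Illustration only, not the mtsdi code.
em_fill_sketch <- function(x, y, maxit = 500, tol = 1e-10) {
  stopifnot(length(x) == length(y), !anyNA(x), any(!is.na(y)))
  miss <- is.na(y)
  # x is complete, so its mean and variance stay fixed
  mu_x <- mean(x)
  s_xx <- mean((x - mu_x)^2)
  # Starting values from the complete cases
  mu_y <- mean(y[!miss])
  s_yy <- mean((y[!miss] - mu_y)^2)
  s_xy <- mean((x[!miss] - mean(x[!miss])) * (y[!miss] - mu_y))
  ey <- y
  for (it in seq_len(maxit)) {
    # E-step: conditional mean and second moment of the missing y values
    beta <- s_xy / s_xx
    cvar <- s_yy - s_xy^2 / s_xx
    ey[miss] <- mu_y + beta * (x[miss] - mu_x)
    ey2 <- ey^2
    ey2[miss] <- ey2[miss] + cvar
    # M-step: update the moments from the completed data
    mu_new <- mean(ey)
    s_yy_new <- mean(ey2) - mu_new^2
    s_xy_new <- mean(x * ey) - mu_x * mu_new
    change <- max(abs(mu_new - mu_y), abs(s_yy_new - s_yy), abs(s_xy_new - s_xy))
    mu_y <- mu_new
    s_yy <- s_yy_new
    s_xy <- s_xy_new
    if (change < tol) break
  }
  # Final fill using the converged parameters
  filled <- y
  filled[miss] <- mu_y + (s_xy / s_xx) * (x[miss] - mu_x)
  filled
}

# Hypothetical data: a target flux that tracks a reference flux, with gaps
set.seed(1)
ref_flux <- rnorm(200, mean = 3, sd = 1)
target_flux <- 0.8 * ref_flux + 0.5 + rnorm(200, sd = 0.1)
target_flux[c(20:30, 120:125)] <- NA
target_filled <- em_fill_sketch(ref_flux, target_flux)
```

In this simplified setting the EM iterations converge to the regression of the target on the reference fitted to the complete cases. The method argument described below adds a univariate time series filtering step that this sketch omits.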

Usage

Gapfill_em(
  data,
  ref1,
  ref2 = NULL,
  ref3 = NULL,
  Flux = "Flux",
  Flux1 = Flux,
  Flux2 = Flux,
  Flux3 = Flux,
  Date = "Date",
  Date_form = "ymd_hms",
  win = 5,
  interval = 10,
  ts = TRUE,
  method = "spline",
  sp_df = 10,
  fail = "ave",
  ...
)

Arguments

data

a data frame that includes the flux (with NA indicating the missing data)

ref1

a data frame that includes reference flux time series #1, measured in parallel; it does not need to have the same length as the target data to be filled

ref2

a data frame that includes reference flux time series #2, measured in parallel (optional); it does not need to have the same length as the target data to be filled. Default: NULL

ref3

a data frame that includes reference flux time series #3, measured in parallel (optional); it does not need to have the same length as the target data to be filled. Default: NULL

Flux

a string indicating the column name of the flux variable to be gap-filled

Flux1

a string indicating the column name of the reference time series in ref1. Default: same as Flux

Flux2

a string indicating the column name of the reference time series in ref2. Default: same as Flux

Flux3

a string indicating the column name of the reference time series in ref3. Default: same as Flux

Date

a string indicating the column name of the date in data, ref1, ref2 and ref3; the column must include the time information. Note that the date column must have the same name in all the data frames.

Date_form

a string indicating the format of the date in data, ref1, ref2 and ref3: either "ymd_hms" (default), "mdy_hms" or "dmy_hms". Note that all the data frames must use the same date format.

win

a number indicating the required length of the sampling window around each gap (total across both sides), unit: days (default: 5)

interval

a number indicating the temporal resolution of the measurements in the dataset, unit: minutes (default: 10)

ts

logical. TRUE if the data are a time series. Default: TRUE

method

a string indicating the method for univariate time series filtering: either "spline" (default), "arima", or "gam". See details in the package 'mtsdi'.

sp_df

an integer indicating the degrees of freedom to be used for the splines (default: 10). If set to NULL, the degrees of freedom are chosen by cross-validation. See details in the package 'mtsdi'.

fail

a string or a number indicating what to do when the model fails to converge: 1. fill the gap with the mean value in the sampling window ("ave", default), or 2. fill the gap with the value assigned here (e.g., 9999, NA, etc.)

...

other arguments passed to 'mnimput'
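
The following base-R sketch shows how win and interval relate to the amount of data around a gap, and what a window mean in the spirit of fail = "ave" could look like. It assumes the window is centred on the gap with win / 2 days on each side; the function's internal window selection may differ. The series and all variable names are hypothetical.

```r
win <- 5        # sampling window length, days (function default)
interval <- 10  # measurement resolution, minutes (function default)

# Number of measurement intervals spanned by one full window
records_per_window <- win * 24 * 60 / interval

# Hypothetical 10-day series at 10-minute resolution with one missing value
times <- seq(as.POSIXct("2020-06-01 00:00:00", tz = "UTC"),
             by = interval * 60, length.out = 10 * 24 * 60 / interval)
flux <- sin(seq_along(times) / 50) + 2
gap_idx <- 721
flux[gap_idx] <- NA

# Assumption: the window is centred on the gap, win / 2 days on each side
half_win_secs <- win / 2 * 24 * 60 * 60
offset_secs <- abs(as.numeric(difftime(times, times[gap_idx], units = "secs")))
in_window <- offset_secs <= half_win_secs

# Fallback in the spirit of fail = "ave": mean of the observed values in the window
window_mean <- mean(flux[in_window], na.rm = TRUE)
```

With the defaults, one window spans 720 measurement intervals, so a longer win or a finer interval gives the model more records per gap.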

Value

A data frame that includes the original data, the gap-filled data (column "filled"), and a "mark" column indicating whether each value in "filled" is: 0. original, 1. gap-filled, or 2. failed to converge
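
As a sketch of how the "mark" column might be used downstream, the snippet below builds a small mock result with the columns described above and summarizes it. The data frame and its values are hypothetical, not actual output of the function.

```r
# Mock result with the documented columns (values are made up for illustration)
res <- data.frame(
  Flux   = c(1.2, NA, 0.9, NA, 1.1),
  filled = c(1.2, 1.0, 0.9, 1.05, 1.1),
  mark   = c(0, 1, 0, 2, 0)
)

# Translate the numeric codes into readable labels
res$status <- factor(res$mark, levels = 0:2,
                     labels = c("original", "gap-filled", "failed to converge"))
status_counts <- table(res$status)
print(status_counts)

# Optionally discard values from windows where the model did not converge
filled_converged <- res$filled
filled_converged[res$mark == 2] <- NA
```

Checking the count of rows with mark equal to 2 is a quick way to see how often the fallback set by fail was used.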

Examples

library(FluxGapsR)
# Read the example data shipped with the package
df <- read.csv(file = system.file("extdata", "Soil_resp_example.csv", package = "FluxGapsR"), header = TRUE)
df_ref <- read.csv(file = system.file("extdata", "Soil_resp_ref_example.csv", package = "FluxGapsR"), header = TRUE)
# Gap-fill the flux in df using one reference time series
df_filled <- Gapfill_em(data = df, ref1 = df_ref)
# Visualize the gap-filled results: filled series in red, original observations overlaid
plot(df_filled$filled, col = "red")
points(df_filled$Flux)

junbinzhao/FluxGapsR documentation built on Nov. 19, 2022, 9:17 p.m.