gfJulendat: Imputation of medium-sized gaps by linear modeling using...

View source: R/gfJulendat.R

gfJulendat R Documentation

Imputation of medium-sized gaps by linear modeling using adjacent GSOD stations

Description

This is a wrapper function around several sub-functions of the Julendat gap-filling routine (processing level "0310", see Julendat). In brief, gaps in the measurement series of a given GSOD station are filled using simultaneous measurements from adjacent GSOD stations. With the gappy series as the response variable, a linear model is fitted that explains the measurements at the station under investigation from those at the surrounding stations.
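The idea described above can be illustrated with a minimal base R sketch. It is not the Julendat implementation: the series are synthetic, and the names `neighbour_a`, `neighbour_b`, `target` and `gap` are hypothetical. It only shows the general technique of fitting a Gaussian linear model on complete records and predicting the missing stretch from the neighbours' simultaneous values.

```r
# Synthetic daily temperatures; all names and values are illustrative only
set.seed(1)
n <- 200
neighbour_a <- 20 + 5 * sin(seq_len(n) / 20) + rnorm(n, sd = 0.5)
neighbour_b <- 18 + 4 * sin(seq_len(n) / 20) + rnorm(n, sd = 0.5)
target <- 1 + 0.6 * neighbour_a + 0.4 * neighbour_b + rnorm(n, sd = 0.3)

# Introduce a medium-sized gap into the response series
target_gappy <- target
gap <- 81:110
target_gappy[gap] <- NA

dat <- data.frame(target = target_gappy, neighbour_a, neighbour_b)

# Fit on complete records only; with the default na.action (na.omit),
# rows with a missing response are dropped before fitting
fit <- stats::glm(target ~ neighbour_a + neighbour_b, data = dat,
                  family = stats::gaussian)

# Predict the gap from the neighbours' simultaneous measurements
filled <- dat$target
filled[gap] <- stats::predict(fit, newdata = dat[gap, ])

# Compare imputed values against the withheld observations
rmse <- sqrt(mean((filled[gap] - target[gap])^2))
```

Values outside the gap are left untouched; only the missing stretch is replaced by model predictions.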

Usage

gfJulendat(
  files.dep,
  files.indep,
  filepath.coords = NULL,
  quality.levels = NULL,
  gap.limit,
  end.datetime = Sys.Date(),
  units = "days",
  na.limit = 0.5,
  time.window = 365,
  n.plot,
  prm.dep = "TEMP",
  prm.indep = "NA",
  family = stats::gaussian,
  plevel,
  ...
)

Arguments

files.dep

Character. Path to the data file of the response GSOD station. Note: the file needs to be formatted according to standard KiLi SP1 format, see gsod2ki.

files.indep

Character. Path(s) leading to the predictor GSOD station(s).

filepath.coords

Character, default is NULL. Path to file holding coordinate information. Not required for GSOD data processing.

quality.levels

Character, default is NULL. Not required for GSOD data processing.

gap.limit

Numeric. Maximum length of a measurement gap to be imputed. Gaps exceeding this threshold are not filled.

end.datetime

Object of class Date, default is Sys.Date(). Not required for GSOD data processing.

units

Character. Measurement interval, typically "days" for GSOD data.

na.limit

Numeric, default is 0.5. Maximum share of missing data within time.window. An explanatory station that exceeds this threshold is excluded from linear modeling.

time.window

Numeric, default is 365. Width of the window before and after a data gap from which data are extracted for linear modeling.

n.plot

Numeric. Number of explanatory GSOD stations to consider for linear modeling. If not supplied, all stations specified in files.indep will be considered.

prm.dep

Character, default is "TEMP". Determines which parameters to fill.

prm.indep

Character, default is "NA". Not required for GSOD data processing.

family

Object of class family, default is gaussian.

plevel

Character. Determines current processing level. Not required for GSOD data processing.

...

Additional arguments. Currently not in use.
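How gap.limit, time.window and na.limit might interact can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's internal logic: `find_gaps()` is a hypothetical helper, and whether the actual routine includes the gap itself in the window or uses inclusive thresholds is assumed here.

```r
# Hypothetical helper (not part of GSODTools): locate runs of NA in a series
find_gaps <- function(x) {
  runs <- rle(is.na(x))
  ends <- cumsum(runs$lengths)
  starts <- ends - runs$lengths + 1
  data.frame(start = starts, end = ends, length = runs$lengths)[runs$values, ]
}

x <- c(1, NA, NA, 4, 5, NA, NA, NA, NA, NA, 11, 12)
gaps <- find_gaps(x)

# Only gaps no longer than gap.limit are candidates for imputation
gap.limit <- 3
fillable <- gaps[gaps$length <= gap.limit, ]

# Share of missing values of one predictor series within time.window
# observations before and after the first fillable gap (assumed definition)
time.window <- 2
predictor <- c(NA, 2, 3, 4, NA, 6, 7, 8, 9, 10, 11, 12)
idx <- max(1, fillable$start[1] - time.window):
  min(length(x), fillable$end[1] + time.window)
na_share <- mean(is.na(predictor[idx]))

# A predictor is kept only if its share of missing data stays within na.limit
na.limit <- 0.5
use_predictor <- na_share <= na.limit
```

In this toy series the five-value gap exceeds gap.limit and is skipped, while the two-value gap remains a candidate and the predictor passes the na.limit check.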

Value

An object of class ki.data.

Author(s)

Florian Detsch

Examples

## Not run: 
library(GSODTools)

# Download data sets for selected GSOD stations
usafs = c(
  "gar" = "637230"   # GARISSA
  , "jom" = "637400" # NAIROBI JKIA
  , "kia" = "637910" # KILIMANJARO INTL
  , "voi" = "637930" # VOI
  , "mom" = "638200" # MOMBASA MOI INTL
  , "mor" = "638660" # MOROGORO (MET)
)

shp_gsod <- 
  gsodstations |> 
  subset(`USAF` %in% usafs) |> 
  gsodDf2Sp()

df_gsod = Map(
  \(usaf, plot_id) {
    
    # Download and extraction
    tmp_df_gsod <- dlGsodStations(usaf = usaf, 
                                  start_year = 1990, end_year = 1995, 
                                  dsn = tempdir(), 
                                  unzip = TRUE,
                                  save_output = FALSE)
    
    # Fahrenheit -> Celsius
    tmp_df_gsod = transform(
      tmp_df_gsod
      , TEMP = toCelsius(TEMP, digits = 1L)
      , MIN = toCelsius(MIN, digits = 1L)
      , MAX = toCelsius(MAX, digits = 1L)
    )
    
    # GSOD -> `ki.data`
    tmp_ki_gsod <- gsod2ki(tmp_df_gsod, 
                           prm_col = c("TEMP", "MIN", "MAX"), 
                           timezone = "eat",
                           aggtime = "diurnal",
                           plot_id = plot_id, 
                           df2ki = TRUE)
    
    # Remove outliers
    for (j in c("TEMP", "MIN", "MAX")) {
      methods::slot(tmp_ki_gsod, "Parameter")[[j]] = outlier2na(
        methods::slot(tmp_ki_gsod, "Parameter")[[j]]
        , lower_quantile = 0.2
        , upper_quantile = 0.8
      )
    }
    
    # Fill small gaps by linear interpolation
    tmp_ki_gsod <- gfLinInt(tmp_ki_gsod, 
                            prm = c("TEMP", "MIN", "MAX"))
    
    # Save data created so far
    tmp_df_gsod <- gfOutputData(tmp_ki_gsod, plevel = "NA")
    write.csv(tmp_df_gsod, sprintf("%s/%s.csv", tempdir(), usaf), 
              row.names = FALSE)
    
    return(tmp_ki_gsod)    
  }
  , methods::slot(shp_gsod, "data")$USAF
  , names(usafs)
)

# Fill medium-sized gaps at Kilimanjaro Intl. Airport by linear modeling
fls_gsod <- list.files(tempdir(), pattern = "^\\d{6}\\.csv$", 
                       recursive = TRUE, full.names = TRUE)

jul_gsod <- gfJulendat(files.dep = fls_gsod[3],
                       files.indep = fls_gsod[-3],
                       filepath.coords = NULL,
                       quality.levels = NULL,
                       gap.limit = 1825, 
                       na.limit = .9,
                       time.window = 913,
                       n.plot = 10,
                       prm.dep = c("TEMP", "MAX", "MIN"), 
                       prm.indep = c(NA, NA, NA), 
                       plevel = "NA", 
                       end.datetime = Sys.Date(), 
                       units = "days")

plot(jul_gsod$TEMP, col = "red", type = "l")
lines(methods::slot(df_gsod[[3]], "Parameter")$TEMP, col = "grey75")

## End(Not run)


environmentalinformatics-marburg/GSODTools documentation built on Jan. 5, 2024, 12:19 a.m.