lsat_calibrate_rf: Cross-calibrate Landsat sensors using Random Forests models

View source: R/lsat_calibrate_rf.R

lsat_calibrate_rfR Documentation

Cross-calibrate Landsat sensors using Random Forests models

Description

There are systematic differences in spectral indices (e.g., NDVI) among Landsat 5, 7, and 8 (Landsat Collection 2). It is important to address these differences before assessing temporal trends in spectral data. Failure to address these differences can, for instance, introduce artificial positive trends into NDVI time-series that are based on measurements from multiple Landsat sensors (Ju and Masek 2016, Roy et al. 2016, Berner et al. 2020). This function cross-calibrates individual bands or spectral indices from Landsat 5/8 to match Landsat 7. Landsat 7 is used as a benchmark because it temporally overlaps with the other two sensors. Cross-calibration can only be performed on one band or spectral index at a time. The approach involves determining the typical reflectance at a sample during a portion of the growing season site using Landsat 7 and Landsat 5/8 data that were collected the same years. A Random Forest model is then trained to predict Landsat 7 reflectance from Landsat 5/8 reflectance. To account for potential seasonal and regional differences between sensors, the Random Forest models also include as covariates the midpoint of each 15-day period (day of year), the spatial coordinates of each sample sample, and potentially other use-specified variables. This approach is most suitable when working with data from 100s to preferably 1000s of sample samples.

The specific steps to cross-calibrating sensors include: (1) Identify the years when both Landsat 7 and Landsat 5/8 measured surface reflectance at a sample sample. (2) Pool the reflectance measurements across those years and compute 15-day moving median reflectance over the course of the growing season for each sensor and sampling sample. (3) Exclude 15-day periods with fewer than a specified number of measurements from both sets of sensors and then randomly select one remaining 15-day period from each sample sample. (4) Split the data into sets for model training and evaluation. (5) Train Random Forest models that predict Landsat 7 reflectance based on Landsat 5/8 reflectance. The models also account for potential seasonal and regional differences between sensors by including as covariates the midpoint of each 15-day period (day of year) and the spatial coordinates of each sampling sample. The models are trained using the ranger function from the ranger package (Wright and Ziegler, 2017). (6) Apply the fitted Random Forest models to cross-calibrate measurements.

See Berner et al. (2020) for a full description of the approach.

Usage

lsat_calibrate_rf(
  dt,
  band.or.si,
  doy.rng = 152:243,
  min.obs = 5,
  train.with.highlat.data = F,
  add.predictors = NULL,
  frac.train = 0.75,
  trim = T,
  overwrite.col = F,
  write.output = F,
  outfile.id = band.or.si,
  outdir = NA
)

Arguments

dt

Data.table containing the band or spectral index to cross-calibrate.

band.or.si

Character string matching the column name of the band or spectral index to cross-calibrate.

doy.rng

Sequence of numbers specifying the Days of Year to use for model development.

min.obs

Minimum number of paired, seasonally-matched observations from Landsat 7 and Landsat 5/8 required to include a sampling sample.

train.with.highlat.data

(True/False) Should the RF models be trained using an internal high-latitude dataset that sampled the Arctic and Boreal biomes?

add.predictors

Vector of additional predictors to use in the Random Forest models. These should be time-invariant and match column names.

frac.train

Fraction of samples to use for training the random forest models. The remaining samples are used for model cross-validation.

trim

(True/False) If true, then for each sample site the percent difference in spectral indices between satellites is determined. The lowest 2.5 and highest 97.5 percentiles are then trimmed. This is meant to reduce potential differences that are not directly attributable to the sensors, but rather to exogenous factors.

overwrite.col

(True/False) Overwrite existing column or (by default) append cross-calibrated data as a new column?

write.output

(True/False) Should RF models and evaluation content be written to disk? Either way, evaluation table and figure are printed in the console.

outfile.id

Identifier used when naming output files. Defaults to the input band, but can be specified if needed such as when performing Monte Carlo simulations.

outdir

Output directory (created if necessary) to which multiple files will be written. The files include: (1) fitted random forest models as R objects, (2) evaluation data in a csv file, (3) summary of model cross-validation in a csv file, and (4) multi-panel scatter plot comparing sensors pre- and post-calibration in jpeg format. If cross-calibrating both Landsat 5 and 8, then the function returns files for both sensors.

Value

The input data.table with an appended column titled band.xcal, where "band" is your specified band or spectral index.

Examples

data(lsat.example.dt)
lsat.dt <- lsat_format_data(lsat.example.dt)
lsat.dt <- lsat_clean_data(lsat.dt)
lsat.dt <- lsat_calc_spectral_index(lsat.dt, 'ndvi')
lsat.dt <- lsat_calibrate_rf(lsat.dt, band.or.si = 'ndvi', 
                             train.with.highlat.data = TRUE, 
                             write.output = FALSE)
lsat.dt

logan-berner/lsatTS documentation built on Oct. 21, 2024, 12:23 a.m.