```r
library(dplyr)
library(tidyr)
library(ggplot2)
library(FluxDataKit)
library(FluxnetLSM) # install dev version from https://github.com/geco-bern/FluxnetLSM
library(FluxnetEO)  # install dev version from https://github.com/geco-bern/FluxnetEO

# only evaluate the vignette chunks on machines
# where the required input data are available
evaluate <- Sys.info()['nodename'] %in% c("balder", "dash", "pop-os")
```
Note that this routine will only run with the appropriate files in the correct place. Given the file sizes involved, no demo data can be integrated into the package.
As mentioned in the introduction, once all required data are downloaded, the FluxDataKit package ensures the proper compilation of rsofun driver data. Although we will distribute a finished dataset, the instructions below allow you to recreate these data for a particular site (or sites).
To generate consistent data from FLUXNET-formatted sources, we need a site list with some additional meta-data. A site list is generated using the script described in the 'data coverage' vignette; we refer to that vignette to compile the list of sites which can be processed.
Once a site list has been compiled, you can use it (and all the other input data) to generate either land surface model or rsofun compatible datasets. Here, the former is used as a precursor to the latter.
By default, land surface model compatible data are generated using the FluxnetLSM package. To retain only these data, set the format parameter to "lsm". This routine will only save the netcdf intermediates that are otherwise used for formatting p-model compatible data, and will not include any other ancillary data.
Meta-data requirements: Note that the sites file can be generated by other means than the included script. It only has to contain the following values:

- a site name (`sitename`)
- latitude and longitude (`lat`, `lon`)
- elevation (`elv`)
- the start and end date of the dataset (`date_start`, `date_end`)
- the original product and data path (`product` and `data_path` respectively, which are combined into the formal data directory)
- a start and end year (`year_start`, `year_end`)
- the Koeppen-Geiger code for a site (`koeppen_code_beck`)
- the water holding capacity (`whc`)
- the IGBP land cover class (`igbp_land_use`)

Routines specified in the processing scripts make it easy to gather these data, but users are free to compile additional data for their own use. The above fields are, however, required.
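As a minimal sketch of such a hand-built sites file, the data frame below contains only the required fields. All values are illustrative placeholders (not real site metadata), and the site code and paths are hypothetical:

```r
# a minimal, hand-built site list with the required fields
# (values are illustrative placeholders, not real site metadata)
sites <- data.frame(
  sitename = "XX-Xxx",
  lat = 48.5,
  lon = 2.8,
  elv = 100,
  date_start = "2005-01-01",
  date_end = "2014-12-31",
  year_start = 2005,
  year_end = 2014,
  product = "fluxnet2015",
  data_path = "/path/to/flux_data",
  koeppen_code_beck = "Cfb",
  whc = 200,
  igbp_land_use = "DBF"
)
```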
```r
# load the sites to process
# as generated from scripts in `data-raw` (see github repo)
sites <- FluxDataKit::fdk_site_info |>
  filter(
    sitename == "FR-Fon"
  ) |>
  mutate(
    data_path = "/data/scratch/FDK_inputs/flux_data"
  )

# output LSM formatted data
fdk_process_lsm(
  sites,
  out_path = tempdir(),
  modis_path = "/data/scratch/FDK_inputs/modis",
  overwrite = TRUE
)

# list generated files
list.files(tempdir(), glob2rx("*FR-Fon*.nc"), recursive = TRUE)
```
By default the format parameter is set to "lsm", providing land surface model netcdf files as output. You can specify "fluxnet" to convert the data to rsofun compatible FLUXNET output.
```r
# read in demo data for the FR-Fon site (as LSM data)
# and convert it to the half-hourly (HH) FLUXNET format
fluxnet <- fdk_convert_lsm(
  site = "FR-Fon",
  path = tempdir(),
  fluxnet_format = TRUE,
  out_path = tempdir()
)
```
You can plot the conversion results for a quick inspection. Here we retain the data in its original FLUXNET formatting and output the gap-filled and amended data as a data frame. This data frame is input to the plotting routine `fdk_plot()`, which writes an overview plot to the specified `out_path` directory.
```r
# read in and convert the data
df <- fdk_convert_lsm(
  site = "FR-Fon",
  path = tempdir(),
  fluxnet_format = TRUE
)

# plot the returned data frame to a file
fdk_plot(
  df,
  site = "FR-Fon",
  # for writing things to file
  out_path = tempdir(),
  overwrite = TRUE
)
```
FLUXNET data processed to netcdf files can be converted back to FLUXNET CSV based files, with the same column naming conventions as the original files. The data, however, are downsampled to a daily time step, and the additional variables and gap-filling from the above LSM based product are retained.
It must be noted that these daily products, although adhering to the FLUXNET naming conventions (both in filenames and column names), are not equivalent to the data generated by the OneFlux processing pipeline.
```r
# downsample the half-hourly data to a daily time step
fdk_downsample_fluxnet(
  df,
  site = "FR-Fon", # a site name
  out_path = tempdir(),
  overwrite = TRUE
)
```
In addition, MODIS data from the FluxnetEO dataset can be merged in using the R package of the same name. This ensures that rsofun driver (and target) data are amended with MODIS data for, among other uses, machine learning projects.
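The FluxnetEO merge itself is not shown here; the sketch below only illustrates the general idea with a plain dplyr join, where `modis_df` and `flux_df` are hypothetical daily data frames sharing `sitename` and `date` columns:

```r
library(dplyr)

# hypothetical daily MODIS values (e.g. as exported via FluxnetEO)
modis_df <- data.frame(
  sitename = "XX-Xxx",
  date = as.Date("2005-01-01") + 0:2,
  fpar = c(0.61, 0.62, 0.64)
)

# hypothetical daily flux records to be amended
flux_df <- data.frame(
  sitename = "XX-Xxx",
  date = as.Date("2005-01-01") + 0:2,
  gpp = c(1.2, 1.5, 1.4)
)

# amend the flux records with matching MODIS values by site and date
merged <- left_join(flux_df, modis_df, by = c("sitename", "date"))
```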
```r
library(rsofun)

# process the half-hourly data into
# p-model input drivers for rsofun
rsofun_data <- fdk_format_drivers(
  site_info = FluxDataKit::fdk_site_info |>
    filter(sitename == "FR-Fon"),
  path = paste0(tempdir(), "/"),
  verbose = TRUE
)

# optimized parameters from previous work
params_modl <- list(
  kphio = 0.09423773,
  soilm_par_a = 0.33349283,
  soilm_par_b = 1.45602286,
  tau_acclim_tempstress = 10,
  par_shape_tempstress = 0.0
)

# run the model with these parameters
output <- rsofun::runread_pmodel_f(
  rsofun_data,
  par = params_modl
)

# we only have one site, so unnest the main model output
model_data <- output$data[[1]][[1]]
print(head(model_data))
```
```r
validation_data <- rsofun_data |>
  filter(sitename == "FR-Fon") |>
  tidyr::unnest(forcing)

p <- ggplot() +
  geom_line(
    data = model_data,
    aes(date, gpp),
    colour = "red"
  ) +
  geom_line(
    data = validation_data,
    aes(date, gpp)
  ) +
  labs(
    x = "Date",
    y = "GPP"
  )

print(p)
```