# evaluate data-dependent chunks only on machines where the data are available locally
evaluate <- Sys.info()['nodename'] %in% c("balder", "dash", "pop-os")
library(dplyr)
library(ggplot2)
library(lubridate)
library(readr)
library(FluxDataKit)

Note that this routine will only run with the appropriate files in the correct place. Given the file sizes involved, no demo data can be integrated into the package.

Time series

For time series modelling and for analysing long-term trends, we want complete sequences of good-quality data. Data gaps are not an option, as they would break these sequences, so we have to accept the reduced quality of gap-filled data, but within limits. FluxDataKit contains information about the start and end dates of complete sequences of good-quality, gap-filled time series for each site, for the variables gross primary production (GPP), latent heat flux, and energy-balance-corrected latent heat flux. This information is provided by the object fdk_site_fullyearsequence.
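To get oriented, the GPP-related columns of this table can be inspected directly. This is just a quick look; the columns shown are the ones used for the site selection further below.

# GPP-related columns of the full-year-sequence table
FluxDataKit::fdk_site_fullyearsequence |>
  select(sitename, drop_gpp, nyears_gpp, year_start_gpp, year_end_gpp) |>
  head()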

This vignette demonstrates how to use this information to obtain good-quality daily GPP time series. The following criteria apply: only full-year sequences of good-quality, gap-filled GPP data are used, each site must provide at least three such years, and croplands, wetlands, and sites for which the processing failed are excluded.

The information needed for this selection is contained in the data frame fdk_site_fullyearsequence, which is provided as part of the FluxDataKit package, complemented by the site meta information in fdk_site_info.

Determine sites

Apply the criteria listed above.

sites <- FluxDataKit::fdk_site_info |>
  filter(!(sitename %in% c("MX-Tes", "US-KS3"))) |>  # failed sites
  filter(!(igbp_land_use %in% c("CRO", "WET"))) |>  # exclude croplands and wetlands
  left_join(
    fdk_site_fullyearsequence,
    by = "sitename"
  ) |> 
  filter(!drop_gpp) |>  # where no full year sequence was found
  filter(nyears_gpp >= 3)

This yields 142 sites with a total of 1208 years of data, corresponding to 399,310 daily data points.

# number of sites
nrow(sites)

# number of site-years
sum(sites$nyears_gpp)

# number of data points (days)
sum(sites$nyears_gpp) * 365

Load data

Read files with daily data.

path <- "~/data/FluxDataKit/FLUXDATAKIT_FLUXNET/"  # adjust for your own local use

# read the daily data file for one site and add a column with the site name
read_onesite <- function(site, path){
  filename <- list.files(path = path, 
                        pattern = paste0("FLX_", site, "_FLUXDATAKIT_FULLSET_DD"), 
                        full.names = TRUE
                        )
  out <- read_csv(filename) |> 
    mutate(sitename = site)
  return(out)
}

# read all daily data for the selected sites
ddf <- purrr::map_dfr(
  sites$sitename,
  ~read_onesite(., path)
)
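As an optional sanity check, we can confirm that daily data for all selected sites were read. This is a minimal sketch using only the objects created above.

# number of rows and number of sites in the combined daily data
nrow(ddf)
length(unique(ddf$sitename))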

Select data sequences

Cut to full-year sequences of good-quality data.

ddf <- ddf |>
  left_join(
    sites |> 
      select(
        sitename, 
        year_start = year_start_gpp, 
        year_end = year_end_gpp),
    by = join_by(sitename)
  ) |> 
  mutate(year = year(TIMESTAMP)) |> 
  filter(year >= year_start & year <= year_end) |> 
  select(-year_start, -year_end, -year)
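A quick way to verify the cut is to count the daily records per site and year; each remaining site-year should contain close to 365 values. This is a minimal sketch using only the columns referenced above.

# count daily records per site-year after cutting to full-year sequences
ddf |>
  mutate(year = year(TIMESTAMP)) |>
  group_by(sitename, year) |>
  summarise(ndays = n(), .groups = "drop") |>
  summarise(min_days = min(ndays), max_days = max(ndays))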

Plot

Visualise data availability.

sites |> 
  select(
    sitename, 
    year_start = year_start_gpp, 
    year_end = year_end_gpp) |> 
  ggplot(aes(y = sitename, 
             xmin = year_start, 
             xmax = year_end)) +
  geom_linerange() +
  theme(legend.title = element_blank(),
        legend.position = "top") +
  labs(title = "Good data sequences",
       y = "Site")

