Introduction to Fluxtools
In fluxtools: A 'shiny' App for Reproducible QA/QC of Eddy Covariance Data

library(fluxtools)

Overview

fluxtools is an R package that provides an interactive Shiny‐based QA/QC environment for data in the AmeriFlux BASE format. In just a few clicks, you can:

Upload eddy covariance data in a .csv format (AmeriFlux standard naming and timestamp conventions)
Visualize any two numeric columns against time (or each other)
Highlight statistical outliers (±σ from a linear fit) and add them to your point-removal R code
Manually select data points with a lasso or box; each selection adds to the accumulated removal code
Copy and paste the generated code into your own R script for reproducible QA/QC
Download a “cleaned” CSV, in which the values excluded with "Apply Removals" are set to NA, together with an R script for reproducibility

This vignette shows you how to install, launch, and use the main Shiny app—run_flux_qaqc()—and walks through a typical workflow.

Installation

You can install fluxtools from CRAN, or directly from GitHub:

# Install from CRAN 
install.packages("fluxtools")

# Install the development version from GitHub (requires the devtools package)
# install.packages("devtools")
devtools::install_github("kesondrakey/fluxtools")

Launching the Shiny App

Load fluxtools and launch the QA/QC application:

library(fluxtools)

# Run the app
run_fluxtools()

Example workflow

Upload: Select your AmeriFlux-style CSV (e.g., US_VT1_HH_202401010000_202501010000.csv). Files can be up to 500 MB, though large files may make the Shiny interface sluggish
Choose Year(s): By default “all” is selected, but you can subset to specific years
Choose variables: TIMESTAMP_START is on the x-axis by default. Change the y-axis to your variable of interest (e.g., FC_1_1_1). The generated R code removes values of the y-axis variable only
Select data: Use the box or lasso to select points. This populates the “Current” code box with something like:

df <- df %>%
  mutate(
    FC_1_1_1 = case_when(
      TIMESTAMP_START == '202401261830' ~ NA_real_,
      TIMESTAMP_START == '202401270530' ~ NA_real_,
      …
      TRUE ~ FC_1_1_1
    )
  )

Flag data and accumulate code: With points still selected, click “Flag data.” Selected points turn orange, and their code is appended to the “Accumulated” box, so you can make multiple selections per session.
Unflag data: Use the box or lasso to de-select points and remove them from the Accumulated code box.
Clear Selection: Click "Clear Selection" to reset all selections for the current y-variable.
Switch variables: Change y to any other variable (e.g., SWC_1_1_1) and select more points. Click “Flag data” again, and code for both variables appears:

```r
df <- df %>%
  mutate(
    FC_1_1_1 = case_when(
      TIMESTAMP_START == '202401261830' ~ NA_real_,
      TIMESTAMP_START == '202401270530' ~ NA_real_,
      …
      TRUE ~ FC_1_1_1
    )
  )

df <- df %>%
  mutate(
    SWC_1_1_1 = case_when(
      TIMESTAMP_START == '202403261130' ~ NA_real_,
      TIMESTAMP_START == '202403270800' ~ NA_real_,
      …
      TRUE ~ SWC_1_1_1
    )
  )
```
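
Code of this shape can be tried outside the app. The following is a minimal sketch on a made-up data frame: the values, the three timestamps, and the choice of which rows to blank are invented for illustration, and TIMESTAMP_START is stored as character so that the string comparisons hold.

```r
library(dplyr)

# Hypothetical data; timestamps are stored as character (yyyymmddHHMM)
df <- data.frame(
  TIMESTAMP_START = c("202401261800", "202401261830", "202401261900"),
  FC_1_1_1        = c(1.2, 55.0, 0.9),
  SWC_1_1_1       = c(31.0, 30.8, 250.0)
)

# Same shape as the generated code: one mutate() per flagged variable
df <- df %>%
  mutate(
    FC_1_1_1 = case_when(
      TIMESTAMP_START == "202401261830" ~ NA_real_,
      TRUE ~ FC_1_1_1
    )
  )

df <- df %>%
  mutate(
    SWC_1_1_1 = case_when(
      TIMESTAMP_START == "202401261900" ~ NA_real_,
      TRUE ~ SWC_1_1_1
    )
  )

df
```

Only the matched rows of each variable become NA; all rows and other columns are kept.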

Compare variables: Choose the variables you would like to compare (e.g., set y to TA_1_1_1 and x to T_SONIC_1_1_1). The app computes R² from a simple linear regression. The top R² is based on all points before removals; once points are selected, a second R² appears, computed as if the selected points had been removed
Highlight outliers: Use the slider to set the ±σ threshold on the residuals. Click “Select all ±σ outliers” to append them to the Accumulated code. Click “Clear ±σ outliers” to deselect them and remove them from the code box
Copy all: Click the Copy Icon to the right of the current or accumulated code box and paste into your own R script for documentation
Apply Removals: Click “Apply Removals” to replace the selected points of the current y-variable with NA and remove them from view. The change is applied to a new .csv, available via “Export cleaned data”; the raw data are unaffected
Reload original data: Made a mistake or want a fresh start? Click “Reload original data” to reload the uploaded .csv and start over
Export cleaned data: Download a zip file containing the cleaned .csv, which reflects the removals confirmed with “Apply Removals”, and a compiled R script with the code for those removals.
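
The ±σ outlier and R² steps in the workflow above can be approximated outside the app. The following is a minimal sketch, not the app's actual implementation: it assumes an outlier is a point whose residual from a simple linear fit exceeds k residual standard deviations. The simulated data, the threshold k, and all variable names are invented for illustration.

```r
# Simulated, hypothetical data: y tracks x closely, with two planted spikes
set.seed(42)
x <- seq(0, 30, length.out = 100)
y <- x + rnorm(100, sd = 0.5)
y[c(10, 60)] <- y[c(10, 60)] + 15

# Fit a simple linear regression and take its residuals
fit <- lm(y ~ x)
res <- residuals(fit)

# Flag points whose residual lies more than k standard deviations from zero
k <- 3  # assumed threshold, analogous to the app's slider
is_outlier <- abs(res) > k * sd(res)

# R-squared before and after dropping the flagged points
r2_before <- summary(fit)$r.squared
r2_after  <- summary(lm(y[!is_outlier] ~ x[!is_outlier]))$r.squared

which(is_outlier)
c(before = r2_before, after = r2_after)
```

Comparing the two R² values shows how much the flagged points alone were degrading the fit, which is the same comparison the app's second R² is described as making.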

Physical Range Module (PRM) function:

The Physical Range Module (PRM) sets out-of-range values to NA. Each range rule is applied to every column whose name matches the variable's pattern, such as ^SWC($|_) or ^P($|_), so positional variants (e.g., SWC_1_1_1) are covered.
Columns containing "QC" are skipped by default. No columns are removed.

Source of ranges: AmeriFlux Technical Documents, Table A1 (Physical Range Module).
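
To illustrate how a name pattern such as ^SWC($|_) selects columns, here is a minimal base-R sketch. It is not the package's implementation: the helper name clip_to_range, the 0 to 100 bounds, and the toy data are assumptions made for illustration only.

```r
# Hypothetical helper: set values outside [min, max] to NA in every column
# whose name matches `pattern`, skipping columns whose name contains "QC"
clip_to_range <- function(df, pattern, min, max) {
  cols <- grep(pattern, names(df), value = TRUE)
  cols <- cols[!grepl("QC", cols)]
  for (col in cols) {
    out_of_range <- !is.na(df[[col]]) & (df[[col]] < min | df[[col]] > max)
    df[[col]][out_of_range] <- NA
  }
  df
}

toy <- data.frame(
  SWC_1_1_1 = c(10, 150, -3),
  SWC_QC    = c(0, 1, 2),        # matches the pattern but is skipped as a QC column
  SW_IN     = c(200, 500, 900)   # not matched: "SW_IN" does not start with "SWC"
)

# Assumed bounds of 0 to 100, for illustration only
clipped <- clip_to_range(toy, "^SWC($|_)", min = 0, max = 100)
clipped
```

The anchored pattern matters: a plain "SW" prefix match would also catch SW_IN, whereas ^SWC($|_) matches only SWC itself or SWC followed by an underscore.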

Quick start

# tiny demo dataset with a few out-of-range values
set.seed(1)
df <- tibble::tibble(
  TIMESTAMP_START = seq.POSIXt(as.POSIXct("2024-01-01", tz = "UTC"),
                               length.out = 10, by = "30 min"),
  SWC_1_1_1 = c(10, 20, 150, NA, 0.5, 99, 101, 50, 80, -3),  # bad: 150, 101, -3; 0.5 triggers SWC unit note
  P         = c(0, 10, 60, NA, 51, 3, 0, 5, 100, -1),        # bad: 60, 51, 100, -1
  RH_1_1_1  = c(10, 110, 50, NA, 0, 100, -5, 101, 75, 30),   # bad: 110, -5, 101
  SWC_QC    = sample(0:2, 10, replace = TRUE)                # QC col should be ignored
)

# To see the Physical Range Module (PRM) rules:
get_prm_rules()

# Apply the filter to all relevant variables
res <- apply_prm(df)

# PRM summary (counts and % replaced per column)
res$summary

# Apply the range filter to SWC columns only
df_filtered_swc <- apply_prm(df, include = "SWC")

# Apply the range filter to SWC and P columns only
df_filtered_swc_P <- apply_prm(df, include = c("SWC", "P"))

Physical Range Module Values

library(dplyr)
library(knitr)

rules_tbl <- get_prm_rules() |>
  transmute(
    Variable    = variable,
    Description = description,
    Units       = units,
    'Min to Max' = sprintf("%s to %s",
                           ifelse(is.na(min), "NA", min),
                           ifelse(is.na(max), "NA", max))
  ) |>
  arrange(Variable)

note_txt <- if (knitr::is_latex_output()) {
  "\\emph{Source: AmeriFlux Technical Documents}"
} else {
  "<em>Source: AmeriFlux Technical Documents</em>"
}

use_kableExtra <- requireNamespace("kableExtra", quietly = TRUE)
tbl <- fluxtools::get_prm_rules()
if (use_kableExtra) {
  kableExtra::kbl(tbl, booktabs = TRUE) |>
    kableExtra::kable_styling(full_width = FALSE)
} else {
  knitr::kable(tbl)
}


# kbl(rules_tbl,
#     caption = "Physical Range Module (PRM) bounds",
#     booktabs = TRUE, escape = FALSE) |>
#   kable_styling(full_width = FALSE, latex_options = c("threeparttable")) |>
#   footnote(general = note_txt, general_title = "", escape = FALSE)

Fluxtools is an independent project and is not affiliated with or endorsed by the AmeriFlux Network. “AmeriFlux” is a registered trademark of Lawrence Berkeley National Laboratory and is used here for identification purposes only.