knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
Welcome to the neonSoilFlux package! This vignette will guide you through the process of using this package to acquire and compute soil CO$_{2}$ fluxes at different sites in the National Ecological Observatory Network (NEON).
You can think about this package as working in two primary phases:
1. Acquiring the environmental data for a given month at a NEON site (acquire_neon_data). This includes:
a. Soil temperature at different depths.
b. Soil water content at different depths.
c. Soil CO$_{2}$ concentration.
d. Atmospheric pressure.
e. Soil properties (bulk density, others).
2. Given those properties, computing the soil surface fluxes and the associated uncertainty using a variety of methods (compute_neon_flux).
We split these phases into two functions to save time and because they are fundamentally different processes. Acquiring the NEON data makes use of the neonUtilities package.
This package takes the guesswork out of which data products to collect, which should shorten the workflow needed. We rely heavily on the tidyverse philosophy for computation and coding here.
Load up the relevant libraries:
library(tidyverse)
library(neonSoilFlux)
Let's say we want to acquire the NEON soil data at the SJER site during June 2021:
out_env_data <- acquire_neon_data(
  site_name = 'SJER',
  download_date = '2021-06'
)
The output out_env_data from acquire_neon_data is a list of two elements:
site_data, a nested data frame containing measurements for the required flux gradient model during the given time period.
site_megapit, a nested data frame containing specific information about soils at the site (for bulk density calculations, etc.).
Two required inputs are needed to run the function acquire_neon_data: site_name and download_date, as in the call above. Further inputs are time_frequency, which is 30 minutes (the default) or the 1 minute data (currently untested), and whether we download provisional NEON data.
As the data are acquired, various messages from the loadByProduct function from the neonUtilities package are shown; this is normal. Products are acquired from each spatial location (horizontalPosition) or vertical depth (verticalPosition) at a NEON site.
Outputs for acquire_neon_data are two nested data frames:
site_data: This contains three variables:
a. measurement: the measurement name (one of soilCO2concentration, VSWC (soil water content), soilTemp (soil temperature), and staPres (atmospheric pressure)).
b. monthly_mean: contains the mean value of the measurement at each horizontal position and vertical depth. We compute the monthly mean using a bootstrapped technique.
c. data: contains the stacked variables acquired from neonUtilities: the horizontal and vertical positions, timestamp (in UTC), associated values, and the QF flag (0 = pass, 1 = fail, LINK).
site_megapit: the nested data frame of the soil sampling data, found here (LINK). This data table is essentially what is reported back from acquiring the data product from NEON.
For each data product, the acquire_neon_data function also performs two additional checks:
swc_correct. Information regarding this correction is found here: LINK. Once updated sensors are installed in the future we will deprecate this function.
The monthly mean is utilized when a given measurement fails final QF checks. This function is provided by code from Zoey Werbin. For a given location (horizontalPosition) and depth, a monthly mean is computed when there are at least 15 days of measurements.
Assume you have a vector of measurements $\vec{y}$, standard errors $\vec{\sigma}$, and expanded uncertainties $\vec{\epsilon}$ (all of length $M$) that pass the QF checks in a given month. The expanded uncertainty $\vec{\epsilon}$ is generated by NEON to include the 95% confidence interval. We have that $\vec{\sigma}_{i}\leq\vec{\epsilon}_{i}$. We define the bias $\vec{b}=\sqrt{\left(\vec{\epsilon}\right)^{2}-\left(\vec{\sigma}\right)^{2}}$ to be the quadrature difference between the expanded uncertainty and the standard error.
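As a minimal sketch of the two ideas above (the 15-day requirement and the bias), the following base R code uses made-up values. The names timestamps, qf, sigma, epsilon, and bias are illustrative assumptions, not objects created by acquire_neon_data, and the package's internal implementation may differ.

```r
# Hypothetical 30-minute timestamps covering 20 days in June 2021 (UTC)
timestamps <- seq(
  as.POSIXct("2021-06-01 00:00", tz = "UTC"),
  by = "30 min",
  length.out = 48 * 20
)

# Hypothetical QF flags: 0 = pass, 1 = fail; here the first 4 days fail
qf <- rep(0, length(timestamps))
qf[seq_len(48 * 4)] <- 1

# Count the distinct days that have at least one passing measurement
n_days_pass <- length(unique(as.Date(timestamps[qf == 0], tz = "UTC")))
has_enough_days <- n_days_pass >= 15

# Made-up standard errors and expanded uncertainties for three measurements
sigma   <- c(0.3, 0.4, 0.5)
epsilon <- c(0.5, 0.5, 1.3)

# The text states that sigma_i <= epsilon_i, so the square root is real
stopifnot(all(sigma <= epsilon))

# Bias as the quadrature difference of expanded uncertainty and standard error
bias <- sqrt(epsilon^2 - sigma^2)
```

With these values, 16 days contain passing measurements, so the 15-day requirement is met, and the bias is approximately 0.4, 0.3, and 1.2.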
We generate a bootstrap sample of the mean $\overline{y}$ and standard error $\overline{s}$ in the following way. Here we set the number of bootstrap samples $N$ to be 5000. Entries for $\overline{y}_{i}$ and $\overline{s}_{i}$ are determined by the following:
R will recycle the vector $\vec{y}$ so that this sample is of length $M$. We will call the sample of $\vec{y}$ $\vec{x}$.
Once that is complete, the reported monthly mean and standard deviation are $\overline{\overline{y}}$ and $\overline{s}$.
With the resulting output from acquire_neon_data, you can then unnest the different data frames to make plots, for example:
# Extract data
VSWC_data <- out_env_data$site_data |>
  filter(measurement == 'VSWC') |>
  unnest(cols = c("data"))

# Plot data
VSWC_data |>
  ggplot(aes(x = startDateTime, y = VSWCMean)) +
  geom_point(aes(color = as.factor(VSWCFinalQF))) +
  facet_grid(verticalPosition ~ horizontalPosition)
Once we have out_env_data from acquire_neon_data, we then compute the fluxes at this site:
out_fluxes <- compute_neon_flux(input_site_env = out_env_data$site_data, input_site_megapit = out_env_data$site_megapit )
The resulting data frame out_fluxes has the following variables:
startDateTime: Time period of measurement (as POSIXct)
horizontalPosition: Sensor location where flux is computed
flux_compute: A nested tibble with variables flux, flux_err, and method (one of 4 implemented)
diffusivity: Computation of surface diffusivity
VSWCMeanQF: QF flag for soil water content across all vertical depths at the given horizontal position: 0 = no issues, 1 = monthly mean used in measurement, 2 = QF fail
soilTempMeanQF: QF flag for soil temperature across all vertical depths at the given horizontal position: 0 = no issues, 1 = monthly mean used in measurement, 2 = QF fail
soilCO2concentrationMeanQF: QF flag for soil CO2 concentration across all vertical depths at the given horizontal position: 0 = no issues, 1 = monthly mean used in measurement, 2 = QF fail
staPresMeanQF: QF flag for atmospheric pressure across all vertical depths at the given horizontal position: 0 = no issues, 1 = monthly mean used in measurement, 2 = QF fail
A QF check fails when a monthly mean could not be computed for a measurement. Note that this would cause all flux calculations to fail at that given horizontal position.
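To illustrate how the four QF columns combine, here is a small sketch that labels each row by its worst flag. The tibble toy_fluxes and the columns worst_qf and status are hypothetical stand-ins built for this example; only the four QF column names and the 0/1/2 coding come from the description above.

```r
library(dplyr)

# Hypothetical stand-in for out_fluxes with only the QF columns
toy_fluxes <- tibble(
  horizontalPosition = c("001", "002", "003"),
  VSWCMeanQF = c(0, 1, 0),
  soilTempMeanQF = c(0, 0, 0),
  soilCO2concentrationMeanQF = c(0, 0, 2),
  staPresMeanQF = c(0, 0, 0)
)

# Take the worst (largest) flag across the four measurements in each row,
# then translate it into a readable label
qf_summary <- toy_fluxes |>
  mutate(
    worst_qf = pmax(VSWCMeanQF, soilTempMeanQF,
                    soilCO2concentrationMeanQF, staPresMeanQF),
    status = case_when(
      worst_qf == 0 ~ "all measured",
      worst_qf == 1 ~ "monthly mean used",
      TRUE ~ "flux fails"
    )
  )
```

The same mutate step could be applied to a real out_fluxes data frame, assuming it carries the four QF columns listed above.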
You can see the distribution of the QF flags for each environmental measurement with env_fingerprint_plot:
env_fingerprint_plot(out_fluxes)
Similarly, you can see the distribution of QF flags for each diffusivity and flux computation with flux_fingerprint_plot:
flux_fingerprint_plot(out_fluxes)
To plot the flux results:
out_fluxes |>
  select(-diffusivity) |>
  unnest(cols = c(flux_compute)) |>
  ggplot(aes(x = startDateTime, y = flux, color = method)) +
  geom_line() +
  facet_wrap(~horizontalPosition, scales = "free_y")
The diffusivity can be plotted similarly:
out_fluxes |>
  select(-flux_compute) |>
  unnest(cols = c(diffusivity)) |>
  ggplot(aes(x = startDateTime, y = diffusivity, color = as.factor(zOffset))) +
  geom_line() +
  facet_wrap(~horizontalPosition, scales = "free_y")
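Beyond plotting, the unnested flux results can also be summarized numerically. The sketch below uses a hypothetical tibble, toy_fluxes, that mimics the nested flux_compute structure described earlier. The method labels method_a and method_b and all values are placeholders, not the names or results of the four methods implemented in the package.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for out_fluxes: two time steps at one sensor location,
# each with a nested tibble of flux, flux_err, and method
toy_fluxes <- tibble(
  startDateTime = as.POSIXct(c("2021-06-01 00:00", "2021-06-01 00:30"), tz = "UTC"),
  horizontalPosition = c("001", "001"),
  flux_compute = list(
    tibble(flux = c(1.0, 2.0), flux_err = c(0.1, 0.2),
           method = c("method_a", "method_b")),
    tibble(flux = c(3.0, 4.0), flux_err = c(0.3, 0.4),
           method = c("method_a", "method_b"))
  )
)

# Unnest the flux results, then average flux and its error
# for each sensor location and method
flux_summary <- toy_fluxes |>
  unnest(cols = c(flux_compute)) |>
  group_by(horizontalPosition, method) |>
  summarize(
    mean_flux = mean(flux),
    mean_flux_err = mean(flux_err),
    .groups = "drop"
  )
```

Averaging flux_err here is only a quick descriptive summary; it is not a propagated uncertainty for the mean flux.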