knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

This vignette introduces the central data classes in ConFluxPro and shows how to combine all necessary data into a single cfp_dat() object. We start by explaining the concept of a 'profile' and how soil data is structured in the package. We then introduce three essential classes cfp_gasdata(), cfp_soilphys() and cfp_layers_map() that frame different data types within ConFluxPro. Finally, we combine these objects into a single cfp_dat() object that holds all information needed to start modeling soil gas fluxes.

library(ConFluxPro)

Soil data and profiles

What is a soil profile?

A profile in ConFluxPro is a distinct observation of the vertical profile of a soil. For example, this can mean the concentration profile of CO~2~ at a specific time point, or the vertical distribution of soil moisture. Profiles are identified by their id_cols, column names that together uniquely identify each profile. This can be whatever combination of columns happened to be in your data, but will typically include a datetime column to identify the time point or a column that identifies the site if you measured at multiple locations. Consider the soilphys dataset included in the package.

library(dplyr)
data("soilphys", package = "ConFluxPro")
soilphys %>%
  relocate(site, Date, gas) %>%
  head()

This dataset contains information on different soil physical parameters. We will come back to that later. For now, notice that there are the site and Date column. All combinations identify a unique profile, namely twelve dates for two sites "site_a" and "site_b" for a total of 24 profiles. We can create a cfp_profile() object from this dataset. The only thing this does is to add the information which columns identify the profiles to the dataset with the id_cols argument. Internally, the number of distinct profiles is calculated as the unique combinations of the id_cols and can be accessed by n_profiles().

soilphys <- 
  cfp_profile(soilphys, id_cols = c("site", "Date"))
n_profiles(soilphys)

The concept of profiles and id_cols applies to almost all objects created in ConFluxPro. You can always see what id_cols an object has by calling the function cfp_id_cols().

cfp_id_cols(soilphys)

Soil data structure

Within a soil profile, parameters change with depth. In ConFluxPro we differentiate between two types of data. Point data have distinct values at a precise depth in the soil, such as the soil gas concentration. For layered data , the soil is subdivided into layers that are delimited by their upper and lower boundary. Layered data must be without gaps or overlaps, so that the entire profile can be described by distinct values. depth, upper and lower are in centimeters and a higher value indicates the upward direction. Both positive and negative values are allowed. This means you could set the mineral/organic soil boundary to have depth = 0, describe the mineral soil with increasingly negative values of depth and have the organic layer on top have positive values.

We will now present the three main data types that are needed to model gas fluxes.

cfp_gasdata(): Gas concentration data

Soil gas concentration data is stored in a data.frame. The data needs to have at least the columns gas, depth and x_ppm, as well as any id_cols that identify the soil profiles. The gas is a character() that identifies which gas was measured, so for example "CO2". depth is as explained above and x_ppm is the mixing ratio in ppm. Consider the example data that ships with the package:

data("gasdata", package = "ConFluxPro")
head(gasdata)

Because this data.frame already follows the requirements defined above, we can create a cfp_gasdata() object right away by providing the necessary id_cols. The function checks, whether the data.frame meets the criteria and returns an object of class cfp_gasdata.

# create working cfp_gasdata object
my_gasdata <- cfp_gasdata(gasdata, id_cols = c("site", "Date"))

head(my_gasdata)

# won't work, because depth is missing
gasdata_broken <- gasdata %>% select(!depth)
cfp_gasdata(gasdata_broken, id_cols = c("site", "Date"))

The data is in a long format, following the tidy data principles, where each observation is in a new row. If your data is not in this format, you can use the tidyr::pivot_longer() and other functions in the tidyr package. Notice that cfp_gasdata() automatically added the gas column to the id_cols. gas is always a required column in ConFluxPro, so that many functions will automatically add it to the id_cols if its present and will fail if its missing.

cfp_soilphys(): Soil physical data

Create a cfp_soilphys() object

Soil physical data includes everything else needed to describe the gas transport characteristics of the soil. This data is split into homogeneous layers with constant values for each layer and profile. At the minimum, cfp_soilphys() requires two parameters: DS is the gas-specific apparent diffusion coefficient of the soil in m^2^ s^-1^ and c_air is the number density of the soil air in mol m^-3^. If these parameters are provided, a valid cfp_soilphys object can be created. This function ensures that each profile is correctly layered without overlaps or gaps and that the required columns are present.

data("soilphys", package = "ConFluxPro")

my_soilphys <- cfp_soilphys(soilphys, id_cols = c("site", "Date"))

Derive parameters with complete_soilphys()

Often DS and c_air are derived from other parameters that are easier to measure. This then requires a bit more work to get a working cfp_soilphys() object. Typically, you will need the following information:

Measured parameters:

Derived parameters:

These relationships are invoked within the complete_soilphys() function. Consider the provided example data: It already has every parameter we need. Now let's remove the derived parameters and run complete_soilphys() again. We have to provide a formula for the calculation of the relative diffusivity DSD0. In our case, we have fitting parameters a and b from a Currie-style model that calculates as DSD0 = a * AFPS ^ b. We provide this formula as a character() to the function:

soilphys_measured_only <- 
  soilphys %>% 
  select(!all_of(c("AFPS", "DSD0", "D0", "DS", "c_air")))

head(soilphys_measured_only)

my_soilphys_completed <- 
  complete_soilphys(soilphys_measured_only, DSD0_formula = "a*AFPS^b")

head(my_soilphys_completed)

Notice the gas column. This is essential because different gases have specific diffusion coefficients. We can also remove the gas column and provide a list of gases to complete_soilphys():

soilphys_measured_only %>%
  select(!gas) %>%
  complete_soilphys(DSD0_formula = "a*AFPS^b", gases = c("CO2", "CH4")) %>%
  head()

This replicates each row of the data.frame and calculates distinct values for each gas.

From measurements to layered data: discretize_depth()

As seen above, a cfp_soilphys object often requires the combination of many different parameters. These parameters are typically measured in very different ways and may be present in different data formats. This presents a hurdle, as cfp_soilphys() requires these datasets to all be present in the same layered format. The function discretize_depth() helps with this by providing a unified way to interpolate different observations to the same layered structure. The basic idea is that you provide a vector of depths that represent the layer boundaries. For $n$ layers, you would need $n+1$ depths. Let's say we measured a temperature profile at three depths: 0, -30 and -100 cm:

temperature_profile <- 
  data.frame(depth = c(0, -30, -100),
             t = c(25, 20, 16))

If we want to map this to 10 cm-thick layers, we can simply define the layer boundaries and use discretize_depth() to linearly interpolate the observations:

boundaries <- seq(0, -100, by = -10)

discretize_depth(temperature_profile,
                 param = "t", 
                 method = "linear",
                 depth_target = boundaries)

The function can handle profile data and treats every profile distinctly. So if we adapt our example data to two profiles identified by a date column, the interpolation works in the same way, if we provide the necessary id_cols.

temperature_profile_dates <-
  data.frame(date_id = rep(c("date_1", "date_2"), each = 3),
             depth = rep(c(0, -30, -100), times = 2),
             t = c(25, 20, 16, 15, 10, 8)) %>%
  cfp_profile(id_cols = c("date_id"))

discretize_depth(temperature_profile_dates,
                 param = "t", 
                 method = "linear",
                 depth_target = boundaries)

We can use different interpolation methods that may be more applicable to other parameters. See the function documentation ?discretize_depth for details.

cfp_layers_map(): Create the model layers

The final aspect of a ConFluxPro model is the "layers_map". This is a layered data.frame that defines for which layers we want to calculate flux rates. We provide a dataset that matches the other example datasets:

data("layers_map", package = "ConFluxPro")

layers_map

As you can see, we divide each of the site into two layers, one for the organic layer with depth > 0 and one for the mineral soil below. Notice, that the upper boundary is different between the sites. This is because they have a different organic layer, that is also reflected in the other datasets:

my_gasdata %>%
  group_by(site) %>%
  summarise(max_depth = max(depth))

my_soilphys %>%
  group_by(site) %>%
  summarise(max_upper = max(upper))

Notice, that the extends of the layers_map should match your measured data if you want to include it in the models. You cannot extend the layers_map beyond your measurements.

How many layers to define in the cfp_layers_map() is up to the user and depends on the research question and the resolution of the recorded data. While it may be tempting to define a layer between every observation depth in cfp_gasdata(), this may result in artefacts in the modeled fluxes. Combining these layers may therefore yield more robust results.

To create a valid cfp_layers_map, we still need to add some information. These are only needed for a pro_flux() inverse model and are set to NA if missing.

We can either do so in the function call or provide distinct values ourselves. The highlim and lowlim values can help to explicitly exclude unrealisitic processes in the optimization in pro_flux(). For example, we can set lowlim to 0, if we want to prevent consumption of CO~2~. Again, these values are only relevant in the pro_flux() model. In our case, we want to add the limits for the gas "CO2":

my_layers_map <- 
  cfp_layers_map(
    layers_map,
    id_cols = "site",
    gas = "CO2",
    lowlim = 0,
    highlim = 1000
  )

my_layers_map

Note that we only added "site" as id_cols, because we don't need the model frame to change over time and therefore use the same mapping for every Date. You can always use a subset of the id_cols in cfp_soilphys() in your cfp_layers_map(). At the minimum, you need to provide gas within your id_cols.

With this complete, we now have everything to create a unified dataset for a ConFluxPro model.

cfp_dat(): Combining the datasets

As a last step, we combine the three datasets into one unified frame with cfp_dat().

my_dat <- cfp_dat(my_gasdata, my_soilphys, my_layers_map)

my_dat

This function does multiple things. (I) It ensures that the upper and lower boundaries between the datasets match. The highest upper of soilphys and gasdata should match the highest depth in gasdata, with the lower boundary correspondingly. (II) It matches each profile to another and makes a list of all profiles and the corresponding datasets. ConFluxPro does not require the id_cols between the different datasets to match exactly, as long as the different profiles can be mapped to another. This makes it possible, for example, to map the same soilphys profiles to multiple distinct gasdata profiles, without having to replicate the soilphys dataset manually. We already make use of this by providing only one layers_map per site in our example that is not replicated for the different Date of the the other datasets. (III) For each profile, it separates the layers of soilphys, at each depth where there is an observation in gasdata. This does not change the calculation but is needed internally for the pro_flux() model. (IV) It adds the column pmap to the soilphys dataset that indicates which layer, as defined in cfp_layers_map() corresponds to that layer in soilphys.

At its core, a cfp_dat object is simply a list. We can access the different elements, including the generated profiles data.frame like any element in a list with the $ operator.

my_dat$profiles %>% head()

Within this data.frame we have identifiers for each profile of the different datasets (gd_id, sp_id, group_id) and an identifier of each profile of the combined dataset prof_id.

We now are ready model.



valentingar/ConFluxPro documentation built on Dec. 1, 2024, 9:35 p.m.