calc_excess: Calculation of excess atom fraction
In bramstone/qsip: Quantitative stable isotope probing for microbial ecology

calc_excess

R Documentation

Description

Calculates fractional isotope incorporation in excess of natural abundances.

Usage

calc_excess(
  data,
  tax_id = c(),
  sample_id = c(),
  wads = "wad",
  iso_trt = c(),
  isotope = c(),
  bootstrap = FALSE,
  iters = 999L,
  grouping_cols = c(),
  min_freq = 3,
  correction = TRUE,
  rm_outliers = TRUE,
  non_grower_prop = 0.1,
  total_enrich = 1,
  wad_to_gc_slope = 0.083506,
  wad_to_gc_intercept = 1.646057,
  nat_abund_13C = 0.01111233,
  nat_abund_15N = 0.003663004,
  nat_abund_18O = 0.002011429
)

Arguments

`data`	Data as a long-format data.table where each row represents a taxonomic feature within a single fraction. Typically, this is the output from the `calc_wad` function.
`tax_id`	Column name specifying unique identifier for each taxonomic feature.
`sample_id`	Column name specifying unique identifier for each replicate.
`wads`	Column name specifying weighted average density values. The default can be changed, but doing so is not recommended.
`iso_trt`	Column name specifying a two-level categorical column indicating whether a sample has been amended with a stable isotope (i.e., is "heavy") or is at natural isotopic abundance (i.e., is "light"). Any pair of terms may be used, but take care with how they are ordered. If the column is a factor, `calc_excess` takes the lowest level as the "light" treatment and the higher level as the "heavy" treatment. If it is a character vector, `calc_excess` coerces it to a factor using R's default behavior, so the value that comes first in alphabetical order becomes the lowest factor level (i.e., the "light" treatment).
`isotope`	Column name specifying the isotope applied to each replicate. For "heavy" samples, these values should be one of "13C", "15N", or "18O".
`bootstrap`	Whether to generate bootstrapped enrichment values for each taxonomic feature across groups of samples. If `TRUE`, replicates within specified treatment groupings will be randomly resampled with replacement and resulting WAD values used to generate a distribution of enrichment values for each taxon.
`iters`	Integer specifying the number of bootstrap iterations to perform.
`grouping_cols`	Column(s) used to indicate bootstrap resampling groups. Within each group, replicates will be resampled with replacement. Resampling will not occur across groups.
`min_freq`	Minimum number of replicates a taxonomic feature must occur in to be kept. If treatment grouping columns are specified, frequencies will be assessed at this level. For unlabeled "light" samples, treatment groupings will be ignored when assessing adequate frequency of occurrence.
`correction`	Whether to apply a correction to fractional enrichment values to ensure a certain proportion are positive.
`non_grower_prop`	Fractional value, applied if `correction = TRUE`, specifying the proportion of the community in each sample assumed to be non-growers, whose median enrichment value is assumed to be zero. The adjustment necessary to place this median value at zero will be applied as a correction to all enrichment values in the sample.
`total_enrich`	Argument of length one describing the total enrichment value of the target isotope(s) achieved. The final fractional isotope enrichment calculations will be divided by this amount. A numeric value should be a positive number no greater than one and will be applied to every sample. A character value indicates a column with one or more total enrichment values, so that sample- or group-specific total enrichment values may be applied.
`wad_to_gc_slope`	Single numeric value indicating the slope of the relationship between WAD values and estimated genomic GC content. The default value represents that calculated by Hungate et al. 2015.
`wad_to_gc_intercept`	Single numeric value indicating the intercept of the relationship between WAD values and estimated genomic GC content. The default value represents that calculated by Hungate et al. 2015.
`nat_abund_13C`	Single numeric value indicating the default assumption for background 13C. See `details`.
`nat_abund_15N`	Single numeric value indicating the default assumption for background 15N. See `details`.
`nat_abund_18O`	Single numeric value indicating the default assumption for background 18O. See `details`.
`rm_outliers`	Whether to remove low fractional enrichment values that fall more than 1.5 times the interquartile range below the 25th percentile. If `bootstrap = TRUE`, outlier WAD values will be removed prior to resampling and enrichment calculation.
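
The `correction` and `non_grower_prop` arguments describe shifting all enrichment values in a sample so that the median of the assumed non-growers sits at zero. A minimal base R sketch of that idea follows. It assumes that the non-growers are the taxa with the lowest raw enrichment values, which this page does not state explicitly; `correct_eaf_sketch` and the input values are hypothetical and are not part of the package.

```r
# Sketch only: shift enrichment values so that the median of the assumed
# non-growers is zero. ASSUMPTION: non-growers are the lowest-enrichment taxa.
correct_eaf_sketch <- function(eaf, non_grower_prop = 0.1) {
  # number of taxa treated as non-growers (at least one)
  n_non <- max(1, round(length(eaf) * non_grower_prop))
  non_growers <- sort(eaf)[seq_len(n_non)]
  # the adjustment needed to place the non-grower median at zero
  shift <- median(non_growers)
  eaf - shift
}

# Hypothetical raw enrichment values for 20 taxa in one sample
eaf_raw <- c(-0.08, -0.05, -0.02, 0.00, 0.01, 0.03, 0.05, 0.08, 0.10, 0.12,
             0.15, 0.18, 0.22, 0.25, 0.30, 0.33, 0.37, 0.41, 0.46, 0.52)

eaf_corrected <- correct_eaf_sketch(eaf_raw, non_grower_prop = 0.1)
```

With 20 taxa and `non_grower_prop = 0.1`, the two lowest values are treated as non-growers, and every value in the sample is shifted by the same amount.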

Details

calc_excess automatically averages the isotopically unamended WAD values for each taxonomic feature on the assumption that density values will be identical (or nearly identical) for those samples.

The equations for calculating the molecular weights of taxon i, designated M_{Lab,i} for labeled and M_{Light,i} for unlabeled, are:

M_{Light,i} = 0.496 \cdot G_{i} + 307.691

M_{Lab,i} = \left( \frac{\Delta W}{W_{Light,i}} + 1 \right) \cdot M_{Light,i}

Where \Delta W = W_{Lab,i} - W_{Light,i} is the shift in weighted average density of taxon i caused by labeling, and

G_{i} = \frac{1}{0.083506} \cdot (W_{Light,i} - 1.646057)

is the GC content of taxon i, estimated from the density of its DNA when unlabeled. The slope and intercept in this equation can be changed using the wad_to_gc_slope and wad_to_gc_intercept parameters.

The calculation for the fractional enrichment of taxon i, A_{i}, is:

A_{i} = \frac{M_{Lab,i} - M_{Light,i}}{M_{Heavymax,i} - M_{Light,i}} \cdot (1 - N_{x})

Where

N_{x}: The natural abundance of the heavy isotope. Default estimates, set by the nat_abund_18O, nat_abund_13C, and nat_abund_15N arguments, are: N_{18O} = 0.002011429, N_{13C} = 0.01111233, and N_{15N} = 0.003663004

M_{Heavymax,i}: The highest theoretical molecular weight of taxon i assuming maximum labeling by the heavy isotope

M_{O,Heavymax,i} = (12.07747 + M_{Light,i}) \cdot L

M_{C,Heavymax,i} = (-0.4987282 \cdot G_{i} + 9.974564 + M_{Light,i}) \cdot L

M_{N,Heavymax,i} =( 0.5024851 \cdot G_{i} + 3.517396 + M_{Light,i}) \cdot L

L: The maximum label possible, based on the percent of heavy isotope making up the atoms of that element in the labeled treatment
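
As a worked illustration, the equations above can be chained together in a few lines of base R. This is a sketch under stated assumptions, not the package's implementation: `eaf_sketch` is a hypothetical helper, \Delta W is taken to be the labeled WAD minus the unlabeled WAD, the natural abundances are the defaults listed under Usage, and the WAD values in the example call are invented.

```r
# Sketch only: follows the equations in Details as written.
# ASSUMPTION: Delta W = w_lab - w_light (labeled minus unlabeled WAD).
eaf_sketch <- function(w_light, w_lab, isotope = c("18O", "13C", "15N"),
                       L = 1, slope = 0.083506, intercept = 1.646057) {
  isotope <- match.arg(isotope)

  # GC content estimated from the unlabeled density
  gc <- (w_light - intercept) / slope

  # molecular weights of unlabeled and labeled DNA
  m_light <- 0.496 * gc + 307.691
  m_lab   <- ((w_lab - w_light) / w_light + 1) * m_light

  # theoretical maximum molecular weight under full labeling
  m_heavymax <- switch(isotope,
    "18O" = (12.07747 + m_light) * L,
    "13C" = (-0.4987282 * gc + 9.974564 + m_light) * L,
    "15N" = (0.5024851 * gc + 3.517396 + m_light) * L
  )

  # natural abundance of the heavy isotope (defaults from Usage)
  nat_abund <- switch(isotope,
    "18O" = 0.002011429,
    "13C" = 0.01111233,
    "15N" = 0.003663004
  )

  (m_lab - m_light) / (m_heavymax - m_light) * (1 - nat_abund)
}

# Hypothetical taxon: unlabeled WAD of 1.700 g/ml, labeled WAD of 1.710 g/ml
eaf_18O <- eaf_sketch(w_light = 1.700, w_lab = 1.710, isotope = "18O")
```

For this invented taxon, a density shift of 0.010 g/ml under 18O labeling with L = 1 corresponds to an excess atom fraction of roughly 0.15. A taxon with no density shift returns zero.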

Value

calc_excess returns a data.table where each row represents a taxonomic feature within a single replicate. The output contains the column eaf (excess atom fraction). Rows with NA values, which usually arise when an organism is present in only the unlabeled or only the labeled samples, are removed.

Because fraction-level data are being condensed to replicate-level, a list of grouping columns to keep is not necessary unless bootstrapping is specified, wherein those groupings will be used to organize the resampling effort.

Bootstrap functionality produces the columns 2.5%, 50%, and 97.5%, which give the median bootstrapped enrichment value and the bounds of the 95% confidence interval of enrichment values. In addition, the p_val column gives the probability that a taxon's enrichment was less than zero, based on the proportion of bootstrapped observations that fall below zero.
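
The resampling logic can be sketched in base R as follows. This is a simplified stand-in rather than the package's code: it bootstraps the mean WAD shift (labeled minus unlabeled) for a single taxon instead of the full enrichment calculation, and all WAD values are invented.

```r
# Sketch only: bootstrap the mean WAD shift for one taxon as a stand-in
# for its enrichment value. All replicate values below are hypothetical.
set.seed(42)
wad_light <- c(1.700, 1.701, 1.699)  # unlabeled replicates
wad_lab   <- c(1.708, 1.712, 1.705)  # labeled replicates
iters <- 999L

# resample replicates with replacement within each treatment group
boot_shift <- replicate(iters, {
  mean(sample(wad_lab, replace = TRUE)) - mean(sample(wad_light, replace = TRUE))
})

# median and 95% confidence interval of the bootstrapped values
ci <- quantile(boot_shift, probs = c(0.025, 0.5, 0.975))

# proportion of bootstrapped values below zero
p_val <- mean(boot_shift < 0)
```

Because every labeled replicate here is denser than every unlabeled one, no resampled shift can fall below zero and `p_val` is 0.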

If iso_trt is not a factor, then it will be coerced to one and it will return a message telling the user which value was assigned to which isotope labeling level. Most users adopt a scheme of using the terms "light" and "label" to define their treatments but please note that the word "label" comes first in the alphabet and will be assigned by the function to be the unlabeled or light treatment. If you wish to avoid this (you almost certainly do), you must specify your factor levels beforehand.
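
The alphabetical ordering issue can be seen, and avoided, with base R alone. The vector below is hypothetical; in practice the same `factor()` call would be applied to the `iso_trt` column of the data before calling `calc_excess`.

```r
# Hypothetical treatment labels stored as character
iso_trt <- c("label", "light", "label", "light")

# Default coercion orders levels alphabetically, so "label" becomes the
# lowest level and would be treated as the unlabeled ("light") treatment
default_levels <- levels(factor(iso_trt))
default_levels
# "label" "light"

# Setting the levels explicitly makes "light" the lowest level
iso_trt_fixed <- factor(iso_trt, levels = c("light", "label"))
levels(iso_trt_fixed)
# "light" "label"
```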

References

Hungate, Bruce, et al. 2015. Quantitative microbial ecology through stable isotope probing. Applied and Environmental Microbiology 81: 7570-7581.

Morrissey, Ember, et al. 2018. Taxonomic patterns in nitrogen assimilation of soil prokaryotes. Environmental Microbiology 20: 1112-1119.

Examples

 # Load packages and example data
 library(data.table)
 library(qsip)
 data(example_qsip)

 # relativize sequence abundances (should be done after taxonomic filtering)
 example_qsip[, rel_abund := seq_abund / sum(seq_abund), by = sampleID]

 # check that the "light" treatment is the first factor level in the isotope treatment column
 levels(example_qsip$iso_trt)

 # calculate weighted average densities
 wads <- calc_wad(example_qsip,
                  tax_id = 'asv_id', sample_id = 'sampleID', frac_id = 'fraction',
                  frac_dens = 'Density.g.ml', frac_abund = 'avg_16S_g_soil',
                  rel_abund = 'rel_abund',
                  grouping_cols = c('treatment', 'isotope', 'iso_trt', 'Phylum'))

 # calculate fractional enrichment in excess of background
 eaf <- calc_excess(wads, tax_id = 'asv_id', sample_id = 'sampleID',
                    iso_trt = 'iso_trt', isotope = 'isotope')

bramstone/qsip documentation built on Feb. 9, 2025, 5:05 p.m.