horizonplot: Construct a Microbiome Horizon Plot

View source: R/BiomeHorizon.R

horizonplot    R Documentation

Description

This is the main function that constructs and returns the microbiome horizon plot.

Usage

horizonplot(parameterList, aesthetics = horizonaes())

Arguments

parameterList

The list of parameters for constructing the horizon plot. This should come directly from the output list of the prepanel() function, without alteration. The 15 parameters, in order, are otudata, taxonomydata, timestamps, otulist, subj, regularInterval, maxGap, minSamplesPerFacet, band.thickness, origin, facetLabelsByTaxonomy, customFacetLabels, fill_NA, nbands, and formatStep.

Two of the parameters are not arguments of the prepanel() function, and are described below:

timestamps is an integer vector containing the time (days) each sample was collected, retrieved from the collection_date variable of metadata. The first element is designated as day 1.

fill_NA is the function that fills missing data in otudata, chosen according to the boolean value interpolate_NA supplied to prepanel(). It either interpolates missing values using adjacent data within the same OTU (interpolate_NA == TRUE) or sets them to zero (interpolate_NA == FALSE). (Note: missing data in this case does not include entire timepoints without data; for those, data is either interpolated or a break in the time axis is created.)
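The two fill behaviors described above can be sketched in base R. This is a minimal illustration, not the package's internal code: the helper names fill_zero and fill_interpolate are hypothetical, and linear interpolation via approx() is an assumption about how "adjacent data within the same OTU" is used.

```r
# Hypothetical helpers; NOT the package's internal fill_NA implementations.

# Replace every missing value with zero.
fill_zero <- function(x) {
  x[is.na(x)] <- 0
  x
}

# Fill missing values by linear interpolation between observed neighbors
# (assumption: the package may interpolate differently).
fill_interpolate <- function(x) {
  observed <- which(!is.na(x))
  # With fewer than two observed values there is nothing to interpolate between.
  if (length(observed) < 2) return(fill_zero(x))
  # rule = 2 carries the nearest observed value out to leading/trailing NAs.
  approx(x = observed, y = x[observed], xout = seq_along(x), rule = 2)$y
}

# One OTU's values across six samples, with three missing entries.
otu_row <- c(0.10, NA, 0.30, NA, NA, 0.60)
zero_filled <- fill_zero(otu_row)
interpolated <- fill_interpolate(otu_row)
```

Zero filling treats a missing value as a true absence, while interpolation assumes the abundance moved smoothly between the neighboring samples; which is appropriate depends on why the value is missing.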

aesthetics

A list of custom aesthetics to apply to the horizon plot. This should come directly from the output list of horizonaes(), without alteration.

Details

After data sets and other parameters have been properly formatted and checked for errors in prepanel(), they are passed to this function, which constructs and returns the horizon plot. All customizations of the graph should be specified in prepanel() and not here; no alterations should be made to the output list before it is passed to this function.

The refined version of otudata used in this function represents a filtered OTU table, containing only the OTUs to be displayed on the graph, and only the samples belonging to the subject selected. Sample values in this refined table reflect difference in fractional abundance from the origin value. Values are converted from raw sample reads to proportions of the entire sample represented by a given OTU, and then the proportion values within each OTU are centered to their respective origin values.
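The transformation described above (raw reads to per-sample proportions, then centering each OTU on its origin) can be sketched in base R as follows. The toy table, the object names, and the use of the median as the origin value are assumptions made for illustration; the origin actually used is whatever prepanel() passes along.

```r
# Toy OTU table: rows are OTUs, columns are samples, entries are raw reads.
counts <- matrix(c(10, 30, 20,
                   90, 70, 80),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("otu_A", "otu_B"), c("s1", "s2", "s3")))

# Step 1: convert raw reads to the fraction of each sample's total reads.
proportions <- sweep(counts, 2, colSums(counts), "/")

# Step 2: center each OTU on its own origin value.
# Assumption: the origin is the per-OTU median.
origins <- apply(proportions, 1, median)
centered <- sweep(proportions, 1, origins, "-")
```

After centering, positive values mark samples where an OTU is above its origin and negative values mark samples where it is below, which is the quantity the horizon bands encode.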

The refined taxonomydata is filtered to just the OTUs in otudata.

Value

Returns the horizon plot as a ggplot object.

Irregular Data

A common problem faced in visualizing time series data is plotting data spaced at irregular time intervals. A common solution for this problem is interpolating values at regular time intervals using nearby data. However, since microbiome data can change drastically in short periods of time, it doesn't make sense to interpolate through large timespans, and thus the function gives several options to plot irregular data as accurately as possible.

  1. Plot "real values" but with an inaccurate timescale. For this default option, samples will be plotted next to each other regardless of their timestamps. This is most accurate in that timepoints are plotted directly from sample values, but risks being misleading if the timescale is not clearly marked as inconsistent. Additionally, this option removes the ability to visually compare temporal differences within the same plot.

  2. Plot artificial values but with a regularized timescale. New values can be interpolated using existing ones at a regular interval of time specified by regularInterval. The first sample is plotted as "day 1," and a new value is interpolated using the closest previous and subsequent sample timepoints at a fixed interval throughout the rest of the data. This "regularization" of the data allows for quick visual comparison of microbiome changes within the plot. The downside of this method is that "real" values are not plotted (except for rare cases where a sample timepoint happens to fall on the regular interval), and inaccuracies are introduced through interpolation. This is especially true given the continuous, rapid changes of bacterial abundances within the microbiome.

  3. Compromise between accuracy of values and a regular timescale: interpolation within clusters of closely-spaced data, which are separated by breaks in the time axis. This allows for temporal comparison within each cluster of timepoints and avoids interpolating across large timespans. This method is practical for datasets where samples are collected irregularly, arranged in periodic clusters of closely-spaced data separated by larger timespans with fewer samples. Clustering is done by specifying a value for maxGap, which defines the threshold of time without data to separate clusters.
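Options 2 and 3 above can be sketched in base R for a single OTU. This is an illustration under stated assumptions, not the package's implementation: the regularize() helper and the toy data are hypothetical, and linear interpolation via approx() stands in for whatever interpolation prepanel() performs. Leaving maxGap at Inf gives option 2 (one regular grid); a finite maxGap gives option 3 (one grid per cluster).

```r
# Hypothetical collection days and fractional abundances for one OTU.
days  <- c(1, 3, 4, 8, 100, 103, 109)
value <- c(0.10, 0.30, 0.20, 0.40, 0.50, 0.20, 0.80)

# Hypothetical helper: split samples into clusters wherever the time without
# data exceeds maxGap, then interpolate onto a regular grid within each cluster.
regularize <- function(days, value, regularInterval, maxGap = Inf) {
  # A new cluster starts when the gap to the previous sample exceeds maxGap.
  cluster <- cumsum(c(TRUE, diff(days) > maxGap))
  lapply(split(seq_along(days), cluster), function(idx) {
    # A single-sample cluster has nothing to interpolate; keep it as is.
    if (length(idx) < 2) {
      return(data.frame(day = days[idx], value = value[idx]))
    }
    # Regular grid anchored at the first sample of the cluster.
    grid <- seq(days[idx][1], days[idx][length(idx)], by = regularInterval)
    # Linear interpolation between the closest earlier and later samples.
    data.frame(day = grid,
               value = approx(days[idx], value[idx], xout = grid)$y)
  })
}

# Option 3: a 92-day gap exceeds maxGap = 75, so two clusters result.
facets <- regularize(days, value, regularInterval = 3, maxGap = 75)

# Option 2: no maxGap, so values are interpolated across the whole series.
single <- regularize(days, value, regularInterval = 3)
```

In the clustered result no value is interpolated across the long gap between day 8 and day 100, whereas the single-grid result fills that span with a straight line between two distant samples.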

Examples

# Basic plot form. By default, samples are plotted next to each other.
plist <- prepanel(otudata = otusample_diet, metadata = metadatasample_diet,
                  taxonomydata = taxonomysample_diet, subj = "MCTs16")
horizonplot(plist)

# For irregularly spaced time series, you can "regularize" the data to create
# an accurate timescale.

# Adjust data to regular time intervals of 1 day. This will interpolate
# new data points for each OTU at day = 1, 2, 3, etc. based on values
# at previous and subsequent timepoints.
plist <- prepanel(otudata = otusample_diet, metadata = metadatasample_diet, 
                  subj = "MCTs16", regularInterval = 1)
horizonplot(plist)

# If the data has large gaps of time without samples, interpolating data
# within these time intervals could be misleading. You can set a maximum
# amount of time without samples allowed to plot a timepoint. If a timepoint
# is eliminated, a break in the time axis will be created at that point, and
# data will be regularized separately on both sides of the break in two
# different facets.

# Set maximum time without samples to 75 days
plist <- prepanel(otudata = otusample_baboon, metadata = metadatasample_baboon, 
                  subj = "Baboon_388", regularInterval = 25, maxGap = 75)
horizonplot(plist)

# Remove facets with fewer than 5 samples
plist <- prepanel(otudata = otusample_baboon, metadata = metadatasample_baboon, 
                  subj = "Baboon_388", regularInterval = 25, maxGap = 75, 
                  minSamplesPerFacet = 5)
horizonplot(plist)


blekhmanlab/biomehorizon documentation built on Nov. 8, 2023, 12:16 a.m.