prepanel | R Documentation |
This function prepares the OTU table and additional datasets for analysis
with the horizonplot() function.
prepanel(
otudata,
metadata = NA,
taxonomydata = NA,
thresh_prevalence = 80,
thresh_abundance = 0.5,
thresh_abundance_override = NA,
thresh_NA = 5,
regularInterval = NA,
maxGap = NA,
minSamplesPerFacet = 2,
otulist = NA,
subj = NA,
singleVarOTU = NA,
band.thickness = NA,
origin = NA,
facetLabelsByTaxonomy = FALSE,
customFacetLabels = NA,
interpolate_NA = TRUE,
formatStep = FALSE,
nbands = 4
)
otudata |
Data frame representing OTU Table. Assumes first column contains OTU IDs, and all other columns are numeric vectors containing the number of sample reads for each OTU. Values can also be represented as proportions or percentages of the total sample for each OTU. |
metadata |
Data frame representing metadata table; matches samples to
collection dates, and to subject names if applicable. If this data frame is
supplied, the columns with sample IDs, collection dates and subject
names should be named "sample", "collection_date" and "subject",
respectively. |
taxonomydata |
Taxonomy information for OTUs, used for labeling facets. There are two options:
Taxonomic levels should start from Kingdom and can go as
far as Subspecies. Defaults to NA. |
thresh_prevalence |
numeric threshold for OTU filtering. Minimum % of total samples in which OTU must be present to be included in analysis (defaults to 80). |
thresh_abundance |
numeric threshold for OTU filtering. Minimum % of total sample reads the OTU must constitute to be included in analysis (defaults to 0.5). |
thresh_abundance_override |
numeric threshold for OTU filtering. Minimum
% of total sample reads the OTU must constitute to override all other
standards, and be included in analysis (defaults to NA). |
thresh_NA |
numeric threshold for OTU filtering. Maximum % of samples with missing data (defaults to 5). |
regularInterval |
integer. For regularized data, this specifies the
fixed interval of days separating each sample timepoint. If this value is
20, for example, new timepoints will be created at 1, 21, 41, 61, etc. To
leave data irregularly spaced, do not specify a number here. Defaults to
NA. |
maxGap |
numeric specifying the maximum number of days between the
previous and subsequent irregular timepoints in order to interpolate a new
timepoint. If the distance between the nearest time points exceeds the
threshold specified by |
minSamplesPerFacet |
numeric. For regularized data with breaks in the time axis, specifies the minimum number of samples required of each facet time interval. Facets without this many timepoints will be removed. Defaults to 2. |
otulist |
character vector specifying OTU IDs for manual selection. Also
determines the order from top to bottom of OTU panels displayed on the
horizon plot. Defaults to NA. |
subj |
character, used for datasets with multiple individual
microbiomes. Filter samples to this subject or subjects. In most cases, you
should specify just one subject, but if single OTU analysis is enabled you
can select multiple subjects. Subject names should be described in metadata
under the variable "subject". Defaults to NA. |
singleVarOTU |
character string specifying an OTU ID for facetting by
subject. Facetting by subject requires metadata with columns on sample and
subject, with an equal number of samples for each subject. If collection
dates are provided, they must be identical for each subject. If they are
not provided, the function assumes samples are ordered chronologically. A
subset of subjects may be selected for analysis by supplying a vector of
multiple subjects to |
band.thickness |
The height of each horizontal band (denoted by a unique color), i.e. the size of the scale of a horizon subplot. There are three options:
|
origin |
The baseline (value=0, the base of the first positive band) for horizon subplots. There are three options:
|
facetLabelsByTaxonomy |
If |
customFacetLabels |
Use a custom character vector to label facets. Must
be the same length as the number of OTUs post-filtering, or the number of
subjects if single OTU analysis is enabled. Overrides
facetLabelsByTaxonomy, but if set to |
interpolate_NA |
logical. How should |
formatStep |
If |
nbands |
integer specifying the number of positive bands (each denoted
by a unique color) on each horizon subplot. For example, if you set
|
The prepanel() function has 6 main purposes in preparing data sets and
other parameters for the main horizonplot() function:
1) Filter the OTU table to the OTUs displayed on the final horizon plot, and
to the samples of just one individual (for datasets with multiple subjects).
By default, the "most important" OTUs are selected using four filtering
thresholds: thresh_prevalence, thresh_abundance, thresh_abundance_override,
and thresh_NA. They can also be manually specified as a vector of OTU IDs
using otulist.
2) If single OTU analysis is enabled, convert the OTU table to values by subject for the OTU being analyzed.
3) Ensure data sets are formatted correctly.
4) Set the functions for finding the origin and horizon band thickness
(band.thickness) of each OTU panel, if the default (NA) or a constant is
entered.
5) Set other parameters to their defaults, and ensure correct data types are
entered. For boolean values, NA is converted to FALSE.
6) Check for common user errors, such as entering ".8" rather than "80" as a percentage filtering threshold (this triggers a warning message).
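The kind of check described in item 6 might look roughly like the sketch below. The helper name check_percent_threshold and the rule "a value strictly between 0 and 1 is probably a proportion" are assumptions made for illustration, not the package's actual code.

```r
# Hypothetical helper (not part of the package): warn when a percentage
# threshold looks as if it was entered as a proportion.
check_percent_threshold <- function(value, name) {
  if (is.na(value)) return(invisible(value))  # NA disables the threshold
  if (!is.numeric(value) || value < 0 || value > 100) {
    stop(name, " must be a number between 0 and 100")
  }
  if (value > 0 && value < 1) {
    # Assumed rule: values below 1% are suspicious for a prevalence threshold
    warning(name, " = ", value, " is below 1%; did you mean ", value * 100, "?")
  }
  invisible(value)
}

# A percentage passes silently
check_percent_threshold(80, "thresh_prevalence")

# A proportion raises a warning; capture its message instead of printing it
msg <- tryCatch(
  check_percent_threshold(0.8, "thresh_prevalence"),
  warning = function(w) conditionMessage(w)
)
```

A rule like this would only suit thresholds whose sensible values are well above 1, such as thresh_prevalence; the default thresh_abundance of 0.5 is itself below 1.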
By default, OTUs are filtered automatically using two thresholds. An
abundance threshold (thresh_abundance) sets the minimum average percentage
of sample reads an OTU must represent across all samples, and a prevalence
threshold (thresh_prevalence) sets the minimum percentage of all samples
in which the OTU must be present (at least 1 sample read). These thresholds
can be used in combination, or one can be used alone by setting the other to
0 or NA.
In addition, you can set a second abundance threshold that overrides the
prevalence threshold if it is reached, using thresh_abundance_override.
This is useful for catching OTUs that are abundant for a brief period of
time but absent from most of the samples, and that are nevertheless
important to include in analysis. This is disabled by default
(thresh_abundance_override = NA).
Finally, a fourth filtering threshold, thresh_NA, filters out OTUs
with missing data in a substantial fraction of the samples. This defaults to
eliminating OTUs with missing data in more than 5% of samples.
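As a rough illustration of how the four thresholds interact, the base R sketch below applies them to a toy OTU table. It is not the package's implementation. The table, the variable names, and the choice to apply the missing-data filter even to OTUs that pass the override are all assumptions.

```r
# Toy OTU table: first column holds OTU IDs, remaining columns are samples.
otu <- data.frame(
  otuid = c("taxon 1", "taxon 2", "taxon 3", "taxon 4"),
  s1 = c(50, 0, 49, 1),
  s2 = c(60, 0, 39, 1),
  s3 = c(55, 40, 4, 1),
  s4 = c(70, 0, 30, NA),
  stringsAsFactors = FALSE
)
counts <- as.matrix(otu[, -1])
rownames(counts) <- otu$otuid

# Convert reads to percent of each sample (column), ignoring missing values.
pct <- sweep(counts, 2, colSums(counts, na.rm = TRUE), "/") * 100

mean_abundance <- rowMeans(pct, na.rm = TRUE)                  # average % of reads
prevalence     <- rowMeans(!is.na(counts) & counts > 0) * 100  # % of samples present
pct_missing    <- rowMeans(is.na(counts)) * 100                # % of samples missing

thresh_prevalence         <- 80
thresh_abundance          <- 0.5
thresh_abundance_override <- 5
thresh_NA                 <- 5

passes_main     <- mean_abundance >= thresh_abundance &
                   prevalence >= thresh_prevalence
passes_override <- mean_abundance >= thresh_abundance_override

# Assumption: the missing-data filter applies regardless of the override.
keep <- (passes_main | passes_override) & pct_missing <= thresh_NA
kept <- names(keep)[keep]
kept
```

Here "taxon 2" is present in only one of four samples, so it fails the prevalence threshold, but its average abundance of 10% clears the override. "taxon 4" is dropped because a quarter of its samples are missing.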
Alternatively, OTUs can be manually specified in otulist as a vector
of OTU IDs. The order in which these are specified will also determine the
arrangement of OTU panels on the horizon plot.
You can also compare a single OTU across multiple subjects, by specifying the
OTU ID in singleVarOTU. This is useful for comparing the same
timepoint across multiple individuals, rather than multiple OTUs or taxa.
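Conceptually, single OTU analysis turns one row of the OTU table into a subject-by-timepoint layout. The sketch below shows one way this reshaping could be done in base R. The toy data and variable names are assumptions, and it is not the package's own code. It assumes samples are listed chronologically within each subject.

```r
# Toy OTU table and metadata for two subjects with three samples each.
otu <- data.frame(
  otuid = c("taxon 1", "taxon 2"),
  a1 = c(10, 90), a2 = c(20, 80), a3 = c(30, 70),
  b1 = c(5, 95),  b2 = c(15, 85), b3 = c(25, 75),
  stringsAsFactors = FALSE
)
meta <- data.frame(
  sample  = c("a1", "a2", "a3", "b1", "b2", "b3"),
  subject = c("A", "A", "A", "B", "B", "B"),
  stringsAsFactors = FALSE
)

single_otu <- "taxon 1"

# Values of the chosen OTU, in the sample order given by the metadata.
values <- unlist(otu[otu$otuid == single_otu, meta$sample])

# Each subject must contribute the same number of samples.
stopifnot(length(unique(table(meta$subject))) == 1)

# One row per subject, one column per timepoint.
by_subject <- do.call(rbind, split(unname(values), meta$subject))
colnames(by_subject) <- paste0("t", seq_len(ncol(by_subject)))
by_subject
```

Each row of by_subject then plays the role that an OTU row plays in the usual multi-OTU layout.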
Returns a list containing the appropriate arguments for the
horizonplot() function. This list should then be passed to horizonplot()
to produce the graph. You should not need to alter any parameters in this
list before using them in horizonplot(), but this preliminary function
allows you to check the refined parameters in case of an error in
horizonplot().
# Pass just the OTU table to prepanel, and it will assume all samples belong
# to the same subject.
prepanel(otudata = otusample_diet)
# Supply metadata and a subject name, and it will select samples from
# just one subject (this is what you should do with more than one subject).
prepanel(otudata = otusample_diet, metadata = metadatasample_diet, subj="MCTs01")
# Pass taxonomydata to prepanel if you want to label facets by taxonomy
# rather than by OTU ID.
prepanel(otudata = otusample_diet, metadata = metadatasample_diet,
         taxonomydata = taxonomysample_diet, subj="MCTs01", facetLabelsByTaxonomy=TRUE)
# OTU filtering using both a prevalence and an abundance standard (default)
prepanel(otudata = otusample_diet, metadata = metadatasample_diet, subj="MCTs01",
         thresh_prevalence=75, thresh_abundance=0.75)
# OTU filtering using just an abundance standard
prepanel(otudata = otusample_diet, metadata = metadatasample_diet, subj="MCTs01",
         thresh_prevalence=NA, thresh_abundance=0.75)
# If an OTU's average abundance reaches a high enough threshold, override
# other standards and include it in analysis
prepanel(otudata = otusample_diet, metadata = metadatasample_diet, subj="MCTs01",
         thresh_prevalence=90, thresh_abundance=0.75, thresh_abundance_override=1.5)
# Filter out OTUs where >2% of samples are NA values
prepanel(otudata = otusample_diet, metadata = metadatasample_diet, subj="MCTs01",
         thresh_NA=2)
# You can also manually select OTUs by OTU ID
prepanel(otudata = otusample_diet, metadata = metadatasample_diet, subj="MCTs01",
         otulist=c("taxon 1", "taxon 2", "taxon 10", "taxon 14"))
# Manual selection can be used to specify the order OTUs will appear on
# the horizon plot. For example, these two datasets have identical OTUs, but
# they are ordered differently.
params <- prepanel(otudata = otusample_diet, metadata = metadatasample_diet,
                   subj="MCTs01", thresh_prevalence=95, thresh_abundance=1.5,
                   otulist=c("taxon 1", "taxon 2", "taxon 10", "taxon 14"))
params[[1]]$otuid
params <- prepanel(otudata = otusample_diet, metadata = metadatasample_diet,
                   subj="MCTs01", otulist=c("taxon 10", "taxon 2", "taxon 1", "taxon 14"))
params[[1]]$otuid
# The origin and band.thickness variables can be set to either a numeric
# constant or a function that evaluates separately for every OTU subpanel based
# on its sample values.
# Use a fixed origin of 5% for all OTU subpanels
prepanel(otudata = otusample_diet, metadata = metadatasample_diet,
         subj="MCTs01", origin=5)
# Evaluate a different origin for each OTU subpanel using a custom function
prepanel(otudata = otusample_diet, metadata = metadatasample_diet,
         subj="MCTs01", origin=function(y){mad(y, na.rm=TRUE)})
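To make the difference between a constant and a function-valued origin concrete, the toy sketch below evaluates both over a small matrix of percent abundances, one row per OTU. The helper resolve_origin and the data are hypothetical. They illustrate the "evaluates separately for every OTU subpanel" idea and are not the package's internals.

```r
# Toy percent-abundance values: one row per OTU, one column per sample.
pct <- rbind(
  "taxon 1" = c(50, 60, 55, 70, NA),
  "taxon 2" = c(2, 4, 3, 10, 1)
)

# Hypothetical helper: a function is applied to each OTU's values,
# while a numeric constant is reused for every OTU.
resolve_origin <- function(origin, pct) {
  if (is.function(origin)) apply(pct, 1, origin) else rep(origin, nrow(pct))
}

fixed_origins  <- resolve_origin(5, pct)
custom_origins <- resolve_origin(function(y) { mad(y, na.rm = TRUE) }, pct)

fixed_origins
custom_origins
```

With the function form, each OTU gets an origin derived from its own sample values, so abundant and rare OTUs are each scaled relative to their own spread.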