amp_load: Load data for ampvis2 functions

amp_load R Documentation

Load data for ampvis2 functions

Description

This function reads an OTU-table and corresponding sample metadata, and returns a list for use in all ampvis2 functions. It is therefore required to load data with amp_load before any other ampvis2 functions can be used.

Usage

amp_load(
  otutable,
  metadata = NULL,
  taxonomy = NULL,
  fasta = NULL,
  tree = NULL,
  pruneSingletons = FALSE,
  removeAbsentOTUs = TRUE,
  ...
)

Arguments

otutable

(required) File path, data frame, or a phyloseq-class object. OTU-table with the read counts of all OTUs. Rows are OTUs and columns are samples; if your table is oriented the other way, transpose it first. The taxonomy of the OTUs can be placed anywhere in the table and will be extracted by name (Kingdom/Domain -> Species). If a file path is provided, the file is read with either fread or read_excel, depending on the file type. Compressed files (zip, bzip2, gzip) are supported unless the file is an excel file (bzip2 and gzip require data.table 1.14.3 or later). Can also be a path to a BIOM file, which is then parsed using the biomformat package, so both the JSON and HDF5 versions of the BIOM format are supported.

metadata

(recommended) File path or a data frame. Sample metadata with any information about the samples. The first column must contain sample IDs matching those in the otutable. If none is provided, dummy metadata will be created. Can be a data frame, matrix, or path to a delimited text file or excel file, which will be read using either fread or read_excel, respectively. Compressed files (zip, bzip2, gzip) are supported unless the file is an excel file (bzip2 and gzip require data.table 1.14.3 or later). If otutable is a BIOM file and contains sample metadata, metadata will take precedence if provided. (default: NULL)

taxonomy

(recommended) File path or a data frame. Taxonomy table where rows are OTUs and columns are up to 7 levels of taxonomy named Kingdom/Domain -> Species. If taxonomy is also present in otutable, the taxonomy in otutable will be discarded and only this table will be used. Can be a data frame, matrix, or path to a delimited text file or excel file, which will be read using either fread or read_excel, respectively. Compressed files (zip, bzip2, gzip) are supported unless the file is an excel file (bzip2 and gzip require data.table 1.14.3 or later). Can also be a path to a .sintax taxonomy table from a USEARCH analysis pipeline; the file extension must be .sintax. bzip2 and gzip compression are currently NOT supported for the sintax format. (default: NULL)

fasta

(optional) Path to a FASTA file with reference sequences for all OTUs in the OTU-table. (default: NULL)

tree

(optional) Path to a phylogenetic tree file which will be read using read.tree, or an object of class "phylo". (default: NULL)

pruneSingletons

(logical) Remove OTUs that are observed only once across all samples. (default: FALSE)

removeAbsentOTUs

(logical) Remove OTUs with 0 abundance in all samples. Absent OTUs are rarely present in the input data itself, but can occur when some samples are removed because of a mismatch between the samples in the OTU-table and those in the sample metadata. (default: TRUE)

...

(optional) Additional arguments are passed on to any of the file reader functions used.
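
To illustrate what pruneSingletons and removeAbsentOTUs refer to, here is a minimal base R sketch on a toy count matrix. It is not the package's implementation: the matrix and variable names are made up, and the sketch assumes that "observed once" means a total read count of 1 across all samples.

```r
# Toy count matrix (hypothetical data): rows are OTUs, columns are samples
counts <- matrix(
  c(5, 0, 3,
    0, 0, 0,
    1, 0, 0,
    2, 4, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("OTU1", "OTU2", "OTU3", "OTU4"), c("S1", "S2", "S3"))
)

# "Absent" OTUs: zero reads across all samples
absent <- rowSums(counts) == 0

# Singletons (assumed meaning): exactly one read across all samples
singleton <- rowSums(counts) == 1

rownames(counts)[absent]
rownames(counts)[singleton]

# Keep only OTUs that are neither absent nor singletons
filtered <- counts[!absent & !singleton, , drop = FALSE]
filtered
```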

Details

The amp_load function validates and corrects the provided data frames in different ways to make them suitable for the rest of the ampvis2 functions. For this to work properly, the provided data frames must meet the requirements described in the following sections. If a phyloseq-class object is provided, the metadata, taxonomy, fasta, and tree arguments are ignored, as this information is expected to be contained in the phyloseq object.

Value

A list of class "ampvis2" with 3 to 5 elements.

The OTU-table

The OTU-table contains information about the OTUs, their read counts in each sample, and optionally their assigned taxonomy. The provided OTU-table must be a data frame with the following requirements:

  • The rows are OTU IDs and the columns are samples.

  • The last 7 columns are optionally the corresponding taxonomy assigned to the OTUs, named "Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species".

  • The OTU IDs are expected to be in either the row names of the data frame or in a column called "OTU", "ASV", or "#OTU ID". Otherwise the function will stop with a message.

  • The column names of the data frame are the sample IDs, exactly matching those in the metadata (plus the taxonomy columns named Kingdom -> Species, if present).

  • Generally avoid special characters and spaces in row- and column names.
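
As a rough illustration of the orientation requirement, the following base R sketch starts from a hypothetical table with samples in rows and transposes it so that rows are OTUs and columns are samples. The sample and OTU names and the counts are invented for the example.

```r
# Hypothetical input in the wrong orientation: rows are samples, columns are OTUs
samples_by_otus <- data.frame(
  OTU1 = c(10, 0),
  OTU2 = c(3, 7),
  row.names = c("Sample1", "Sample2")
)

# Transpose so that rows are OTUs and columns are samples;
# the OTU IDs end up in the row names of the data frame
otutable <- as.data.frame(t(samples_by_otus))

# Inspect the result: row names are OTU IDs, column names are sample IDs
rownames(otutable)
colnames(otutable)
otutable
```

The transposed data frame keeps the OTU IDs in its row names, which is one of the two places where they are expected.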

A minimal example is available with data("example_otutable").

The metadata

The metadata contains additional information about the samples, for example where each sample was taken, date, pH, treatment etc., which is used to compare and group the samples during analysis. The amount of information in the metadata is unlimited and it can contain any number of columns (variables). There are, however, a few requirements:

  • The sample IDs must be in the first column. These sample IDs must exactly match those in the OTU-table.

  • Column classes matter: categorical variables should be loaded with as.character() or as.factor(), and continuous variables with as.numeric(). See below.

  • Generally avoid special characters and spaces in row- and column names.

If, for example, a column is named "Year" and the entries are simply entered as numbers (2011, 2012, 2013 etc.), then R will automatically consider these as numerical values (as.numeric()) and therefore treat the column as a continuous variable, although it is a categorical variable and should be loaded with as.factor() or as.character() instead. This has consequences for the analysis, as R treats the two kinds of variables differently. Therefore either use the colClasses argument when loading a csv file or the col_types argument when loading an excel file, or manually adjust the column classes afterwards with, for example, metadata$Year <- as.character(metadata$Year).
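
The two approaches to column classes can be sketched in base R as follows. The metadata values are hypothetical, and read.csv() is used here only to keep the example self-contained; colClasses accepts a named vector so that only the Year column is overridden.

```r
# Hypothetical metadata where Year is stored as numbers
metadata <- data.frame(
  SampleID = c("S1", "S2", "S3"),
  Year = c(2011, 2012, 2013),
  pH = c(6.8, 7.1, 7.4)
)
class(metadata$Year) # "numeric"

# Option 1: adjust the column class after loading
metadata$Year <- as.character(metadata$Year)
class(metadata$Year) # "character"

# Option 2: set the class while reading a csv file (inline text used here)
csv_text <- "SampleID,Year,pH\nS1,2011,6.8\nS2,2012,7.1"
metadata2 <- read.csv(text = csv_text, colClasses = c(Year = "character"))
class(metadata2$Year) # "character"
class(metadata2$pH)   # "numeric"
```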

The amp_load function will automatically use the sample IDs in the first column as row names, but it is important to also have an actual column with sample IDs, so that it is possible to, for example, group by that column during analysis. Any samples that are not present in both the otutable and the metadata will be removed with a warning.
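
Before loading, it can be useful to check which sample IDs would be dropped. The following base R sketch compares the sample columns of a hypothetical OTU-table with the first column of a hypothetical metadata table; all object names and values are illustrative and not part of ampvis2.

```r
# Hypothetical OTU-table with an OTU column, three samples, and one taxonomy column
otutable <- data.frame(
  OTU = c("OTU1", "OTU2"),
  S1 = c(4, 0),
  S2 = c(1, 9),
  S3 = c(0, 2),
  Kingdom = c("Bacteria", "Bacteria"),
  check.names = FALSE
)

# Hypothetical metadata with sample IDs in the first column
metadata <- data.frame(
  SampleID = c("S2", "S3", "S4"),
  Treatment = c("A", "B", "A")
)

# Sample IDs in the OTU-table are the column names that are not
# the OTU ID column or taxonomy levels
tax_levels <- c("Kingdom", "Phylum", "Class", "Order", "Family", "Genus", "Species")
otu_samples <- setdiff(colnames(otutable), c("OTU", tax_levels))
meta_samples <- as.character(metadata[[1]])

only_in_otutable <- setdiff(otu_samples, meta_samples) # "S1"
only_in_metadata <- setdiff(meta_samples, otu_samples) # "S4"
shared <- intersect(otu_samples, meta_samples)         # "S2" "S3"
```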

A minimal example is available with data("example_metadata").

Author(s)

Kasper Skytte Andersen ksa@bio.aau.dk

Mads Albertsen MadsAlbertsen85@gmail.com

See Also

amp_load, amp_filter_samples, amp_filter_taxa

Examples


library(ampvis2)
## Not run: 
# Load data by either giving file paths or by passing already loaded R objects
### example load with file paths
d <- amp_load(
  otutable = "path/to/otutable.tsv",
  metadata = "path/to/metadata.xlsx",
  taxonomy = "path/to/taxonomy.txt"
)

### example load with R objects
# Read the OTU-table as a data frame. It is important to set check.names = FALSE
myotutable <- read.delim("data/otutable.txt", check.names = FALSE)

# Read the metadata, probably an excel sheet (read_excel is from the readxl package)
mymetadata <- readxl::read_excel("data/metadata.xlsx", col_names = TRUE)

# Read the taxonomy
mytaxonomy <- read.csv("data/taxonomy.csv", check.names = FALSE)

# Combine the data with amp_load()
d <- amp_load(
  otutable = myotutable,
  metadata = mymetadata,
  taxonomy = mytaxonomy,
  pruneSingletons = FALSE,
  fasta = "path/to/fastafile.fa", # optional
  tree = "path/to/tree.tree" # optional
)

### Load a phyloseq object
d <- amp_load(physeq_object)

### Show a short summary about the data by simply typing the name of the object in the console
d

## End(Not run)

### Minimal example metadata:
data("example_metadata")
example_metadata

### Minimal example otutable:
data("example_otutable")
example_otutable

### Minimal example taxonomy:
data("example_taxonomy")
example_taxonomy

# load example data
d <- amp_load(
  otutable = example_otutable,
  metadata = example_metadata,
  taxonomy = example_taxonomy
)

# show a summary of the data
d

MadsAlbertsen/ampvis2 documentation built on Jan. 28, 2024, 7:12 a.m.