as.seqData: Convert Data to Appropriate pmartRseq Class


Description

Converts a list object or several data.frames of rRNA (16S/ITS/18S), metatranscriptomic, or metagenomic data to an object of the class 'seqData'. Objects of the class 'seqData' are lists with two obligatory components, e_data and f_data. An optional list component, e_meta, is used if analysis or visualization at other levels (e.g. taxonomy) is also desired.

Usage

as.seqData(e_data, f_data, e_meta = NULL, edata_cname, fdata_cname, data_type,
  taxa_cname = NULL, ...)

Arguments

e_data

a p × (n + 1) data.frame of expression data, where p is the number of features observed and n is the number of samples. The extra column holds the feature identifiers/names and may appear anywhere in the data.frame. Each row holds the data for one feature.

f_data

a data.frame with n rows. Each row corresponds to a sample with one column giving the unique sample identifiers found in e_data column names and other columns providing qualitative and/or quantitative traits of each sample.

e_meta

an optional data.frame with p rows. Each row corresponds to a feature, with one column giving the feature identifiers (this column must have the same name as the identifier column in e_data) and other columns giving meta information (e.g. mappings of OTU identification to taxonomy).

edata_cname

character string specifying the name of the column containing the identifiers in e_data and e_meta (if applicable).

fdata_cname

character string specifying the name of the column containing the sample identifiers in f_data.

data_type

character string specifying if this is 'rRNA' (for 16S/ITS/18S), 'metagenomic', or 'metatranscriptomic' data.

taxa_cname

optional character string specifying the name of the column containing the taxonomy in e_meta (if applicable). Defaults to NULL. If e_meta is NULL, then specify taxa_cname as NULL.

...

further arguments

e_tree

an optional NEXUS or Newick formatted phylogenetic tree file, imported using ape::read.tree(tree_path). The OTU labels in the tree file should match the OTU identifiers in the preceding data fields.

e_seq

an optional FASTA formatted representation of biological sequences imported using Biostrings::readDNAStringSet(fasta_path, ...). Each OTU in the FASTA maps to at least one sequence in the preceding data fields.

ec_cname

optional character string specifying the name of the column containing the EC numbers in e_meta (if applicable). Defaults to NULL. If e_meta is NULL, then specify ec_cname as NULL.

gene_cname

optional character string specifying the name of the column containing the gene names in e_meta (if applicable). Defaults to NULL. If e_meta is NULL, then specify gene_cname as NULL.
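
Example (a minimal sketch, not part of the original documentation): the code below builds toy inputs that follow the shapes described in the arguments above and checks them before conversion. All data values, the column names OTU, SampleID and Taxonomy, and the object names inputs_ok and my_seq are made up for illustration. The call to as.seqData uses only the arguments listed under Usage and runs only if pmartRseq is installed.

```r
# Toy inputs; every name and value here is hypothetical.
e_data <- data.frame(
  OTU = c("OTU1", "OTU2", "OTU3"),   # feature identifier column
  S1  = c(10, 0, 5),                 # counts for sample S1
  S2  = c(3, 7, 2),                  # counts for sample S2
  stringsAsFactors = FALSE
)
f_data <- data.frame(
  SampleID  = c("S1", "S2"),         # must match the sample columns of e_data
  Treatment = c("control", "treated"),
  stringsAsFactors = FALSE
)
e_meta <- data.frame(
  OTU      = c("OTU1", "OTU2", "OTU3"),  # same column name as in e_data
  Taxonomy = c("Bacteria;Firmicutes",
               "Bacteria;Proteobacteria",
               "Bacteria;Firmicutes"),
  stringsAsFactors = FALSE
)

edata_cname <- "OTU"
fdata_cname <- "SampleID"

# Consistency checks derived from the argument descriptions above.
sample_cols <- setdiff(names(e_data), edata_cname)
inputs_ok <- c(
  id_col_in_edata = edata_cname %in% names(e_data),
  samples_match   = setequal(sample_cols, f_data[[fdata_cname]]),
  fdata_rows      = nrow(f_data) == length(sample_cols),
  id_col_in_emeta = edata_cname %in% names(e_meta),
  emeta_ids_match = setequal(e_meta[[edata_cname]], e_data[[edata_cname]])
)
stopifnot(all(inputs_ok))

# Convert only when the package is available.
if (requireNamespace("pmartRseq", quietly = TRUE)) {
  my_seq <- pmartRseq::as.seqData(
    e_data = e_data, f_data = f_data, e_meta = e_meta,
    edata_cname = edata_cname, fdata_cname = fdata_cname,
    data_type = "rRNA", taxa_cname = "Taxonomy"
  )
}
```

Running the checks first tends to surface mismatched sample or feature identifiers before the conversion is attempted.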

Details

Objects of class 'seqData' contain some attributes that are referenced by downstream functions. These attributes can be changed from their default values by manual specification. The attributes and their default values are as follows:

data_scale Scale of the data provided in e_data. Acceptable values are 'log2', 'log10', 'log', 'count', and 'abundance', which indicate that the data are log base 2 transformed, log base 10 transformed, natural log transformed, raw counts, and raw abundances, respectively. Default value is 'count'.
data_norm A logical argument, specifying whether the data has been normalized or not. Default value is 'FALSE'.
norm_method NULL if data_norm is FALSE. If data_norm is TRUE, character string defining which normalization method was used. Default value is 'NULL'.
location_param NULL if there are no location parameters from normalization, otherwise a vector detailing the normalization location parameters for each sample.
scale_param NULL if there are no scale parameters from normalization, otherwise a vector detailing the normalization scale parameters for each sample.
seq_type Character string describing the type of sequencer (e.g. 'HiSeq'). Default value is 'NULL'.
db Character string describing which database was used to process the data (e.g. "TIGR"). Default value is 'NULL'.
db_version Character string describing which version of the database was used. Default value is 'NULL'. If db is NULL, then db_version will default to a NULL value.

Computed values included in the data_info attribute are as follows:

num_edata The number of unique edata_cname entries.
num_na The number of NA observations in the dataset.
frac_na The proportion of e_data values that are NA.
num_zero The number of observations that equal 0 in the dataset.
frac_zero The proportion of e_data values that are 0.
num_taxa The number of unique taxa_cname entries.
num_ec The number of unique ec_cname entries.
num_gene The number of unique gene_cname entries.
num_samps The number of samples that make up the columns of e_data.
meta_info A logical argument, specifying whether e_meta is provided.
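
Example (a rough sketch, not the package's implementation): several of the computed values above can be reproduced in base R from a toy e_data. The data, the column name OTU, and the object name data_info_sketch are hypothetical. The sketch assumes that frac_na and frac_zero are taken over all sample cells of e_data, including NA cells in the denominator; the package may define them differently.

```r
# Toy expression data with one NA and one zero; values are made up.
e_data <- data.frame(
  OTU = c("OTU1", "OTU2", "OTU3"),
  S1  = c(10, 0, 5),
  S2  = c(3, 7, NA),
  stringsAsFactors = FALSE
)
edata_cname <- "OTU"

# Keep only the sample columns as a numeric matrix.
values <- as.matrix(e_data[, setdiff(names(e_data), edata_cname), drop = FALSE])

# Assumption: fractions use the total number of cells as the denominator.
data_info_sketch <- list(
  num_edata = length(unique(e_data[[edata_cname]])),
  num_na    = sum(is.na(values)),
  frac_na   = mean(is.na(values)),
  num_zero  = sum(values == 0, na.rm = TRUE),
  frac_zero = sum(values == 0, na.rm = TRUE) / length(values),
  num_samps = ncol(values)
)
str(data_info_sketch)
```

On a real 'seqData' object, the stored values would be read from the data_info attribute rather than recomputed.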

Author(s)

Allison Thompson, Lisa Bramer


pmartR/pmartRseq documentation built on May 25, 2019, 9:20 a.m.