utilsCM: Utils to preprocess community matrix

Description

Utilities to preprocess a community matrix, such as removing OTUs with different filters and aggregating the matrix in different ways.

Usage

rmMinAbundance(community.matrix, minAbund = 2, MARGIN = 1, verbose = TRUE)

transposeDF(community.matrix, to.numeric = TRUE)

preprocessCM(cm, rm.samples = c(), min.abund = 5, mean.abund.thr = 0.025)

preprocessEnv(env, rm.samples = c(), log.var = c(), sel.env.var = c(),
  log.base = 2)

spilt.df(community.matrix, spilt.to = 2, MARGIN = 1, verbose = TRUE)

mostAbundantRows(community.matrix, most.abund = 150, row.decreasing = TRUE,
  col.decreasing = TRUE)

sumColumns(community.matrix, sep = "-", nth = 1)

Arguments

community.matrix

Community matrix (OTU table), where rows are OTUs or individual species and columns are sites or samples. See ComMA.

minAbund

The minimum abundance threshold: rows/columns whose sum of abundance is below it are removed. For example, if minAbund=2, all singletons (which appear in only one sample) are removed. If minAbund=1, all empty rows/columns are removed. Defaults to 2 (singletons).

MARGIN

1 indicates rows, 2 indicates columns. Defaults to 1.

verbose

Whether to report more details. Defaults to TRUE.

cm

A community matrix that is not transposed; columns are samples.

rm.samples

The samples to remove, given as a vector. An element can be a keyword shared by several sample names. The vector is converted to a single string separated by '|' so that multiple samples can be matched. Defaults to an empty vector, which removes nothing.

min.abund, mean.abund.thr

Exclude any samples with excessively low abundance. Defaults are min.abund=5 and mean.abund.thr=0.025. The final threshold is the larger of the two values, max(min.abund, mean(colSums(cm))*mean.abund.thr).

env

The environmental metadata, where rows are samples and columns are environmental variables.

log.var, log.base

The vector of selected environmental variables to log-transform. They are the same as, or a subset of, the indices or names used in sel.env.var. Soil chemistry variables normally need a log transform. Use plotCorrelations to visualize variables and determine whether a log transform should be applied. Defaults to no log transform. log.base is the base of the logarithm and defaults to 2.

sel.env.var

The vector of selected environmental variables to output, which can be colnames(env) or their indices. Defaults to an empty vector, which selects all variables.

spilt.to

The number of sub data frames to split into. It must be >= 2. Defaults to 2.

most.abund

The number of most abundant OTUs to keep. Defaults to 150.

row.decreasing, col.decreasing

Should the sort order (of colnames or colSums) be decreasing? Refer to order. If NULL, no sorting is done. Defaults to TRUE.

sep

The separator used to get the nth substring from column names. Defaults to a dash '-'.

nth

The nth substring. Defaults to 1 (first).

Details

rmMinAbundance returns a subset of the given community matrix, removing the rows or columns whose sum of abundance is less than the minimum abundance threshold.
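The filtering rule can be sketched in base R as follows. This is only an illustration, assuming the filter keeps rows (or columns) whose sum is at least minAbund; the helper name rm_min_abundance_sketch and the toy matrix are hypothetical and are not part of ComMA.

```r
# Toy community matrix: rows are OTUs, columns are samples (made-up data).
cm.toy <- matrix(c(0, 0, 0,
                   1, 0, 0,
                   2, 3, 0,
                   0, 1, 1),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("OTU", 1:4), c("S1", "S2", "S3")))

# Hypothetical helper, not the ComMA implementation.
rm_min_abundance_sketch <- function(m, minAbund = 2, MARGIN = 1) {
  sums <- apply(m, MARGIN, sum)   # row sums if MARGIN = 1, column sums if 2
  keep <- sums >= minAbund        # assumption: keep sums at or above the threshold
  if (MARGIN == 1) m[keep, , drop = FALSE] else m[, keep, drop = FALSE]
}

# minAbund = 2 drops the empty OTU1 and the singleton OTU2.
filtered <- rm_min_abundance_sketch(cm.toy, minAbund = 2)
rownames(filtered)
```

With minAbund = 1 the same helper would drop only the empty row, which matches the behaviour described for that setting.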

transposeDF returns a transposed data frame, such as a transposed community matrix for the vegan package.
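A transpose of this kind might look like the base R sketch below. It does not reproduce transposeDF itself; the toy data frame otu.df is made up, and the conversion through as.matrix is an assumption about how one could keep the values numeric.

```r
# Toy OTU table as a data frame: rows are OTUs, columns are samples (made-up data).
otu.df <- data.frame(S1 = c(5, 0, 2), S2 = c(1, 3, 0),
                     row.names = c("OTU1", "OTU2", "OTU3"))

# Transpose so that rows are samples and columns are OTUs.
# t() returns a matrix, so convert the result back to a data frame.
t.otu.df <- as.data.frame(t(as.matrix(otu.df)))

dim(t.otu.df)
rownames(t.otu.df)
```

The result has samples as rows and OTUs as columns, which is the orientation vegan functions generally expect.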

preprocessCM excludes any samples with excessively low abundance.
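The two steps described for preprocessCM (removing named samples, then applying the abundance threshold) can be worked through on a toy matrix. This is a sketch built from the argument descriptions only: the sample names and counts are made up, and whether a sample exactly at the threshold is kept is an assumption.

```r
# Toy matrix: rows are OTUs, columns are samples (names and counts are made up).
cm.toy <- matrix(c(100, 200, 1, 50,
                   300, 100, 2, 50),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("OTU1", "OTU2"),
                                 c("A1-x", "A2-x", "B1-x", "ctrl-1")))

# 1. Remove samples whose names match any keyword in rm.samples.
rm.samples <- c("ctrl")
if (length(rm.samples) > 0) {
  pattern <- paste(rm.samples, collapse = "|")   # e.g. "ctrl" or "a|b"
  cm.toy <- cm.toy[, !grepl(pattern, colnames(cm.toy)), drop = FALSE]
}

# 2. Compute the final threshold from the remaining samples.
min.abund <- 5
mean.abund.thr <- 0.025
threshold <- max(min.abund, mean(colSums(cm.toy)) * mean.abund.thr)

# Assumption: samples whose total abundance is below the threshold are excluded.
cm.kept <- cm.toy[, colSums(cm.toy) >= threshold, drop = FALSE]

threshold
colnames(cm.kept)
```

Here the mean-based term (about 5.86) exceeds min.abund, so it sets the threshold and the sample with a total of 3 is dropped.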

preprocessEnv subsets the environmental variables and applies a log transform to the selected variables, typically soil chemistry variables.
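The subsetting and log transform can be illustrated in base R. The sketch below is not the preprocessEnv implementation: the data frame env.toy and its values are made up, variables are selected by name rather than by index, and the order of the two steps is an assumption.

```r
# Toy environmental metadata: rows are samples, columns are variables (made-up values).
env.toy <- data.frame(pH = c(5.5, 6.0, 6.5),
                      N = c(4, 8, 16),
                      P = c(2, 32, 64),
                      elevation = c(100, 200, 300),
                      row.names = c("S1", "S2", "S3"))

sel.env.var <- c("pH", "N", "P")  # variables to output
log.var <- c("N", "P")            # variables to log-transform (subset of sel.env.var)
log.base <- 2

# Subset first, then log-transform the chosen columns in place.
env.sel <- env.toy[, sel.env.var, drop = FALSE]
env.sel[log.var] <- lapply(env.sel[log.var], log, base = log.base)

env.sel
```

Only N and P are transformed; pH is passed through unchanged and elevation is dropped because it is not selected.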

spilt.df splits a data frame into chunks of data frames with an equal number of rows/columns.
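One way to split a data frame into equal row chunks in base R is shown below. It is a sketch of the idea only, not the ComMA code; the variable n.chunks stands in for the number of chunks and the toy data frame is made up.

```r
# Toy data frame with 6 rows (made-up data).
df.toy <- data.frame(x = 1:6, y = letters[1:6])

n.chunks <- 2  # hypothetical local variable: number of chunks to produce
n <- nrow(df.toy)

# Label consecutive rows with a chunk id: 1, 1, 1, 2, 2, 2.
chunk.id <- rep(seq_len(n.chunks), each = ceiling(n / n.chunks), length.out = n)

# split() on a data frame splits its rows by the grouping vector.
chunks <- split(df.toy, chunk.id)

length(chunks)
sapply(chunks, nrow)
```

When the number of rows is not divisible by the number of chunks, this labelling leaves the last chunk shorter; how the package handles that case is not stated here.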

mostAbundantRows takes the given number of most abundant rows (OTUs) from the original community matrix to form a new matrix. By default, the new matrix is sorted by both rowSums and colSums in decreasing order.
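The selection and sorting can be sketched in base R as follows. The toy matrix is made up, and computing the column sums on the selected rows (rather than on the original matrix) is an assumption.

```r
# Toy community matrix: rows are OTUs, columns are samples (made-up data).
cm.toy <- matrix(c(1, 0, 2,
                   10, 5, 20,
                   0, 3, 1,
                   7, 8, 9),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("OTU", 1:4), c("S1", "S2", "S3")))

most.abund <- 2

# Rank rows by total abundance and keep the top most.abund rows.
row.order <- order(rowSums(cm.toy), decreasing = TRUE)
top <- cm.toy[head(row.order, most.abund), , drop = FALSE]

# Then order the columns of the subset by their totals, also decreasing.
top <- top[, order(colSums(top), decreasing = TRUE), drop = FALSE]

top
```

OTU2 and OTU4 have the largest row sums (35 and 24), so they are kept, and the columns are reordered to S3, S1, S2 by their totals within those two rows.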

sumColumns sums the columns by the nth substring defined in column names.
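Grouping columns by a substring of their names can be sketched with strsplit and rowsum from base R. This is an illustration under assumed column names of the form "plot-subplot"; it is not the sumColumns implementation.

```r
# Toy community matrix with column names of the form "plot-subplot" (made-up data).
cm.toy <- matrix(1:8, nrow = 2,
                 dimnames = list(c("OTU1", "OTU2"),
                                 c("P1-A", "P1-B", "P2-A", "P2-B")))

sep <- "-"
nth <- 1

# Take the nth substring of each column name as the group label.
groups <- sapply(strsplit(colnames(cm.toy), sep, fixed = TRUE), `[`, nth)

# rowsum() sums rows by group, so transpose, sum, and transpose back.
summed <- t(rowsum(t(cm.toy), group = groups))

summed
```

The grand total is unchanged by the aggregation, which is a quick sanity check (comparing sum(cm.toy) with sum(summed)).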

Examples

# remove singletons 
ComMA::rmMinAbundance(community.matrix, minAbund=2)

t.community.matrix <- transposeDF(community.matrix)

cm <- preprocessCM(cm, rm.samples=c("CM30b51","CM30b58"))

env <- preprocessEnv(env, log.var=c(5:8,9:11), sel.env.var=c(4,5,8,9,14:22))

cm.list <- spilt.df(community.matrix)

community.matrix <- getCommunityMatrix("16S", isPlot=TRUE, minAbund=1)
OTU100 <- mostAbundantRows(community.matrix, most.abund=100)

# by subpl
community.matrix <- getCommunityMatrix("16S", isPlot=FALSE, minAbund=1)
colSums(community.matrix)
communityMatrix1 <- sumColumns(community.matrix)
colSums(communityMatrix1)

walterxie/ComMA documentation built on May 3, 2019, 11:51 p.m.