utilsCM: Utils to preprocess community matrix
In walterxie/ComMA: Community Matrix Analysis

Utilities to preprocess a community matrix, such as removing OTUs with different filters and aggregating the matrix in different ways.

rmMinAbundance(community.matrix, minAbund = 2, MARGIN = 1, verbose = TRUE)

transposeDF(community.matrix, to.numeric = TRUE)

preprocessCM(cm, rm.samples = c(), min.abund = 5, mean.abund.thr = 0.025)

preprocessEnv(env, rm.samples = c(), log.var = c(), sel.env.var = c(),
  log.base = 2)

spilt.df(community.matrix, spilt.to = 2, MARGIN = 1, verbose = TRUE)

mostAbundantRows(community.matrix, most.abund = 150, row.decreasing = TRUE,
  col.decreasing = TRUE)

sumColumns(community.matrix, sep = "-", nth = 1)

`community.matrix`	Community matrix (OTU table), where rows are OTUs or individual species and columns are sites or samples. See `ComMA`.
`minAbund`	The minimum abundance threshold used to remove rows/columns by their row/column sum of abundance. For example, if minAbund=2, all singletons (OTUs whose total abundance is 1) are removed. If minAbund=1, all empty rows/columns are removed. Default to 2 (singletons).
`MARGIN`	1 indicates rows, 2 indicates columns. Default to 1.
`verbose`	Whether to print more details. Default to TRUE.
`cm`	A community matrix that is not transposed, so columns are samples.
`rm.samples`	A vector of samples to remove. Each element can be a keyword shared by several sample names. The vector is collapsed into a single string separated by '|' so that multiple samples can be matched. Default to an empty vector, which removes nothing.
`min.abund, mean.abund.thr`	Exclude any samples with excessively low abundance. Default `min.abund=5, mean.abund.thr=0.025`. The final threshold is `max(min.abund, mean(colSums(cm))*mean.abund.thr)`.
`env`	The environmental meta-data, where rows are samples and columns are environmental variables.
`log.var, log.base`	The vector of selected environmental variables to log transform, and the base of the logarithm. The variables are the same as, or a subset of, the indices or names used in `sel.env.var`. Soil chemistry variables normally need a log transform. Use `plotCorrelations` to visualize variables and determine whether a log transform should be applied. Default to no log transform.
`sel.env.var`	The vector of selected environmental variables to output, which can be colnames(env) or their indices. Default to an empty vector, which selects all variables.
`spilt.to`	The number of sub-data-frames to split into. It must be >= 2. Default to 2.
`most.abund`	The number of most abundant OTUs to keep. Default to 150.
`row.decreasing, col.decreasing`	Should the sort order (by `colnames` or `colSums`) be decreasing? Refer to `order`. If NULL, no sorting is done. Default to TRUE.
`sep`	The separator used to get the nth substring from column names. Default to dash '-'.
`nth`	The nth substring. Default to 1 (first).

rmMinAbundance returns a subset of the given community matrix, removing rows or columns whose sum of abundance is less than the minimum abundance threshold.
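
The filter described above can be sketched in base R. This is only an illustration on made-up data and not the package's implementation; `cm.toy` and `min.abund` are hypothetical names.

```r
# Toy community matrix: rows are OTUs, columns are samples (made-up data).
cm.toy <- matrix(c(1, 0, 0,
                   3, 2, 0,
                   0, 0, 0,
                   0, 5, 4),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("OTU", 1:4), c("S1", "S2", "S3")))

min.abund <- 2  # hypothetical name mirroring the minAbund argument

# MARGIN = 1 case: keep rows whose total abundance reaches the threshold,
# i.e. drop rows whose row sum is less than min.abund.
kept.rows <- cm.toy[rowSums(cm.toy) >= min.abund, , drop = FALSE]
rownames(kept.rows)
```

Here the singleton OTU1 and the empty OTU3 are dropped. For the column case, the same idea would use `colSums` and index the second dimension.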

transposeDF returns a transposed data frame, such as the transposed community matrix used by the vegan package.

preprocessCM excludes any samples with excessively low abundance.
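
As a rough illustration of how the sample threshold `max(min.abund, mean(colSums(cm))*mean.abund.thr)` behaves, the following base R sketch applies it to made-up data. It assumes samples whose column sum falls below the threshold are the ones excluded; the exact comparison used by the package is not stated here, and `cm.toy` is a hypothetical name.

```r
# Toy community matrix: rows are OTUs, columns are samples (made-up data).
cm.toy <- matrix(c(100, 2, 200,
                   250, 4, 194,
                    50, 0, 100),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(paste0("OTU", 1:3), c("S1", "S2", "S3")))

min.abund <- 5
mean.abund.thr <- 0.025

# Column sums are 400, 6 and 494, so their mean is 300.
threshold <- max(min.abund, mean(colSums(cm.toy)) * mean.abund.thr)

# Assumption: a sample is excluded when its total abundance is below the threshold.
cm.kept <- cm.toy[, colSums(cm.toy) >= threshold, drop = FALSE]
colnames(cm.kept)
```

In this toy case sample S2 (total abundance 6) would pass `min.abund=5` alone, but is excluded because the mean-based term raises the threshold above 6.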

preprocessEnv subsets the environmental variables and applies a log transform to soil chemistry variables.
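
A minimal base R sketch of subsetting and log transforming environmental variables might look like the following. The data frame `env.toy` and its column names are made up, and the order of operations (select first, then log) is an assumption rather than the package's documented behaviour.

```r
# Toy environmental meta-data: rows are samples, columns are variables (made-up data).
env.toy <- data.frame(pH = c(5.5, 6.1),
                      N  = c(8, 32),
                      P  = c(4, 16),
                      row.names = c("S1", "S2"))

sel.env.var <- c("pH", "N")  # variables to output
log.var     <- c("N")        # subset of sel.env.var to log transform
log.base    <- 2

# Select the requested variables, keeping a data frame even for one column.
env.sel <- env.toy[, sel.env.var, drop = FALSE]

# Log transform only the chosen variables with the requested base.
env.sel[log.var] <- lapply(env.sel[log.var], log, base = log.base)
env.sel
```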

spilt.df splits a data frame into chunks of data frames having an equal number of rows/columns.
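
The idea of splitting by rows into equal chunks can be sketched with base R `split`. This is an illustration only; `df.toy` and `n.chunks` are hypothetical names, and the toy row count is chosen to divide evenly because the handling of uneven sizes is not described here.

```r
# Toy data frame with 6 rows (made-up data).
df.toy <- data.frame(a = 1:6, b = letters[1:6])

n.chunks <- 2  # hypothetical name mirroring the spilt.to argument

# Assign consecutive rows to chunk ids 1..n.chunks, 3 rows each here.
chunk.id <- rep(seq_len(n.chunks), each = nrow(df.toy) / n.chunks)

# split() on a data frame divides its rows by the grouping vector.
chunks <- split(df.toy, chunk.id)
sapply(chunks, nrow)
```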

mostAbundantRows takes the given number of most abundant rows (OTUs) from the original community matrix to form a new matrix. By default, the new matrix is sorted by both rowSums and colSums in decreasing order.
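
The selection and sorting described above can be approximated in base R as follows. This is a sketch on made-up data under the assumption that rows are ranked by `rowSums` and columns by `colSums` of the selected rows; it is not taken from the package source.

```r
# Toy community matrix: rows are OTUs, columns are samples (made-up data).
cm.toy <- matrix(c( 1, 0,  2,
                   10, 5, 20,
                    0, 1,  0,
                    4, 8,  6),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("OTU", 1:4), c("S1", "S2", "S3")))

most.abund <- 2

# Rank rows by total abundance, most abundant first, and keep the top ones.
row.order <- order(rowSums(cm.toy), decreasing = TRUE)
top <- cm.toy[head(row.order, most.abund), , drop = FALSE]

# Sort the columns of the selected rows by their totals, largest first.
top <- top[, order(colSums(top), decreasing = TRUE), drop = FALSE]
top
```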

sumColumns sums the columns by the nth substring defined in column names.
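
One way to picture this aggregation is the base R sketch below, which groups columns by the nth substring of their names and sums within each group. The column names and data are made up, and the use of `rowsum` is an illustrative choice, not necessarily how sumColumns is implemented.

```r
# Toy community matrix whose column names encode plot and subplot (made-up data).
cm.toy <- matrix(c(1, 2, 3,
                   0, 4, 5),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("OTU1", "OTU2"),
                                 c("Plot1-A", "Plot1-B", "Plot2-A")))

sep <- "-"
nth <- 1

# Take the nth substring of each column name as the group label.
groups <- sapply(strsplit(colnames(cm.toy), sep, fixed = TRUE), `[`, nth)

# rowsum() sums rows by group, so transpose, aggregate, and transpose back.
cm.sum <- t(rowsum(t(cm.toy), group = groups))
cm.sum
```

The two "Plot1" subplot columns are added together, while the total abundance of the matrix is unchanged.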

# remove singletons 
ComMA::rmMinAbundance(community.matrix, minAbund=2)

t.community.matrix <- transposeDF(community.matrix)

cm <- preprocessCM(cm, rm.samples=c("CM30b51","CM30b58"))

env <- preprocessEnv(env, log.var=c(5:8,9:11), sel.env.var=c(4,5,8,9,14:22))

cm.list <- spilt.df(community.matrix)

community.matrix <- getCommunityMatrix("16S", isPlot=TRUE, minAbund=1)
OTU100 <- mostAbundantRows(community.matrix, most.abund=100)

# by subpl
community.matrix <- getCommunityMatrix("16S", isPlot=FALSE, minAbund=1)
colSums(community.matrix)
communityMatrix1 <- sumColumns(community.matrix)
colSums(communityMatrix1)