centerGeneData_v1: Center gene data

centerGeneData_v1    R Documentation

Center gene data

Description

Performs row centering on a matrix of data in log space

Usage

centerGeneData_v1(
  x,
  floor = NA,
  controlSamples = NA,
  centerGroups = NULL,
  needsLog = NULL,
  mean = FALSE,
  returnGroupedValues = FALSE,
  returnValues = TRUE,
  groupPrefix = "group",
  showGroups = FALSE,
  scale = c("none", "row"),
  verbose = FALSE,
  ...
)

Arguments

x

numeric matrix typically containing measurements (genes) as rows, and samples as columns.

floor

optional numeric floor; values below it are set to the floor value. This is useful to avoid centering against values below a noise threshold, which might otherwise produce artificially inflated log fold changes.

controlSamples

optional character vector of colnames(x) to be used as controls when centering data. When centerGroups is also defined, the controlSamples are only used within each group of colnames. When this value is NULL or NA, or when any group of colnames defined by centerGroups contains no controlSamples, all samples in that group are used for centering. This relationship is recorded in the attribute named "centerGroups".

centerGroups

optional character vector named by colnames, whose values are group names. Alternatively, a list of vectors of colnames, where each list element contains colnames in explicit groups.

mean

logical indicating whether to use row means, or row medians. If the matrixStats package is available, it uses matrixStats::rowMedians() for calculations, otherwise falling back to apply(x, 1, median) which is notably slower for large data matrices.

returnGroupedValues

logical indicating whether to append columns which contain the control mean or median values used during centering.

showGroups

logical indicating whether to print the sample centering relationship to screen during processing. Note this information is also contained in attribute "centerGroups".

scale

character value indicating whether to scale data by row ("row"), or perform no row scaling ("none"). Scaling depends on whether median or mean values are used in centering. If mean values are used, scaling divides row values by the standard deviation. If median values are used, scaling divides row values by the MAD (median absolute deviation), which is derived using the median instead of the mean.

needsLog

logical, indicating whether to perform log2 transformation of data prior to centering. If NULL, then if any value is above 40, it sets needsLog=TRUE and uses log2(x) for centering.
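
The needsLog and scale behavior described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: maybe_log2_sketch() and scale_rows_sketch() are hypothetical helper names, and the use of stats::mad() with its default consistency constant is an assumption.

```r
# Hypothetical helpers for illustration only; they are not part of jamma.

# Sketch of the needsLog rule: when needsLog is NULL, apply log2()
# only if any value is above 40.
maybe_log2_sketch <- function(x, needsLog=NULL) {
   if (is.null(needsLog)) {
      needsLog <- any(x > 40, na.rm=TRUE)
   }
   if (needsLog) {
      x <- log2(x)
   }
   x
}

# Sketch of row scaling on already-centered data: divide each row by its
# standard deviation (mean=TRUE) or by its MAD (mean=FALSE).
scale_rows_sketch <- function(x_ctr, mean=FALSE) {
   if (mean) {
      row_spread <- apply(x_ctr, 1, sd)
   } else {
      # Assumption: stats::mad() multiplies by 1.4826 by default;
      # the package may or may not use the same constant.
      row_spread <- apply(x_ctr, 1, mad)
   }
   # A vector of length nrow(x_ctr) is recycled down columns,
   # so row i is divided by row_spread[i].
   x_ctr / row_spread
}
```

Note that a matrix such as matrix(1:100, ncol=10) contains values above 40, so under this rule it would be treated as needing log2 transformation unless needsLog=FALSE is given.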

Details

This function is deprecated in favor of centerGeneData() which includes more flexibility in data centering. It is maintained here for backward compatibility.

This function is a relatively simple wrapper function which subtracts the row median (or row mean when mean=TRUE) from each row. The function allows defining a subset of columns to be used in determining the row control value via controlSamples. Similarly, columns can be grouped via centerGroups, where columns are centered versus their relevant control samples within each group of columns.
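
The grouped centering logic described above can be sketched in base R as follows. This is a simplified illustration, assuming centerGroups is supplied as a vector in the same order as colnames(x); center_rows_sketch() is a hypothetical name, and the sketch omits log transformation, flooring, scaling, and the attributes returned by the real function.

```r
# Hypothetical helper for illustration only; it is not part of jamma.
center_rows_sketch <- function(x, centerGroups=NULL, controlSamples=NULL,
   mean=FALSE) {
   # With no grouping, treat all columns as a single group
   if (is.null(centerGroups)) {
      centerGroups <- rep("group1", ncol(x))
   }
   row_stat <- if (mean) base::mean else stats::median
   x_ctr <- x
   for (cols in split(colnames(x), centerGroups)) {
      ctrl <- intersect(cols, controlSamples)
      # Fall back to all samples in the group when it contains no controls
      if (length(ctrl) == 0) {
         ctrl <- cols
      }
      # One control value per row, calculated from the control columns only
      ctrl_value <- apply(x[, ctrl, drop=FALSE], 1, row_stat)
      # Subtract the row control value from every column in this group
      x_ctr[, cols] <- x[, cols, drop=FALSE] - ctrl_value
   }
   x_ctr
}

x_demo <- matrix(as.numeric(1:20), ncol=4)
colnames(x_demo) <- letters[1:4]
center_rows_sketch(x_demo,
   centerGroups=c("A", "A", "B", "B"),
   controlSamples=c("a", "c"))
```

In this usage, columns a and b are centered versus control column a, and columns c and d are centered versus control column c, so each control column becomes zero after centering.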

Value

numeric matrix with the same row and column dimensions as input data. If returnGroupedValues=TRUE, the additional columns contain the row median or mean values, dependent upon mean=FALSE or mean=TRUE, respectively. An attribute centerGroups is included, which describes the specific relationship between each colname, its associated control sample colnames, and the optional centerGroups grouping of colnames. When columns are grouped and centered to specific control samples, it is important to keep this information during downstream scrutiny of results.

Examples

x <- matrix(1:100, ncol=10);
colnames(x) <- letters[1:10];
# basic centering
centerGeneData_v1(x);

# grouped centering
centerGeneData_v1(x,
   centerGroups=rep(c("A","B"), c(5,5)));

# centering versus specific control columns
centerGeneData_v1(x,
   controlSamples=letters[c(1:3)]);

# grouped centering versus specific control columns
centerGeneData_v1(x,
   centerGroups=rep(c("A","B"), c(5,5)),
   controlSamples=letters[c(1:3, 6:8)]);

# confirm the centerGroups and controlSamples
x_ctr <- centerGeneData_v1(x,
   centerGroups=rep(c("A","B"), c(5,5)),
   controlSamples=letters[c(1:3, 6:8)],
   showGroups=TRUE);

attr(x_ctr, "centerDF");


jmw86069/jamma documentation built on July 6, 2023, 1:09 p.m.