normalizeThis: Normalize data in various modes

View source: R/normalizeThis.R


Normalize data in various modes

Description

Generic normalization of 'dat' (by columns); multiple methods may be applied. The choice of normalization procedure must be made with care; plotting the data before and after normalization may be critical to understanding the initial data structure and the effect of the procedure applied. An inappropriate method may render the interpretation of (further) results incorrect.
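
For instance, a quick boxplot of the columns before and after normalization already reveals global shifts. A minimal sketch with a toy matrix (the object 'dat' and the shift introduced in column 4 are purely for illustration, not taken from the package examples):

set.seed(1)
dat <- matrix(round(rnorm(600, 12, 1.5), 2), ncol=6)     # toy data, columns = samples
dat[,4] <- dat[,4] + 1.2                                  # one sample shifted on purpose
datNorm <- normalizeThis(dat, method="median", mode="additive")
graphics::boxplot(cbind(dat, datNorm), las=2,
  main="columns 1-6 : before,  columns 7-12 : after normalization")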

Usage

normalizeThis(
  dat,
  method = "mean",
  refLines = NULL,
  refGrp = NULL,
  mode = "proportional",
  trimFa = NULL,
  minQuant = NULL,
  sparseLim = 0.4,
  nCombin = 3,
  omitNonAlignable = FALSE,
  maxFact = 10,
  quantFa = NULL,
  expFa = NULL,
  silent = FALSE,
  debug = FALSE,
  callFrom = NULL
)

Arguments

dat

matrix or data.frame of data to get normalized

method

(character) may be "mean", "median", "NULL", "none", "trimMean", "rowNormalize", "slope", "exponent", "slope2Sections" or "vsn"; when NULL or 'none' is chosen the input will be returned

refLines

(NULL or numeric) allows considering only specific lines of 'dat' when determining the normalization factors (all data will nevertheless be normalized)

refGrp

(integer or colnames) only the columns indicated will be used as reference; by default all columns are used

mode

(character) may be "proportional" or "additive"; decides whether the normalization factors will be applied in a multiplicative (proportional) or additive way; for log2-omics data mode="additive" is suggested

trimFa

(numeric, length=1) additional parameters for trimmed mean

minQuant

(numeric) only used with method='rowNormalize': optional filter to set all values below given value as NA; see also rowNormalize

sparseLim

(numeric) only used with method='rowNormalize': decides at which minimum content of NA values the function should switch to sparse-mode; see also rowNormalize

nCombin

(NULL or integer) only used with method='rowNormalize' and only in sparse-mode (ie if the content of NAs is higher than sparseLim): number of columns of the smaller sub-matrices to be inspected initially; low values (ie small groups) have higher chances of containing common elements; see also rowNormalize

omitNonAlignable

(logical) only used with method='rowNormalize': allows omitting all columns which cannot be aligned due to sparseness; see also rowNormalize

maxFact

(numeric, length=2) only used with method='rowNormalize': max normalization factor; see also rowNormalize

quantFa

(numeric, length=2) additional parameters for quantiles to use with method='slope'

expFa

(numeric, length=1) additional parameters for method='exponent'

silent

(logical) suppress messages

debug

(logical) additional messages for debugging

callFrom

(character) allows easier tracking of messages produced

Details

In most cases when treating 'Omics'-data one works with the hypothesis that there are no global changes in the structure of all data/columns. Under this hypothesis it is very common to assume that the median (via the argument method) of all samples (ie columns) should remain constant. For example, samples/columns with less signal will be considered as having 'accidentally' received less material (eg due to the imprecision when transferring very small amounts of liquid samples). In consequence, for a sample considered as having received only 95% of the material, all measures will be multiplied by 1/0.95 (approx 1.053) to compensate for the supposed lack of starting material.
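
This multiplicative correction can be illustrated by hand. A minimal sketch with simulated data; the objects x, colFact and xCorr serve only this illustration and do not reflect the internal code of normalizeThis:

set.seed(7)
x <- matrix(rexp(400, rate=0.1), ncol=4)                       # toy intensity-like data
x[,2] <- x[,2] * 0.95                                          # column 2 'received' only 95% material
colFact <- median(apply(x, 2, median)) / apply(x, 2, median)   # one factor per column
xCorr <- sweep(x, 2, colFact, "*")                             # apply multiplicative factors
round(apply(xCorr, 2, median), 2)                              # column medians are now aligned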

With the analysis of 'Omics'-data it is very common to work with data on log-scale. In this case the argument mode should be set to "additive", since adding a constant factor to log-data corresponds to a multiplicative factor on the regular scale. Please note that (at this point) the methods 'slope', 'exponent', 'slope2Sections' and 'vsn' don't distinguish between additive and proportional modes, but take the data 'as is' (you may look at the original documentation for more details, see exponNormalize, adjBy2ptReg, justvsn).
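
As a small illustration of why the additive mode matches log-scale data (the objects raw, fact and rawDat are hypothetical and serve only this sketch):

raw <- c(100, 200, 400)
fact <- 1/0.95                       # multiplicative correction on the regular scale
log2(raw * fact)                     # multiply first, then take log2 ...
log2(raw) + log2(fact)               # ... gives the same as adding log2(fact) on log2 scale
## thus, for log2-transformed data one would typically call something like :
## normalizeThis(log2(rawDat), method="median", mode="additive")     # 'rawDat' : your raw-scale matrix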

Normalization using method="rowNormalize" runs rowNormalize from this package. In this case, the working hypothesis is that all values in each row are expected to be the same. This method may be applied when all series of values (ie columns) are replicate measurements of the same sample. There is also an option for treating sparse data (see argument sparseLim), which may, however, consume much more computational resources, in particular when the value of nCombin is low (compared to the number of samples/columns).
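
A hedged sketch of such a call on simulated replicate-type data (the object rep1 exists only for this illustration; whether the sparse mode actually gets triggered depends on the NA content relative to sparseLim):

set.seed(12)
rep1 <- matrix(rep(round(rnorm(80, 10, 2), 2), 4), ncol=4) + round(rnorm(320, 0, 0.1), 2)
rep1 <- sweep(rep1, 2, c(1, 1.1, 0.95, 1.2), "*")      # column-wise bias to be removed
rep1[sample(length(rep1), 60)] <- NA                   # introduce some missing values
noR <- normalizeThis(rep1, method="rowNormalize", sparseLim=0.4, nCombin=3)
round(apply(noR, 2, median, na.rm=TRUE), 2)            # column medians after normalization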

Normalization using method="vsn" runs justvsn from the Bioconductor package vsn (this requires a minimum of 42 rows of input-data and the package vsn being installed). Note: Depending on the procedure chosen, the normalized data may appear on a different scale.
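
A guarded sketch (datV is a toy matrix chosen only to satisfy the minimum of 42 rows; the call is skipped if vsn is not installed):

if(requireNamespace("vsn", quietly=TRUE)) {
  set.seed(5)
  datV <- matrix(round(runif(220) * 5e4), ncol=4)      # 55 rows (>= 42) of raw-scale values
  noV <- normalizeThis(datV, method="vsn")             # result may appear on a different scale
  summary(noV)
}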

Value

This function returns a matrix of normalized data (same dimensions as input)

See Also

rowNormalize, exponNormalize, adjBy2ptReg, justvsn

Examples

set.seed(2015); rand1 <- round(runif(300)+rnorm(300,0,2),3)
dat1 <- cbind(ser1=round(100:1+rand1[1:100]), ser2=round(1.2*(100:1+rand1[101:200])-2),
  ser3=round((100:1 +rand1[201:300])^1.2-3))
dat1 <- cbind(dat1, ser4=round(dat1[,1]^seq(2,5,length.out=100)+rand1[11:110],1))
dat1[dat1 < 1] <- NA
summary(dat1)
dat1[c(1:5,50:54,95:100),]
no1 <- normalizeThis(dat1, refGrp=1:3, method="mean")
no2 <- normalizeThis(dat1, refGrp=1:3, method="trimMean", trimFa=0.4)
no3 <- normalizeThis(dat1, refGrp=1:3, method="median")
no4 <- normalizeThis(dat1, refGrp=1:3, method="slope", quantFa=c(0.2,0.8))
dat1[c(1:10,91:100),]
cor(dat1[,3],rowMeans(dat1[,1:2],na.rm=TRUE), use="complete.obs")             # high
cor(dat1[,4],rowMeans(dat1[,1:2],na.rm=TRUE), use="complete.obs")             # bad
cor(dat1[c(1:10,91:100),4],rowMeans(dat1[c(1:10,91:100),1:2],na.rm=TRUE),use="complete.obs")
cor(dat1[,3],rowMeans(dat1[,1:2],na.rm=TRUE)^ (1/seq(2,5,length.out=100)),use="complete.obs")
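## an additional (hedged) sketch, not part of the original examples :
## for log2-scale data the additive mode is typically preferred (see Details)
no5 <- normalizeThis(log2(dat1), refGrp=1:3, method="median", mode="additive")
summary(no5)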
