normalizeThis: Normalize data in various modes

View source: R/normalizeThis.R


Normalize data in various modes

Description

Generic normalization of 'dat' (by columns); multiple methods may be applied. The choice of normalization procedure must be made with care; plotting the data before and after normalization may be critical to understanding the initial data structure and the effect of the procedure applied. An inappropriate method may render the interpretation of (further) results incorrect.
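
For instance, a quick boxplot of the columns before and after normalization already reveals global shifts. A minimal sketch with a toy matrix (the object 'dat' and the shift introduced in column 4 are purely for illustration, not taken from the package examples):

set.seed(1)
dat <- matrix(round(rnorm(600, 12, 1.5), 2), ncol=6)     # toy data, columns = samples
dat[,4] <- dat[,4] + 1.2                                  # one sample shifted on purpose
datNorm <- normalizeThis(dat, method="median", mode="additive")
graphics::boxplot(cbind(dat, datNorm), las=2,
  main="columns 1-6 : before,  columns 7-12 : after normalization")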

Usage

normalizeThis(
  dat,
  method = "mean",
  refLines = NULL,
  refGrp = NULL,
  mode = "proportional",
  trimFa = NULL,
  minQuant = NULL,
  sparseLim = 0.4,
  nCombin = 3,
  omitNonAlignable = FALSE,
  maxFact = 10,
  quantFa = NULL,
  expFa = NULL,
  silent = FALSE,
  debug = FALSE,
  callFrom = NULL
)

Arguments

dat

matrix or data.frame of data to get normalized

method

(character) may be "mean", "median", "NULL", "none", "trimMean", "rowNormalize", "slope", "exponent", "slope2Sections" or "vsn"; when NULL or 'none' is chosen the input will be returned

refLines

(NULL or numeric) allows considering only specific lines of 'dat' when determining the normalization factors (all data will nevertheless be normalized)

refGrp

(integer or colnames) only the columns indicated will be used as reference; by default all columns are used

mode

(character) may be "proportional" or "additive"; decides whether the normalization factors will be applied in a multiplicative (proportional) or additive way; for log2-omics data mode="additive" is suggested

trimFa

(numeric, length=1) additional parameters for trimmed mean

minQuant

(numeric) only used with method='rowNormalize': optional filter to set all values below given value as NA; see also rowNormalize

sparseLim

(numeric) only used with method='rowNormalize': decides at which minimum content of NA values the function should switch to sparse-mode; see also rowNormalize

nCombin

(NULL or integer) only used with method='rowNormalize' and only in sparse-mode (ie if the content of NAs is higher than sparseLim): number of columns of the smaller sub-matrices to be inspected initially; low values (ie small groups) have higher chances of containing common elements; see also rowNormalize

omitNonAlignable

(logical) only used with method='rowNormalize': allows omitting all columns which cannot be aligned due to sparseness; see also rowNormalize

maxFact

(numeric, length=2) only used with method='rowNormalize': max normalization factor; see also rowNormalize

quantFa

(numeric, length=2) additional parameters for quantiles to use with method='slope'

expFa

(numeric, length=1) additional parameters for method='exponent'

silent

(logical) suppress messages

debug

(logical) additional messages for debugging

callFrom

(character) allows easier tracking of messages produced

Details

In most cases when treating 'Omics'-data one works with the hypothesis that there are no global changes in the structure of all data/columns. Under this hypothesis it is very common to assume that the median (via the argument method) of all samples (ie columns) should remain constant. For example, samples/columns with less signal will be considered as having 'accidentally' received less material (eg due to the imprecision when transferring very small amounts of liquid samples). In consequence, for a sample considered as having received only 95% of the material, all measures will be multiplied by 1/0.95 (approx 1.053) to compensate for the supposed lack of starting material.
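
This multiplicative correction can be illustrated by hand. A minimal sketch with simulated data; the objects x, colFact and xCorr serve only this illustration and do not reflect the internal code of normalizeThis:

set.seed(7)
x <- matrix(rexp(400, rate=0.1), ncol=4)                       # toy intensity-like data
x[,2] <- x[,2] * 0.95                                          # column 2 'received' only 95% material
colFact <- median(apply(x, 2, median)) / apply(x, 2, median)   # one factor per column
xCorr <- sweep(x, 2, colFact, "*")                             # apply multiplicative factors
round(apply(xCorr, 2, median), 2)                              # column medians are now aligned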

With the analysis of 'Omics'-data it is very common to work with data on log-scale. In this case the argument mode should be set to "additive", since adding a constant factor to log-data corresponds to a multiplicative factor on the regular scale. Please note that (at this point) the methods 'slope', 'exponent', 'slope2Sections' and 'vsn' don't distinguish between additive and proportional modes, but take the data 'as is' (you may look at the original documentation for more details, see exponNormalize, adjBy2ptReg, justvsn).
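
As a small illustration of why the additive mode matches log-scale data (the objects raw, fact and rawDat are hypothetical and serve only this sketch):

raw <- c(100, 200, 400)
fact <- 1/0.95                       # multiplicative correction on the regular scale
log2(raw * fact)                     # multiply first, then take log2 ...
log2(raw) + log2(fact)               # ... gives the same as adding log2(fact) on log2 scale
## thus, for log2-transformed data one would typically call something like :
## normalizeThis(log2(rawDat), method="median", mode="additive")     # 'rawDat' : your raw-scale matrix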

Normalization using method="rowNormalize" runs rowNormalize from this package. In this case, the working hypothesis is that all values in each row are expected to be the same. This method may be applied when all series of values (ie columns) are replicate measurements of the same sample. There is also an option for treating sparse data (see argument sparseLim), which may, however, consume much more computational resources, in particular when the value of nCombin is low (compared to the number of samples/columns).
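
A hedged sketch of such a call on simulated replicate-type data (the object rep1 exists only for this illustration; whether the sparse mode actually gets triggered depends on the NA content relative to sparseLim):

set.seed(12)
rep1 <- matrix(rep(round(rnorm(80, 10, 2), 2), 4), ncol=4) + round(rnorm(320, 0, 0.1), 2)
rep1 <- sweep(rep1, 2, c(1, 1.1, 0.95, 1.2), "*")      # column-wise bias to be removed
rep1[sample(length(rep1), 60)] <- NA                   # introduce some missing values
noR <- normalizeThis(rep1, method="rowNormalize", sparseLim=0.4, nCombin=3)
round(apply(noR, 2, median, na.rm=TRUE), 2)            # column medians after normalization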

Normalization using method="vsn" runs justvsn from the Bioconductor package vsn (this requires a minimum of 42 rows of input-data and the package vsn being installed). Note: Depending on the procedure chosen, the normalized data may appear on a different scale.
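
A guarded sketch (datV is a toy matrix chosen only to satisfy the minimum of 42 rows; the call is skipped if vsn is not installed):

if(requireNamespace("vsn", quietly=TRUE)) {
  set.seed(5)
  datV <- matrix(round(runif(220) * 5e4), ncol=4)      # 55 rows (>= 42) of raw-scale values
  noV <- normalizeThis(datV, method="vsn")             # result may appear on a different scale
  summary(noV)
}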

Value

This function returns a matrix of normalized data (same dimensions as input)

See Also

rowNormalize, exponNormalize, adjBy2ptReg, justvsn

Examples

set.seed(2015); rand1 <- round(runif(300)+rnorm(300,0,2),3)
dat1 <- cbind(ser1=round(100:1+rand1[1:100]), ser2=round(1.2*(100:1+rand1[101:200])-2),
  ser3=round((100:1 +rand1[201:300])^1.2-3))
dat1 <- cbind(dat1, ser4=round(dat1[,1]^seq(2,5,length.out=100)+rand1[11:110],1))
dat1[dat1 < 1] <- NA
summary(dat1)
dat1[c(1:5,50:54,95:100),]
no1 <- normalizeThis(dat1, refGrp=1:3, method="mean")
no2 <- normalizeThis(dat1, refGrp=1:3, method="trimMean", trimFa=0.4)
no3 <- normalizeThis(dat1, refGrp=1:3, method="median")
no4 <- normalizeThis(dat1, refGrp=1:3, method="slope", quantFa=c(0.2,0.8))
dat1[c(1:10,91:100),]
cor(dat1[,3],rowMeans(dat1[,1:2],na.rm=TRUE), use="complete.obs")             # high
cor(dat1[,4],rowMeans(dat1[,1:2],na.rm=TRUE), use="complete.obs")             # bad
cor(dat1[c(1:10,91:100),4],rowMeans(dat1[c(1:10,91:100),1:2],na.rm=TRUE),use="complete.obs")
cor(dat1[,3],rowMeans(dat1[,1:2],na.rm=TRUE)^ (1/seq(2,5,length.out=100)),use="complete.obs")
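## an additional (hedged) sketch, not part of the original examples :
## for log2-scale data the additive mode is typically preferred (see Details)
no5 <- normalizeThis(log2(dat1), refGrp=1:3, method="median", mode="additive")
summary(no5)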
