normalizeData: Normalize data to be used by GFA
In GFA: Group Factor Analysis

View source: R/normalizeData.R

normalizeData

R Documentation

Description

normalizeData transforms a data collection into a normalized form suitable for GFA. The function does two things:

1. It centers each variable. GFA assumes zero-mean data, as it models variances.
2. It normalizes the scales of variables and/or variable groups. Features with higher variance affect the model structure more; if this is not desired, normalization should be applied. In GFA it is also possible to normalize the importance of variable groups (data sources), in addition to or instead of individual variables.

Finally, the total variance of the data is normalized for numerical reasons. This is particularly important if no other normalization is done.

NOTE: the function assumes continuous-valued data. If some features are, e.g., binary with only a small proportion of 1s, we do not recommend centering them.
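The centering and scaling steps described here can be illustrated in base R. This is a minimal sketch on made-up data, not the package's implementation; the matrix `X` and its column scales are assumptions chosen for the example.

```r
set.seed(1)
# Hypothetical data source: 100 samples (rows), 3 features (columns)
# on very different scales
X <- cbind(rnorm(100, mean = 5,  sd = 1),
           rnorm(100, mean = -2, sd = 10),
           rnorm(100, mean = 0,  sd = 0.1))

# Step 1: center each feature (column) to zero mean
Xc <- sweep(X, 2, colMeans(X), "-")

# Share of the total variance carried by each feature before scaling;
# the second feature dominates because of its larger scale
feature_var <- apply(Xc, 2, var)
var_share <- feature_var / sum(feature_var)

# Step 2 (feature-wise variant): scale each feature to unit variance
Xs <- sweep(Xc, 2, sqrt(feature_var), "/")
```

Without the second step, a model that explains variance would mostly be driven by the second column, which is the effect the description warns about.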

Usage

normalizeData(train, test = NULL, type = "scaleOverall")

Arguments

`train`	A training data set. For a detailed description, see parameter Y in `gfa`.
`test`	A test data set. Should be provided if sequential prediction is used later.
`type`	Specifies the type of normalization. The features are mean-centered in all cases; option "center" does this and performs no scaling. Option "scaleOverall" (default) uses a single parameter to scale the variance of the whole data collection to 1, while "scaleSources" scales each data source to have variance 1. Finally, "scaleFeatures" performs z-normalization, i.e., scales each feature to variance 1.
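To make the four `type` options concrete, the following base R sketch mimics the documented behaviour. The helper `sketch_normalize` is hypothetical and not part of GFA. It assumes the data collection is a list of numeric matrices (one per data source, samples in rows) without missing values, and it assumes that the "variance" of a source or of the whole collection means the variance pooled over all of its entries; the package may define these differently.

```r
# Hypothetical helper, not part of GFA: illustrates the documented options
sketch_normalize <- function(train,
                             type = c("scaleOverall", "center",
                                      "scaleSources", "scaleFeatures")) {
  type <- match.arg(type)
  # Mean-centering of every feature is done for all types
  centered <- lapply(train, function(x) sweep(x, 2, colMeans(x), "-"))
  switch(type,
    center = centered,
    scaleOverall = {
      # A single scalar for the whole data collection
      s <- sd(unlist(centered))
      lapply(centered, function(x) x / s)
    },
    # One scalar per data source
    scaleSources = lapply(centered, function(x) x / sd(as.vector(x))),
    # One scalar per feature (z-normalization)
    scaleFeatures = lapply(centered, function(x)
      sweep(x, 2, apply(x, 2, sd), "/"))
  )
}

set.seed(2)
# Made-up collection: two sources with 20 shared samples
Y <- list(matrix(rnorm(60, mean = 3, sd = 2), 20, 3),
          matrix(rnorm(40, mean = -1, sd = 5), 20, 2))
Y_overall  <- sketch_normalize(Y)                   # default type
Y_sources  <- sketch_normalize(Y, "scaleSources")
Y_features <- sketch_normalize(Y, "scaleFeatures")
```

In this sketch, "scaleOverall" preserves the relative scales of sources and features, "scaleSources" equalizes the sources while preserving relative feature scales within each source, and "scaleFeatures" removes all scale differences.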

Value

A list containing the following elements:

`train`	Normalized training data.
`test`	Normalized test data for sequential prediction (if provided as input).
`trainMean`	Feature-wise means of the training data sources.
`trainSd`	Feature-wise/overall standard deviations of the training data sources.
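A possible call, using the signature shown under Usage, might look as follows. The simulated data and the list-of-matrices layout (one matrix per data source, samples in rows) are assumptions made for illustration; see parameter Y in `gfa` for the authoritative format. The call is guarded so that the snippet also runs when the GFA package is not installed.

```r
set.seed(3)
# Simulated data (an assumption for illustration): two data sources with
# 4 and 6 features; 50 training samples and 10 test samples
train <- list(matrix(rnorm(50 * 4, mean = 2), 50, 4),
              matrix(rnorm(50 * 6, sd = 3), 50, 6))
test  <- list(matrix(rnorm(10 * 4, mean = 2), 10, 4),
              matrix(rnorm(10 * 6, sd = 3), 10, 6))

normalized <- NULL
if (requireNamespace("GFA", quietly = TRUE)) {
  normalized <- GFA::normalizeData(train, test = test,
                                   type = "scaleFeatures")
  # Documented elements: train, test, trainMean, trainSd
  print(names(normalized))
  # Inspect the stored training statistics rather than assuming their shape
  str(normalized$trainMean)
  str(normalized$trainSd)
}
```

Passing `test` together with `train` follows the argument description: the test data should be provided if sequential prediction is used later.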