View source: R/numero.prepare.R
numero.prepare | R Documentation
Prepare training data by mitigating confounding factors and standardizing values.
numero.prepare(data, variables = NULL, confounders = NULL,
batch = NULL, method = "standard", clip = 5.0,
pipeline = NULL, undo = FALSE)
data | A matrix or a data frame.
variables | A character vector of column names, see details.
confounders | Names of columns that contain confounder data.
batch | The name of the column that contains batch labels.
method | Method to standardize values, see
clip | Range for clipping extreme values in multiples of standard deviations.
pipeline | Processing parameters from a previous use of the function.
undo | If TRUE, standardization is reversed after adjusting for batches and confounders.
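The clip and undo arguments can be illustrated with a minimal base R sketch. It assumes plain z-score standardization (subtract the mean, divide by the standard deviation) and leaves out the batch and confounder adjustments entirely; standardize_clip() and undo_standardize() are hypothetical helpers, not Numero functions.

```r
# Hypothetical sketch: z-score standardization with clipping, and its reversal.
# Not Numero's implementation; batch/confounder adjustment is omitted.
standardize_clip <- function(x, clip = 5.0) {
  mu <- mean(x)
  s <- sd(x)
  z <- (x - mu) / s
  # Limit values to the range [-clip, clip] in standard deviation units
  z <- pmin(pmax(z, -clip), clip)
  list(values = z, center = mu, scale = s)
}

# Map standardized values back to the original scale
undo_standardize <- function(z, center, scale) {
  z * scale + center
}

x <- c(1, 2, 3, 4, 100)
res <- standardize_clip(x, clip = 1.5)
print(round(res$values, 3))
restored <- undo_standardize(res$values, res$center, res$scale)
print(restored)
```

Reversing the standardization cannot recover a value that was clipped: in this sketch the first four elements come back unchanged, while the last one returns smaller than 100.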
We recommend first applying numero.clean() to the full dataset, then selecting a subset for training with the input argument variables. This preserves any attributes that may be used in Numero functions.
If a pipeline from a previous call is supplied, it overrides all processing parameters regardless of the other input arguments.
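Reusing a pipeline means that parameters estimated once are applied unchanged to other data. The base R sketch below shows the idea under the assumption that the parameters are just column means and standard deviations; fit_pipeline() and apply_pipeline() are hypothetical helpers, and the actual 'pipeline' attribute returned by numero.prepare() may store different information.

```r
# Hypothetical sketch: estimate processing parameters once, then reapply them.
# fit_pipeline() and apply_pipeline() are illustrative, not Numero functions.
fit_pipeline <- function(m) {
  list(center = colMeans(m), scale = apply(m, 2, sd))
}

apply_pipeline <- function(m, pipeline) {
  # Use the stored parameters, ignoring the statistics of the new data
  out <- scale(m, center = pipeline$center, scale = pipeline$scale)
  attr(out, "scaled:center") <- NULL
  attr(out, "scaled:scale") <- NULL
  out
}

train <- matrix(c(1, 2, 3, 10, 20, 30), ncol = 2,
                dimnames = list(NULL, c("A", "B")))
newdata <- matrix(c(2, 4, 20, 40), ncol = 2,
                  dimnames = list(NULL, c("A", "B")))

p <- fit_pipeline(train)
out <- apply_pipeline(newdata, p)
print(out)
```

With Numero itself, the stored parameters would be passed back through the pipeline argument, for example pipeline = attr(trdata, "pipeline").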
Because of safeguards against numerical instability, the standardized values may deviate slightly from the expected range (errors below 0.1 percent are typical).
Clipping of extreme values is applied only during the first round of standardization, before adjustment for confounders. Therefore, the final output may contain values that exceed the clipping threshold.
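The effect of clipping only in the first round can be reproduced with a small base R example. This is a simplified sketch, not Numero's algorithm: it assumes the confounder adjustment is a linear regression with lm() followed by a second, unclipped standardization, and the data and the threshold of 2 are made up for illustration.

```r
# Hypothetical sketch: clip in the first standardization, adjust for a
# confounder with lm(), then standardize again without clipping.
clip_value <- 2
x <- c(rep(0, 9), 10)   # one extreme observation
age <- 1:10             # illustrative confounder

# First round: standardize and clip
z1 <- (x - mean(x)) / sd(x)
z1 <- pmin(pmax(z1, -clip_value), clip_value)

# Adjust for the confounder by keeping the regression residuals
r <- residuals(lm(z1 ~ age))

# Second round: standardize the residuals, no clipping
z2 <- (r - mean(r)) / sd(r)

print(max(abs(z1)))  # bounded by the clipping threshold
print(max(abs(z2)))  # can be larger than the threshold
```

Because the adjustment shrinks the overall spread more than it shrinks the extreme residual, re-standardizing pushes that value back past the threshold.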
A matrix with two attributes: 'pipeline', which contains the processing parameters, and 'subsets', which contains the row names divided into batches if batch correction was applied.
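Both attributes are ordinary R attributes, so they can be read with attr(), for example attr(trdata, "pipeline") for the trdata objects in the examples. The base R sketch builds a toy matrix with the same two attributes; the row names, batch labels, and the contents of the pipeline list are hypothetical placeholders, since the exact internal format is not described here.

```r
# Hypothetical sketch: a toy matrix carrying 'pipeline' and 'subsets'
# attributes. Row names, batch labels and pipeline contents are placeholders.
m <- matrix(c(-1, 0, 1, 0, 1, -1, 0, 0), nrow = 4,
            dimnames = list(c("s1", "s2", "s3", "s4"), c("CHOL", "TG")))
batch <- c(s1 = "A", s2 = "B", s3 = "A", s4 = "B")

# Row names divided into batches, stored as a named list
attr(m, "subsets") <- split(rownames(m), batch[rownames(m)])

# Illustrative processing parameters (not Numero's actual format)
attr(m, "pipeline") <- list(method = "standard", clip = 5.0)

# Read the attributes back
print(attr(m, "subsets"))
print(attr(m, "pipeline")$method)
```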
# Import data.
fname <- system.file("extdata", "finndiane.txt", package = "Numero")
dataset <- read.delim(file = fname)
# Set identities and manage missing data.
dataset <- numero.clean(dataset, identity = "INDEX")
# Prepare training variables using default standardization.
trvars <- c("CHOL", "HDL2C", "TG", "CREAT", "uALB")
trdata <- numero.prepare(data = dataset, variables = trvars)
print(summary(trdata))
# Prepare training values adjusted for age (confounder) and sex (batch),
# standardized by a rank-based method.
trdata <- numero.prepare(data = dataset, variables = trvars,
batch = "MALE", confounders = "AGE",
method = "tapered")
print(summary(trdata))
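Batch correction, as requested with batch = "MALE" in the second example, can be illustrated with one generic technique: standardizing each batch separately so that every batch ends up with mean 0 and standard deviation 1. This technique is an assumption chosen for illustration; the documentation does not state how numero.prepare() adjusts for batches, and adjust_batches() is a hypothetical helper.

```r
# Hypothetical sketch: per-batch standardization as a simple batch adjustment.
# A generic technique, not necessarily what numero.prepare() does.
adjust_batches <- function(x, batch) {
  out <- numeric(length(x))
  for (b in unique(batch)) {
    idx <- batch == b
    # Standardize the observations of this batch on their own
    out[idx] <- (x[idx] - mean(x[idx])) / sd(x[idx])
  }
  out
}

chol <- c(4.0, 5.0, 6.0, 5.5, 6.5, 7.5)  # made-up values
male <- c(0, 0, 0, 1, 1, 1)              # batch labels
adj <- adjust_batches(chol, male)
print(adj)
print(tapply(adj, male, mean))
```

The offset between the two batches disappears after adjustment, which is the kind of systematic difference batch correction is meant to remove.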