numero.prepare: Prepare datasets for analysis


numero.prepare    R Documentation

Prepare datasets for analysis

Description

Prepare training data by mitigating confounding factors and standardizing values.

Usage

numero.prepare(data, variables = NULL, confounders = NULL,
               batch = NULL, method = "standard", clip = 5.0,
               pipeline = NULL, undo = FALSE)

Arguments

data

A matrix or a data frame.

variables

A character vector of column names; see Details.

confounders

Names of columns that contain confounder data.

batch

The name of the column that contains batch labels.

method

Method to standardize values; see nroPreprocess().

clip

Threshold for clipping extreme values, in multiples of the standard deviation.

pipeline

Processing parameters from a previous use of the function.

undo

If TRUE, standardization is reversed after adjusting for batches and confounders.

Details

We recommend first applying numero.clean() to the full dataset, then selecting the training subset with the input argument variables. This preserves any attributes that may be used in Numero functions.

If a previous pipeline is available, it overrides all processing parameters irrespective of other input arguments.
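
As a minimal sketch of reusing a pipeline, the example below prepares the training data once and then applies the stored parameters in a second call. It assumes that the parameters are retrieved from the 'pipeline' attribute of the output with attr(), which is an inference from the Value section rather than a documented recipe; the names pipe and redata are hypothetical.

```r
library(Numero)

# Import and clean the example data shipped with the package.
fname <- system.file("extdata", "finndiane.txt", package = "Numero")
dataset <- numero.clean(read.delim(file = fname), identity = "INDEX")

# First call: processing parameters are estimated from the data.
trvars <- c("CHOL", "HDL2C", "TG", "CREAT", "uALB")
trdata <- numero.prepare(data = dataset, variables = trvars)

# Assumption: the stored parameters are available as an attribute.
pipe <- attr(trdata, "pipeline")

# Second call: only the data and the previous pipeline are supplied.
redata <- numero.prepare(data = dataset, pipeline = pipe)
print(dim(redata))
```

In practice the second call would typically receive a different dataset with the same columns, so that both datasets are processed with identical parameters.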

Due to safeguards against numerical instability, the standardized values may deviate slightly from the expected range (an error below 0.1 percent is typical).

Clipping of extreme values is applied only during the first round of standardization before adjustments for confounders. Therefore, the final output may contain values that exceed the threshold.
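
The effect can be illustrated with a conceptual sketch in base R. This is not the package's implementation: the simulated data, the linear regression used for the adjustment, and all variable names are assumptions chosen only to show why clipping before the adjustment does not bound the final values.

```r
set.seed(1)
n <- 200
age <- rnorm(n)            # hypothetical confounder
x <- 2 * age + rnorm(n)    # hypothetical variable that depends on the confounder
x[1] <- 60                 # a single extreme observation

# First round: standardize, then clip to +/- 5 standard deviations.
clip <- 5
z <- (x - mean(x)) / sd(x)
z <- pmin(pmax(z, -clip), clip)

# Adjustment: keep the residuals after regressing on the confounder.
res <- residuals(lm(z ~ age))

# Second round: standardize again, this time without clipping.
out <- (res - mean(res)) / sd(res)

print(range(z))        # clipped values stay within the threshold
print(max(abs(out)))   # the final value of the outlier is not bounded by it
```

Removing the confounder shrinks the spread of the ordinary observations, so the second standardization rescales the clipped outlier to a larger multiple of the new standard deviation.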

Value

A matrix with two attributes: 'pipeline', which contains the processing parameters, and 'subsets', which contains the row names divided into batches if batch correction was applied.
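
A short sketch of inspecting the returned attributes with base R follows. The internal structure of 'pipeline' and 'subsets' is not described here, so the example only lists and summarizes them with str() and makes no assumption about their fields.

```r
library(Numero)

# Import and clean the example data shipped with the package.
fname <- system.file("extdata", "finndiane.txt", package = "Numero")
dataset <- numero.clean(read.delim(file = fname), identity = "INDEX")

# Prepare training data with batch correction and a confounder.
trvars <- c("CHOL", "HDL2C", "TG", "CREAT", "uALB")
trdata <- numero.prepare(data = dataset, variables = trvars,
                         batch = "MALE", confounders = "AGE")

# List the attributes attached to the output matrix.
print(names(attributes(trdata)))

# Summarize the stored processing parameters and the batch subsets.
str(attr(trdata, "pipeline"), max.level = 1)
str(attr(trdata, "subsets"), max.level = 1)
```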

Examples

# Import data.
fname <- system.file("extdata", "finndiane.txt", package = "Numero")
dataset <- read.delim(file = fname)

# Set identities and manage missing data.
dataset <- numero.clean(dataset, identity = "INDEX")

# Prepare training variables using default standardization.
trvars <- c("CHOL", "HDL2C", "TG", "CREAT", "uALB")
trdata <- numero.prepare(data = dataset, variables = trvars)
print(summary(trdata))

# Prepare training values adjusted for age and sex, and
# standardized by a rank-based method.
trdata <- numero.prepare(data = dataset, variables = trvars,
                         batch = "MALE", confounders = "AGE",
                         method = "tapered")
print(summary(trdata))

Numero documentation built on Sept. 17, 2024, 5:09 p.m.