impute: Missing data imputation

imputeAll R Documentation

Missing data imputation

Description

Impute missing values using random forest imputation.

Usage

imputeAll(d, occupancy = 2/3, parallel = "variables", seed = 1234)

## S4 method for signature 'AnalysisData'
imputeAll(d, occupancy = 2/3, parallel = "variables", seed = 1234)

imputeClass(d, cls = "class", occupancy = 2/3, seed = 1234)

## S4 method for signature 'AnalysisData'
imputeClass(d, cls = "class", occupancy = 2/3, seed = 1234)

Arguments

d

S4 object of class AnalysisData

occupancy

occupancy threshold (the proportion of samples in which a feature is present) above which missing values of that feature will be imputed

parallel

parallel type to use, passed to missForest ("no", "variables" or "forests"). See ?missForest for details.

seed

random number seed

cls

sample information column to use for class labels

Details

Missing values can strongly influence downstream analyses, and zero values in particular can heavily skew the outcomes of parametric tests. Where (in which features) and how missing values are imputed are therefore important considerations, and both are closely related to variable occupancy. The methods provided here allow both aspects to be taken into account and use random forest imputation from the missForest package.
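As a rough illustration of how occupancy and imputation interact, the R sketch below computes each feature's occupancy, recodes zeros as missing only for features above the threshold, and passes just those features to an imputer. It is a minimal sketch under stated assumptions, not metabolyseR's implementation: imputeOccupied() and medianImpute() are hypothetical helpers, zeros are assumed to denote missing values, the threshold is treated as strictly "above", and the default imputer assumes missForest::missForest() returns the imputed data in its ximp element. The usage example swaps in a column-median imputer so the gating logic can be seen without missForest installed; that stand-in is not random forest imputation.

```r
# Hypothetical sketch of occupancy-gated imputation; NOT metabolyseR's internal code.
# Assumptions: samples are rows, features are columns, zero (or NA) marks a
# missing value, and a feature is imputed only if its occupancy is strictly
# above `occupancy`.
imputeOccupied <- function(x, occupancy = 2/3, seed = 1234,
                           imputer = function(m) {
                             # Default: random forest imputation via missForest
                             as.matrix(missForest::missForest(as.data.frame(m))$ximp)
                           }) {
  present <- !is.na(x) & x != 0
  occ <- colSums(present) / nrow(x)     # proportion of samples with a value
  eligible <- occ > occupancy           # features whose missing values get imputed
  sub <- x[, eligible, drop = FALSE]
  sub[!is.na(sub) & sub == 0] <- NA     # recode zeros as NA for the imputer
  if (anyNA(sub)) {
    set.seed(seed)                      # make stochastic imputers reproducible
    sub <- imputer(sub)
  }
  x[, eligible] <- sub                  # low-occupancy features are left untouched
  x
}

# Hypothetical stand-in imputer (column median), used only to demonstrate
# the gating logic; it is not random forest imputation.
medianImpute <- function(m) {
  for (j in seq_len(ncol(m))) {
    miss <- is.na(m[, j])
    m[miss, j] <- median(m[!miss, j])
  }
  m
}

x <- matrix(c(1, 0, 3, 4, 5, 6,    # feature 1: occupancy 5/6, imputed
              0, 0, 2, 0, 1, 0,    # feature 2: occupancy 2/6, left as-is
              2, 2, 0, 3, 4, 1),   # feature 3: occupancy 5/6, imputed
            ncol = 3)
res <- imputeOccupied(x, imputer = medianImpute)
res
```

Here the zeros in features 1 and 3 are replaced (by 4 and 2 respectively), while feature 2 keeps its zeros because its occupancy falls below the 2/3 threshold.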

Value

An S4 object of class AnalysisData containing the data after imputation.

Methods

  • imputeAll: Impute missing values across all sample features.

  • imputeClass: Impute missing values class-wise.
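The difference between imputing across all samples and imputing class-wise can be sketched by splitting the samples by class and imputing each group on its own. In this hypothetical R sketch, imputeByClass() and medianImpute() are illustrative helpers, a plain character vector stands in for the cls info column, occupancy gating is omitted for brevity, and a column-median imputer stands in for random forest imputation; metabolyseR's own class-wise method may differ in detail.

```r
# Hypothetical sketch of class-wise imputation; NOT metabolyseR's internal code.
# Rows (samples) are grouped by class and each group is imputed separately,
# so imputed values are informed only by samples of the same class.
imputeByClass <- function(x, cls, imputer) {
  stopifnot(length(cls) == nrow(x))
  out <- x
  for (g in unique(cls)) {
    rows <- which(cls == g)
    out[rows, ] <- imputer(x[rows, , drop = FALSE])
  }
  out
}

# Hypothetical stand-in imputer (column median); not random forest imputation.
medianImpute <- function(m) {
  for (j in seq_len(ncol(m))) {
    miss <- is.na(m[, j])
    m[miss, j] <- median(m[!miss, j])
  }
  m
}

x <- matrix(c(1, NA, 3, 10, NA, 30), ncol = 1)
cls <- c("a", "a", "a", "b", "b", "b")

byClass <- imputeByClass(x, cls, medianImpute)  # NAs become 2 (class a) and 20 (class b)
overall <- medianImpute(x)                      # both NAs become 6.5, the pooled median
```

With class-wise imputation each missing value is filled from samples of its own class, whereas pooling all samples mixes the two classes and pulls both imputed values towards the same figure.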

Examples

## Each of the following examples applies one of the imputation methods and then
## plots a Linear Discriminant Analysis (LDA) to show its effect on the data structure.

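## Initial example data preparation
library(metabolyseR)
library(metaboData)

## Use columns 200 to 250 of the negative ionisation mode data and
## filter the features by maximum class occupancy
d <- analysisData(abr1$neg[, 200:250], abr1$fact) %>%
 occupancyMaximum(occupancy = 2/3)

## LDA of the data before imputation
d %>%
 plotLDA(cls = 'day')
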
## Missing value imputation across all samples
d %>% 
 imputeAll(parallel = 'no') %>% 
 plotLDA(cls = 'day')

## Missing value imputation class-wise
d %>% 
 imputeClass(cls = 'day') %>% 
 plotLDA(cls = 'day')

jasenfinch/metabolyseR documentation built on Sept. 18, 2023, 1:25 a.m.