bp_VIP_analysis: Bootstrap and permutation over PLS-VIP

View source: R/nmr_data_analysis.R

bp_VIP_analysisR Documentation

Bootstrap and permutation over PLS-VIP

Description

Bootstrap and permutation over PLS-VIP on AlpsNMR can be performed on both nmr_dataset_1D full spectra as well as nmr_dataset_peak_table peak tables.

Usage

bp_VIP_analysis(dataset, train_index, y_column, ncomp, nbootstrap = 300)

Arguments

dataset

An nmr_dataset_family object

train_index

set of index used to generate the bootstrap datasets

y_column

A string with the name of the y column (present in the metadata of the dataset)

ncomp

number of components used in the plsda models

nbootstrap

number of bootstrap dataset

Details

Use of the bootstrap and permutation methods for a more robust variable importance in the projection metric for partial least squares regression

Value

A list with the following elements:

  • important_vips: A list with the important vips selected

  • relevant_vips: List of vips with some relevance

  • pls_vip: Pls-VIPs of every bootstrap

  • pls_vip_perm: Pls-VIPs of every bootstrap with permuted variables

  • pls_vip_means: Pls-VIPs normaliced differences means

  • pls_vip_score_diff: Differences of pls_vip and pls_vip_perm

  • pls_models: pls models of the diferent bootstraps

  • pls_perm_models: pls permuted models of the diferent bootstraps

  • classif_rate: classification rate of the bootstrap models

  • general_model: pls model trained with all train data

  • general_CR: classification rate of the general_model

  • vips_model: pls model trained with vips selection over all train data

  • vips_CR: classification rate of the vips_model

  • error: error spected in a t distribution

  • lower_bound: lower bound of the confidence interval

  • upper_bound: upper bound of the confidence interval

Examples

# Data analysis for a table of integrated peaks

## Generate an artificial nmr_dataset_peak_table:
### Generate artificial metadata:
num_samples <- 32 # use an even number in this example
num_peaks <- 20
metadata <- data.frame(
    NMRExperiment = as.character(1:num_samples),
    Condition = rep(c("A", "B"), times = num_samples / 2)
)

### The matrix with peaks
peak_means <- runif(n = num_peaks, min = 300, max = 600)
peak_sd <- runif(n = num_peaks, min = 30, max = 60)
peak_matrix <- mapply(function(mu, sd) rnorm(num_samples, mu, sd),
    mu = peak_means, sd = peak_sd
)
colnames(peak_matrix) <- paste0("Peak", 1:num_peaks)

## Artificial differences depending on the condition:
peak_matrix[metadata$Condition == "A", "Peak2"] <-
    peak_matrix[metadata$Condition == "A", "Peak2"] + 70

peak_matrix[metadata$Condition == "A", "Peak6"] <-
    peak_matrix[metadata$Condition == "A", "Peak6"] - 60

### The nmr_dataset_peak_table
peak_table <- new_nmr_dataset_peak_table(
    peak_table = peak_matrix,
    metadata = list(external = metadata)
)

## We will use a double cross validation, splitting the samples with random
## subsampling both in the external and internal validation.
## The classification model will be a PLSDA, exploring at maximum 3 latent
## variables.
## The best model will be selected based on the area under the ROC curve
methodology <- plsda_auroc_vip_method(ncomp = 3)
model <- nmr_data_analysis(
    peak_table,
    y_column = "Condition",
    identity_column = NULL,
    external_val = list(iterations = 1, test_size = 0.25),
    internal_val = list(iterations = 3, test_size = 0.25),
    data_analysis_method = methodology
)
## Area under ROC for each outer cross-validation iteration:
model$outer_cv_results_digested$auroc

## The number of components for the bootstrap models is selected
ncomps <- model$outer_cv_results$`1`$model$ncomp
train_index <- model$train_test_partitions$outer$`1`$outer_train

# Bootstrap and permutation for VIP selection
bp_VIPS <- bp_VIP_analysis(peak_table, # Data to be analyzed
    train_index,
    y_column = "Condition",
    ncomp = ncomps,
    nbootstrap = 10
)


sipss/AlpsNMR documentation built on Aug. 13, 2024, 5:11 p.m.