```r
suppressPackageStartupMessages({
  library(neuroim2)
  library(rMVPA)
  library(dplyr)
})
```
Cross-validation is a critical component in evaluating the performance and generalizability of MVPA models. The `rMVPA` package provides several cross-validation strategies designed specifically for neuroimaging data, where temporal structure and run/block organization must be carefully considered.
This vignette covers:

- The importance of cross-validation in MVPA
- Available cross-validation schemes
- How to implement each scheme
- Best practices and considerations
- Working examples
Cross-validation is essential in neuroimaging analyses: it provides approximately unbiased estimates of model performance and shows whether patterns of brain activity truly generalize to new data. By separating training and test sets, cross-validation prevents overfitting to noise in the data; for time-series neuroimaging data, it must also maintain temporal independence between the two sets.
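To see why run structure matters, here is a small illustration (plain base R, not part of the `rMVPA` API) of how a naive random split mixes trials from the same run across training and test sets:

```r
# Illustration only: a naive random hold-out mixes runs across the split
set.seed(1)
block_var <- rep(1:5, each = 20)  # 5 runs, 20 trials each
random_test <- sample(100, 20)    # random 20-trial test set
unique(block_var[random_test])    # test trials typically span several runs,
                                  # so temporally adjacent (correlated) trials
                                  # land on both sides of the split
```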
The `rMVPA` package implements several cross-validation strategies:

- Blocked cross-validation
- Bootstrap blocked cross-validation
- Sequential blocked cross-validation
- Two-fold blocked cross-validation
- Custom cross-validation
Blocked cross-validation is the most common approach for fMRI data. It respects the temporal structure of the data by using scanning runs as natural validation blocks.
```r
# Create a simple blocked structure: 5 runs with 20 trials each
block_var <- rep(1:5, each = 20)
cval <- blocked_cross_validation(block_var)
print(cval)
```
```r
# Generate example data
set.seed(123)
dat <- data.frame(
  x1 = rnorm(100),  # 100 trials total
  x2 = rnorm(100),
  x3 = rnorm(100)
)
y <- factor(rep(letters[1:5], length.out = 100))  # 5 conditions

# Generate cross-validation samples
samples <- crossval_samples(cval, dat, y)
print(samples)
```
Each row in the samples tibble contains:

- `ytrain`: Training labels
- `ytest`: Test labels
- `train`: Training data subset
- `test`: Test data subset
- `.id`: Fold identifier
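As a quick sketch of how to work with these columns (assuming the `samples` tibble generated above; the fold columns are list-columns, so individual elements are extracted with `[[`, and exact column classes may vary across `rMVPA` versions):

```r
# Inspect the first fold
fold1_ytrain <- samples$ytrain[[1]]
fold1_train <- as.data.frame(samples$train[[1]])
table(fold1_ytrain)  # class balance in the training labels
dim(fold1_train)     # rows = training trials, columns = features
```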
This method combines blocking with bootstrap resampling, providing more stable performance estimates while respecting the run structure.
```r
# Create bootstrap blocked CV with 20 repetitions
boot_cval <- bootstrap_blocked_cross_validation(block_var, nreps = 20)
print(boot_cval)

# Generate samples
boot_samples <- crossval_samples(boot_cval, dat, y)
print(boot_samples)
```
You can provide weights to influence the sampling probability:
```r
# Create weights (e.g., based on motion parameters)
weights <- runif(length(block_var))
weighted_boot_cval <- bootstrap_blocked_cross_validation(
  block_var,
  nreps = 20,
  weights = weights
)
print(weighted_boot_cval)
```
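The weighting works like standard weighted resampling: trials with larger weights are drawn more often. A base-R analogue (purely illustrative; `rMVPA` handles this internally) makes the idea concrete:

```r
# Weighted resampling analogue: draw indices with probability proportional
# to `weights`; high-weight trials appear more often in the bootstrap sample
idx <- sample(seq_along(weights), size = length(weights),
              replace = TRUE, prob = weights / sum(weights))
head(sort(table(idx), decreasing = TRUE))  # most frequently drawn trials
```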
This method creates sequential folds within each block, useful when temporal order matters.
```r
# Create sequential blocked CV with 2 folds and 4 repetitions
seq_cval <- sequential_blocked_cross_validation(
  block_var,
  nfolds = 2,
  nreps = 4
)
print(seq_cval)
```
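As with the other schemes, `crossval_samples()` realizes the folds. With `nfolds = 2` and `nreps = 4`, one would expect one row per fold per repetition, though the exact layout may differ slightly across `rMVPA` versions:

```r
# Realize the sequential folds (dat and y as generated above)
seq_samples <- crossval_samples(seq_cval, dat, y)
print(seq_samples)
nrow(seq_samples)  # expected: nfolds * nreps = 8
```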
This approach randomly splits blocks into two groups, useful for rapid performance estimation.
```r
# Create two-fold blocked CV with 10 repetitions
twofold_cval <- twofold_blocked_cross_validation(block_var, nreps = 10)
print(twofold_cval)
```
For specialized validation schemes, you can define custom training/testing splits:
```r
# Define custom splits
custom_splits <- list(
  list(train = 1:60, test = 61:100),
  list(train = 1:40, test = 41:100),
  list(train = 1:80, test = 81:100)
)

# Create custom CV
custom_cval <- custom_cross_validation(custom_splits)
print(custom_cval)
```
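Passing the custom scheme to `crossval_samples()` should yield one row per user-supplied split; note that these example splits deliberately vary the train/test ratio:

```r
# Realize the custom splits (dat and y as generated above)
custom_samples <- crossval_samples(custom_cval, dat, y)
print(custom_samples)  # one row per split defined in custom_splits
```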
Here's a complete example using blocked cross-validation with an SDA classifier:
```r
# Setup cross-validation
block_var <- rep(1:5, each = 20)
cval <- blocked_cross_validation(block_var)

# Generate data
set.seed(123)
dat <- data.frame(matrix(rnorm(100 * 10), 100, 10))
y <- factor(rep(letters[1:5], 20))

# Generate CV samples
samples <- crossval_samples(cval, dat, y)

# Train models for each fold
model_fits <- samples %>%
  rowwise() %>%
  do({
    train_dat <- as.data.frame(.$train)
    y_train <- .$ytrain
    fit <- sda::sda(as.matrix(train_dat), y_train, verbose = FALSE)
    tibble::tibble(fit = list(fit))
  })
print(model_fits)
```
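To close the loop, each fold can also be scored on its held-out data. The sketch below mirrors the `rowwise()`/`do()` pattern above and, like that example, assumes the `train`/`test` columns materialize via `as.data.frame()`; `predict()` for sda models returns a list whose `class` element holds the predicted labels:

```r
# Sketch: per-fold test accuracy (refits each fold for simplicity)
fold_acc <- samples %>%
  rowwise() %>%
  do({
    fit <- sda::sda(as.matrix(as.data.frame(.$train)), .$ytrain, verbose = FALSE)
    pred <- predict(fit, as.matrix(as.data.frame(.$test)), verbose = FALSE)$class
    tibble::tibble(accuracy = mean(pred == .$ytest))
  })
print(fold_acc)
mean(fold_acc$accuracy)  # average cross-validated accuracy
```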
When choosing a cross-validation strategy, consider:

- Estimate Stability: Use bootstrap blocked CV for more stable estimates.
- Sample Size: For large datasets, simple blocked CV may suffice.
- Temporal Dependencies: Consider temporal autocorrelation when constructing folds.
- Computation Time: Repeated and bootstrap schemes multiply the number of model fits, so they cost proportionally more time.
Choose the right cross-validation strategy for your neuroimaging data. Match the CV approach to your data structure, account for temporal dependencies in fMRI, and use bootstrap methods when you need more stable estimates. The rMVPA package supports all these approaches, including custom schemes for specialized needs.
For implementation details, refer to the source code in `crossval.R`.
Cross-validation strategies in `rMVPA` are designed to work seamlessly with both regional and searchlight analyses. Here we'll demonstrate how to incorporate different cross-validation schemes into these analyses.
Let's perform a regional MVPA analysis using different cross-validation strategies:
```r
# Generate a sample dataset
data_out <- gen_sample_dataset(D = c(6, 6, 6), nobs = 80, blocks = 4, nlevels = 2)

# Create a region mask
mask <- data_out$dataset$mask
nvox <- sum(mask)
region_mask <- neuroim2::NeuroVol(
  sample(1:3, size = nvox, replace = TRUE),
  space(mask),
  indices = which(mask > 0)
)

# Create MVPA dataset
dset <- mvpa_dataset(data_out$dataset$train_data, mask = data_out$dataset$mask)

# Load the classification model
mod <- load_model("sda_notune")
tune_grid <- data.frame(lambda = 0.01, diagonal = FALSE)

# Example 1: Using Blocked Cross-Validation
blocked_cv <- blocked_cross_validation(data_out$design$block_var)
mvpa_mod_blocked <- mvpa_model(
  mod,
  dataset = dset,
  design = data_out$design,
  crossval = blocked_cv,
  tune_grid = tune_grid
)

# Run regional analysis with blocked CV
results_blocked <- run_regional(mvpa_mod_blocked, region_mask)

# Example 2: Using Bootstrap Blocked Cross-Validation
bootstrap_cv <- bootstrap_blocked_cross_validation(
  data_out$design$block_var,
  nreps = 10
)
mvpa_mod_boot <- mvpa_model(
  mod,
  dataset = dset,
  design = data_out$design,
  crossval = bootstrap_cv,
  tune_grid = tune_grid
)

# Run regional analysis with bootstrap CV
results_boot <- run_regional(mvpa_mod_boot, region_mask)

# Compare performance between CV strategies
cat("Blocked CV Performance:\n")
print(results_blocked$performance_table)
cat("\nBootstrap CV Performance:\n")
print(results_boot$performance_table)
```
We can also use different cross-validation strategies in searchlight analysis:
```r
# Example 3: Searchlight with Sequential Blocked Cross-Validation
seq_cv <- sequential_blocked_cross_validation(
  data_out$design$block_var,
  nfolds = 2,
  nreps = 4
)
mvpa_mod_seq <- mvpa_model(
  mod,
  dataset = dset,
  design = data_out$design,
  crossval = seq_cv,
  tune_grid = tune_grid
)

# Run searchlight analysis
results_searchlight <- run_searchlight(
  mvpa_mod_seq,
  radius = 2,
  method = "standard"
)
```
When integrating cross-validation with regional or searchlight analyses, keep in mind:

- Memory Usage: Reduce `nreps` for large searchlight analyses and monitor memory usage with large datasets.
- Computation Time: Use parallel processing when available (see the sketch after this list).
- Result Stability: For searchlight analyses, simpler CV schemes might be sufficient.
- Performance Metrics: Compare performance tables across CV schemes to check that conclusions are robust (see the comparison below).
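For the computation-time point, a common approach in R is future-based parallelism. Whether your `rMVPA` version picks up a `future` plan is an assumption here, so check the package documentation, but the setup sketch would look like this:

```r
# Hypothetical parallel setup via the future framework; verify that your
# rMVPA version supports a future-based backend before relying on this
library(future)
plan(multisession, workers = 4)  # spawn 4 background R sessions
# ... run_regional() / run_searchlight() calls go here ...
plan(sequential)                 # return to serial execution
```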
Here's a more detailed comparison of different CV strategies:
```r
# Create different CV schemes
cv_schemes <- list(
  blocked = blocked_cross_validation(data_out$design$block_var),
  bootstrap = bootstrap_blocked_cross_validation(data_out$design$block_var, nreps = 10),
  twofold = twofold_blocked_cross_validation(data_out$design$block_var, nreps = 5)
)

# Run regional analysis with each CV scheme
results <- lapply(cv_schemes, function(cv) {
  mvpa_mod <- mvpa_model(
    mod,
    dataset = dset,
    design = data_out$design,
    crossval = cv,
    tune_grid = tune_grid
  )
  run_regional(mvpa_mod, region_mask)
})

# Compare performance across CV schemes
performance_comparison <- lapply(names(results), function(name) {
  perf <- results[[name]]$performance_table
  perf$cv_scheme <- name
  perf
})

# Combine results
all_performance <- do.call(rbind, performance_comparison)
print(all_performance)
```
This demonstrates how different cross-validation strategies can be incorporated into the broader MVPA analysis framework, allowing researchers to choose the validation approach best suited to their analysis.
For more details on regional and searchlight analyses, refer to their respective vignettes and the implementations in `regional.R` and `searchlight.R`.