knitr::opts_chunk$set(collapse = TRUE, comment = "#>", fig.width=6, fig.height=4)
library(fmrireg)
library(fmridataset) # Dataset functionality refactored from fmrireg
library(neuroim2) # For creating example NeuroVec/NeuroVol

Introduction: Linking Data and Design

Effective fMRI analysis requires associating the measured brain activity (the imaging data) with crucial metadata, including:

The fmrireg package uses several dataset objects to encapsulate this information, providing a consistent input format for modeling functions like event_model, baseline_model, fmri_lm, and estimate_betas.

This vignette describes the main dataset classes and how to create them.

The sampling_frame

Before diving into datasets, recall the sampling_frame object (introduced in the Overview and detailed in other vignettes). It defines the fundamental temporal structure shared by all dataset types:

sframe_example <- sampling_frame(blocklens = c(150, 160), TR = 2.0)
print(sframe_example)

Dataset objects internally create or utilize a sampling_frame based on the provided run lengths and TR.

Overview of Dataset Classes

fmrireg offers different dataset classes depending on how your data is stored:

All these inherit from a base fmri_dataset class.

In-Memory Volumetric Data (fmri_mem_dataset)

Use this when your fMRI runs are loaded as neuroim2::NeuroVec objects in your R session.

Key Arguments:

# Create minimal example data (2 runs)
d <- c(5, 5, 5, 20) # Small dimensions for example
mask_vol <- neuroim2::LogicalNeuroVol(array(TRUE, d[1:3]), neuroim2::NeuroSpace(d[1:3]))

scan1 <- neuroim2::NeuroVec(array(rnorm(prod(d)), d), neuroim2::NeuroSpace(d))
scan2 <- neuroim2::NeuroVec(array(rnorm(prod(d)), d), neuroim2::NeuroSpace(d))

# Example event table
events_df <- data.frame(
  onset = c(5, 15, 5, 15), 
  condition = factor(c("A", "B", "A", "B")),
  run = c(1, 1, 2, 2)
)

# Create the dataset object
mem_dset <- fmri_mem_dataset(scans = list(scan1, scan2), 
                             mask = mask_vol, 
                             TR = 2.0, 
                             # run_length automatically inferred as c(20, 20)
                             event_table = events_df)

print(mem_dset)
# Access components
print(mem_dset$sampling_frame)

File-Based Volumetric Data (fmri_file_dataset)

This is often the most practical option for typical fMRI analyses where data resides in files.

Key Arguments:

# --- Create Dummy Files (for illustration only) ---
# In a real analysis, these files would already exist.
tmp_dir <- tempdir()
mask_filename <- "mask.nii.gz"
scan1_filename <- "run1.nii.gz"
scan2_filename <- "run2.nii.gz"

mask_file_full_path <- file.path(tmp_dir, mask_filename)
scan1_file_full_path <- file.path(tmp_dir, scan1_filename)
scan2_file_full_path <- file.path(tmp_dir, scan2_filename)

# Create small dummy mask and scans using neuroim2 functionality
d <- c(5, 5, 5) # Mask dimensions
d_run1 <- c(d, 20) # Run 1 dimensions (time=20)
d_run2 <- c(d, 25) # Run 2 dimensions (time=25)

mask_vol_dummy <- neuroim2::NeuroVol(array(1, d), neuroim2::NeuroSpace(d))
scan1_dummy <- neuroim2::NeuroVec(array(rnorm(prod(d_run1)), d_run1), neuroim2::NeuroSpace(d_run1))
scan2_dummy <- neuroim2::NeuroVec(array(rnorm(prod(d_run2)), d_run2), neuroim2::NeuroSpace(d_run2))

# Ensure dummy files are written using their full paths
neuroim2::write_vol(mask_vol_dummy, mask_file_full_path)
neuroim2::write_vec(scan1_dummy, scan1_file_full_path)
neuroim2::write_vec(scan2_dummy, scan2_file_full_path)
# --- End Dummy File Creation ---

# Create the file-based dataset object
# Pass only filenames to 'scans' and 'mask', and specify the directory in 'base_path'
file_dset <- fmri_dataset(scans = c(scan1_filename, scan2_filename), 
                            mask = mask_filename, 
                            TR = 1.5, 
                            run_length = c(20, 25), # Must match time dim of files
                            event_table = events_df, 
                            base_path = tmp_dir,    # Set base_path to the temp directory
                            preload = FALSE) # Keep data on disk

# This print statement should now work
print(file_dset)

# Clean up dummy files (optional, commented out for vignette)
# file.remove(mask_file_full_path, scan1_file_full_path, scan2_file_full_path)

Using preload=FALSE is memory-efficient as only the required data segments are read when needed (e.g., during model fitting).

Matrix Data (matrix_dataset)

Use this if your fMRI data is already represented as a 2D matrix where rows are time points and columns are voxels or components (e.g., after surface projection or ROI averaging).

Key Arguments:

# Example matrix (100 time points, 50 features/voxels)
# Two runs of 50 time points each
time_points <- 100
features <- 50
run_len <- c(50, 50)
example_matrix <- matrix(rnorm(time_points * features), time_points, features)

# Example event table for matrix data
events_mat_df <- data.frame(
  onset = c(seq(5, 45, by=10), seq(5, 45, by=10)), 
  condition = factor(rep(c("C", "D"), 10)),
  run = rep(1:2, each = 5)
)

mat_dset <- matrix_dataset(datamat = example_matrix, 
                           TR = 2.5, 
                           run_length = run_len,
                           event_table = events_mat_df)

print(mat_dset)
print(mat_dset$sampling_frame)

For matrix_dataset, the concept of a spatial mask is implicit; all columns provided in datamat are included.

Latent Data (latent_dataset)

This class is designed for data that has undergone dimensionality reduction (e.g., PCA, ICA). It wraps a LatentNeuroVec object, which stores the basis vectors (latent components over time) and loadings (spatial maps of components). Creating and using LatentNeuroVec objects typically requires the fmristore package.

Key Arguments:

# Conceptual example (requires fmristore package and a LatentNeuroVec)
# Assuming 'my_latent_neuro_vec' is a LatentNeuroVec object representing
# 20 components over 300 time points (2 runs of 150)

# latent_dset <- latent_dataset(lvec = my_latent_neuro_vec, 
#                              TR = 2.0, 
#                              run_length = c(150, 150),
#                              event_table = some_event_df)
# 
# print(latent_dset)

This dataset type essentially behaves like a matrix_dataset where the matrix columns are the latent component time series.

Using Dataset Objects

Once created, these dataset objects serve as the primary data input for fmrireg's modeling functions:

They provide a standardized way to access data (get_data(dset)), masks (get_mask(dset)), and timing information (blocklens(dset), blockids(dset)), regardless of the underlying storage format.

Choosing the appropriate dataset class depends on where your data resides (memory, files) and its format (volumetric, matrix, latent).



bbuchsbaum/fmrireg documentation built on June 10, 2025, 8:18 p.m.