knitr::opts_chunk$set(collapse = TRUE, comment = "#>", fig.width=6, fig.height=4) library(fmrireg) library(fmridataset) # Dataset functionality refactored from fmrireg library(neuroim2) # For creating example NeuroVec/NeuroVol
Effective fMRI analysis requires associating the measured brain activity (the imaging data) with crucial metadata, including:
The fmrireg
package uses several dataset objects to encapsulate this information, providing a consistent input format for modeling functions like event_model
, baseline_model
, fmri_lm
, and estimate_betas
.
This vignette describes the main dataset classes and how to create them.
sampling_frame
Before diving into datasets, recall the sampling_frame
object (introduced in the Overview and detailed in other vignettes). It defines the fundamental temporal structure shared by all dataset types:
blocklens
: A vector specifying the number of scans (time points) in each run.TR
: The repetition time (time between scans) in seconds.sframe_example <- sampling_frame(blocklens = c(150, 160), TR = 2.0) print(sframe_example)
Dataset objects internally create or utilize a sampling_frame
based on the provided run lengths and TR.
fmrireg
offers different dataset classes depending on how your data is stored:
fmri_mem_dataset
: For volumetric fMRI data already loaded into R memory (as NeuroVec
objects).fmri_file_dataset
: For volumetric fMRI data stored in image files (e.g., NIfTI) on disk.matrix_dataset
: For fMRI data represented as a standard R matrix (time points x voxels/components).latent_dataset
: For dimension-reduced data (e.g., PCA/ICA components), typically requiring the fmristore
package.All these inherit from a base fmri_dataset
class.
fmri_mem_dataset
)Use this when your fMRI runs are loaded as neuroim2::NeuroVec
objects in your R session.
Key Arguments:
scans
: A list of NeuroVec
objects, one for each run.mask
: A neuroim2::NeuroVol
or neuroim2::LogicalNeuroVol
object representing the brain mask.TR
: Repetition time (seconds).run_length
(Optional): Vector of run lengths; if omitted, inferred from the dimensions of the NeuroVec
objects in scans
.event_table
(Optional): A data.frame
containing experimental design information.# Create minimal example data (2 runs) d <- c(5, 5, 5, 20) # Small dimensions for example mask_vol <- neuroim2::LogicalNeuroVol(array(TRUE, d[1:3]), neuroim2::NeuroSpace(d[1:3])) scan1 <- neuroim2::NeuroVec(array(rnorm(prod(d)), d), neuroim2::NeuroSpace(d)) scan2 <- neuroim2::NeuroVec(array(rnorm(prod(d)), d), neuroim2::NeuroSpace(d)) # Example event table events_df <- data.frame( onset = c(5, 15, 5, 15), condition = factor(c("A", "B", "A", "B")), run = c(1, 1, 2, 2) ) # Create the dataset object mem_dset <- fmri_mem_dataset(scans = list(scan1, scan2), mask = mask_vol, TR = 2.0, # run_length automatically inferred as c(20, 20) event_table = events_df) print(mem_dset) # Access components print(mem_dset$sampling_frame)
fmri_file_dataset
)This is often the most practical option for typical fMRI analyses where data resides in files.
Key Arguments:
scans
: A character vector of file paths to the 4D fMRI image files (e.g., .nii.gz
), one path per run.mask
: A character string giving the file path to the 3D mask image file.TR
: Repetition time (seconds).run_length
: A numeric vector specifying the number of volumes (time points) in each run file listed in scans
.event_table
(Optional): A data.frame
with experimental design info.base_path
(Optional): A path to prepend to relative file paths in scans
and mask
.preload
(Optional, Default: FALSE
): If TRUE
, load the mask and scan data into memory immediately. If FALSE
(recommended for large data), data is read only when accessed.mode
(Optional): Storage mode for neuroim2
when reading data (e.g., "normal", "mmap").# --- Create Dummy Files (for illustration only) --- # In a real analysis, these files would already exist. tmp_dir <- tempdir() mask_filename <- "mask.nii.gz" scan1_filename <- "run1.nii.gz" scan2_filename <- "run2.nii.gz" mask_file_full_path <- file.path(tmp_dir, mask_filename) scan1_file_full_path <- file.path(tmp_dir, scan1_filename) scan2_file_full_path <- file.path(tmp_dir, scan2_filename) # Create small dummy mask and scans using neuroim2 functionality d <- c(5, 5, 5) # Mask dimensions d_run1 <- c(d, 20) # Run 1 dimensions (time=20) d_run2 <- c(d, 25) # Run 2 dimensions (time=25) mask_vol_dummy <- neuroim2::NeuroVol(array(1, d), neuroim2::NeuroSpace(d)) scan1_dummy <- neuroim2::NeuroVec(array(rnorm(prod(d_run1)), d_run1), neuroim2::NeuroSpace(d_run1)) scan2_dummy <- neuroim2::NeuroVec(array(rnorm(prod(d_run2)), d_run2), neuroim2::NeuroSpace(d_run2)) # Ensure dummy files are written using their full paths neuroim2::write_vol(mask_vol_dummy, mask_file_full_path) neuroim2::write_vec(scan1_dummy, scan1_file_full_path) neuroim2::write_vec(scan2_dummy, scan2_file_full_path) # --- End Dummy File Creation --- # Create the file-based dataset object # Pass only filenames to 'scans' and 'mask', and specify the directory in 'base_path' file_dset <- fmri_dataset(scans = c(scan1_filename, scan2_filename), mask = mask_filename, TR = 1.5, run_length = c(20, 25), # Must match time dim of files event_table = events_df, base_path = tmp_dir, # Set base_path to the temp directory preload = FALSE) # Keep data on disk # This print statement should now work print(file_dset) # Clean up dummy files (optional, commented out for vignette) # file.remove(mask_file_full_path, scan1_file_full_path, scan2_file_full_path)
Using preload=FALSE
is memory-efficient as only the required data segments are read when needed (e.g., during model fitting).
matrix_dataset
)Use this if your fMRI data is already represented as a 2D matrix where rows are time points and columns are voxels or components (e.g., after surface projection or ROI averaging).
Key Arguments:
datamat
: The numeric matrix (time x features).TR
: Repetition time (seconds).run_length
: Vector specifying the number of rows (time points) belonging to each run.event_table
(Optional): A data.frame
with design info (must have same total number of rows as datamat
).# Example matrix (100 time points, 50 features/voxels) # Two runs of 50 time points each time_points <- 100 features <- 50 run_len <- c(50, 50) example_matrix <- matrix(rnorm(time_points * features), time_points, features) # Example event table for matrix data events_mat_df <- data.frame( onset = c(seq(5, 45, by=10), seq(5, 45, by=10)), condition = factor(rep(c("C", "D"), 10)), run = rep(1:2, each = 5) ) mat_dset <- matrix_dataset(datamat = example_matrix, TR = 2.5, run_length = run_len, event_table = events_mat_df) print(mat_dset) print(mat_dset$sampling_frame)
For matrix_dataset
, the concept of a spatial mask is implicit; all columns provided in datamat
are included.
latent_dataset
)This class is designed for data that has undergone dimensionality reduction (e.g., PCA, ICA). It wraps a LatentNeuroVec
object, which stores the basis vectors (latent components over time) and loadings (spatial maps of components). Creating and using LatentNeuroVec
objects typically requires the fmristore
package.
Key Arguments:
lvec
: A LatentNeuroVec
object from the fmristore
package.TR
: Repetition time (seconds).run_length
: Vector specifying run lengths (must sum to the time dimension of lvec
).event_table
(Optional): Experimental design data.frame
.# Conceptual example (requires fmristore package and a LatentNeuroVec) # Assuming 'my_latent_neuro_vec' is a LatentNeuroVec object representing # 20 components over 300 time points (2 runs of 150) # latent_dset <- latent_dataset(lvec = my_latent_neuro_vec, # TR = 2.0, # run_length = c(150, 150), # event_table = some_event_df) # # print(latent_dset)
This dataset type essentially behaves like a matrix_dataset
where the matrix columns are the latent component time series.
Once created, these dataset objects serve as the primary data input for fmrireg
's modeling functions:
event_model(..., sampling_frame = dset$sampling_frame)
baseline_model(..., sframe = dset$sampling_frame)
fmri_lm(model, dataset = dset)
estimate_betas(..., dataset = dset)
They provide a standardized way to access data (get_data(dset)
), masks (get_mask(dset)
), and timing information (blocklens(dset)
, blockids(dset)
), regardless of the underlying storage format.
Choosing the appropriate dataset class depends on where your data resides (memory, files) and its format (volumetric, matrix, latent).
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.