setup_glm_pipeline: Main worker function for setting up an analysis pipeline

View source: R/setup_glm_pipeline.R

setup_glm_pipeline {fmri.pipeline}    R Documentation

Main worker function for setting up an analysis pipeline

Description

Main worker function for setting up an analysis pipeline

Usage

setup_glm_pipeline(
  analysis_name = "glm_analysis",
  scheduler = "slurm",
  output_directory = file.path(getwd(), analysis_name),
  subject_data = NULL,
  run_data = NULL,
  trial_data = NULL,
  group_output_directory = "default",
  output_locations = "default",
  vm = c(id = "id", session = "session", run_number = "run_number", block_number =
    "block_number", trial = "trial", run_trial = "run_trial", subtrial = "subtrial",
    mr_dir = "mr_dir", run_nifti = "run_nifti", exclude_subject = "exclude_subject"),
  bad_ids = NULL,
  tr = NULL,
  fmri_file_regex = ".*\\.nii(\\.gz)?",
  fmri_path_regex = NULL,
  run_number_regex = ".*run-*([0-9]+).*",
  drop_volumes = 0L,
  l1_models = "prompt",
  l2_models = "prompt",
  l3_models = "prompt",
  glm_software = "fsl",
  n_expected_runs = 1L,
  use_preconvolve = TRUE,
  glm_settings = "default",
  confound_settings = list(motion_params_file = "motion.par", motion_params_colnames =
    c("rx", "ry", "rz", "tx", "ty", "tz"), confound_input_file = NULL,
    confound_input_colnames = NULL, l1_confound_regressors = NULL, exclude_run =
    "mean(FD) > 0.9 | max(FD) > 0.5", truncate_run = NULL, exclude_subject = NULL,
    spike_volumes = "FD > 0.9"),
  parallel = list(l1_setup_cores = 4L, pipeline_cores = "default"),
  additional = list(feat_l1_args = list(z_thresh = 1.96, prob_thresh = 0.05)),
  lgr_threshold = "info"
)

Arguments

analysis_name

A character string providing a useful name for identifying this analysis. Practically, this influences the top-level folder name of the group analysis outputs, as well as the name of .RData objects saved by this function to the output directory for the analysis.

scheduler

Which HPC scheduler system should be used for queueing jobs. Options are 'slurm', 'torque', or 'local'.

output_directory

The output directory for all configuration and job submission files for this analysis. Default is a subfolder named after analysis_name (e.g., "glm_analysis") in the current working directory.

subject_data

A data.frame containing all subject-level data such as age, sex, or other covariates. Columns from subject_data can be used as covariates in group (aka 'level 3') analyses. If NULL, then this will be distilled from trial_data by looking for variables that vary at the same level as vm["id"].
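
For illustration, a minimal subject_data frame might look like the sketch below. The column names id and mr_dir follow the vm defaults shown in Usage; the covariates and paths are hypothetical:

subject_data <- data.frame(
  id     = c("sub-001", "sub-002"),                 # subject identifier (vm["id"])
  age    = c(24, 31),                               # example subject-level (level 3) covariate
  sex    = c("F", "M"),                             # example subject-level (level 3) covariate
  mr_dir = c("/data/sub-001", "/data/sub-002")      # folder containing each subject's fMRI data (vm["mr_dir"])
)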

run_data

A data.frame containing all run-level data such as run condition or run number. Columns from run_data can be used as covariates in subject (aka 'level 2') analyses. If NULL, this will be distilled from trial_data by looking for variables that vary at the same level as vm["run_number"].

trial_data

A data.frame containing trial-level statistics for all subjects. Data should be stacked in long form such that each row represents a single trial for a given subject and the number of total rows is subjects x trials. If you wish, you can pass a single trial-level data frame that also contains all run-level and subject-level covariates (i.e., a combined long format, where variables at different levels are all included as columns). In this case, setup_glm_pipeline will detect which variables occur at each level and parse these accordingly into subject_data and run_data.
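
As a sketch of the combined long format, the trial_data frame below mixes a trial-level statistic with a run-level condition and a subject-level covariate; setup_glm_pipeline would separate these according to the level at which they vary. Column names other than id, run_number, and trial are hypothetical:

trial_data <- data.frame(
  id         = rep(c("sub-001", "sub-002"), each = 4),      # vm["id"]
  run_number = rep(rep(1:2, each = 2), times = 2),          # vm["run_number"]
  trial      = rep(1:2, times = 4),                         # vm["trial"]
  rt         = runif(8, 0.4, 1.2),                          # varies trial-to-trial -> stays in trial_data
  run_cond   = rep(rep(c("A", "B"), each = 2), times = 2),  # varies by run -> distilled into run_data
  age        = rep(c(24, 31), each = 4)                     # varies by subject -> distilled into subject_data
)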

vm

A named character vector containing key identifying columns in subject_data and trial_data. Minimally, this vector should contain the 'id' element; the remaining elements default to the column names shown in Usage.
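
As a sketch, if your data use different identifier column names, you could pass the full default vector with the relevant values changed (the names "subj_id" and "run_id" below are hypothetical):

vm <- c(id = "subj_id",               # your subject identifier column
        session = "session",
        run_number = "run_id",        # your run number column
        block_number = "block_number",
        trial = "trial",
        run_trial = "run_trial",
        subtrial = "subtrial",
        mr_dir = "mr_dir",
        run_nifti = "run_nifti",
        exclude_subject = "exclude_subject")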

bad_ids

An optional vector of ids in subject_data and trial_data that should be excluded from analysis.

tr

The repetition time of the scanning sequence in seconds. Used for setting up design matrices. If this is NULL, the function will look for a tr field in the run_data object, which specifies the TR at the run level (e.g., if the TR varies across runs).
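
If the TR differs across runs, a tr column can be included in run_data instead of passing a scalar tr; a minimal sketch (the TR values are hypothetical):

run_data <- data.frame(
  id         = c("sub-001", "sub-001"),
  run_number = c(1, 2),
  tr         = c(2.0, 1.2)   # repetition time in seconds for each run
)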

fmri_file_regex

A character string containing a Perl-compatible regular expression used to locate fMRI files by matching the subfolder and filename within each subject's mr_dir folder specified in subject_data.

drop_volumes

The number of volumes to drop from the fMRI data and convolved regressors prior to analysis. Default is 0.

l1_models

An l1_model_set object containing all level 1 (run) models to be included in the GLM pipeline. If "prompt" is passed, setup_glm_pipeline will call build_l1_models to set up l1 models interactively. Optionally, this argument can be NULL if you want to set up l1 models later, though the resulting object will not be functional within the pipeline until l1 models are provided.

l2_models

An hi_model_set object containing all level 2 (subject) models to be included in the GLM pipeline. If "prompt" is passed, setup_glm_pipeline will call build_l2_models to set up l2 models interactively. Optionally, this argument can be NULL if you want to set up l2 models later, though the resulting object will not be functional within the pipeline until l2 models are provided.

l3_models

An hi_model_set object containing all level 3 (sample) models to be included in the GLM pipeline. If "prompt" is passed, setup_glm_pipeline will call build_l3_models to set up l3 models interactively. Optionally, this argument can be NULL if you want to set up l3 models later, though the resulting object will not be functional within the pipeline until l3 models are provided.

glm_software

Which fMRI analysis package to use for model estimation. Options are "FSL", "SPM", or "AFNI" (case insensitive).

n_expected_runs

Number of expected runs per subject. Used to determine 2- versus 3-level analysis (for FSL), and for providing feedback about subjects who have an unexpected number of runs.

use_preconvolve

A logical indicating whether to pass preconvolved regressors to the GLM estimation software (e.g., FSL FILM/FEAT). If TRUE, all regressors will be generated by build_design_matrix. If FALSE, onset-duration-value timing files will be passed and convolution will be handled internally by the GLM software. I recommend TRUE for consistency.

additional

A list of additional metadata that will be added to the glm.pipeline object returned by the function. This can be useful if there are other identifiers that you want to retain for long-term storage or downstream functions.

lgr_threshold

The logging threshold used to determine whether messages of different severity are output to the screen and to log files. Default is "info", which outputs all informational messages, warnings, and errors, but not debug or trace statements. To output only errors, change this to "error". See https://s-fleck.github.io/lgr/articles/lgr.html for details.

block_data

An optional data.frame containing information about design features in the task that vary at block level (typically, longer periods of time such as 10-30s). Blocks are superordinate to trials and subordinate to runs.

mr_dir_column

A character string indicating the column name in subject_data containing the folder for each subject's data. Default is "mr_dir".
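
Examples

The call below is a minimal sketch, assuming the subject_data, run_data, and trial_data frames sketched above and a local (non-cluster) run; argument values other than the defaults shown in Usage are illustrative.

gpa <- setup_glm_pipeline(
  analysis_name = "glm_analysis",
  scheduler = "local",          # run jobs locally rather than via SLURM/TORQUE
  subject_data = subject_data,
  run_data = run_data,
  trial_data = trial_data,
  tr = 2.0,                     # repetition time in seconds (illustrative)
  n_expected_runs = 2L,         # runs per subject in this hypothetical design
  glm_software = "fsl",
  l1_models = NULL,             # set up level 1/2/3 models later (see l1_models argument)
  l2_models = NULL,
  l3_models = NULL
)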


UNCDEPENdLab/fmri.pipeline documentation built on April 3, 2025, 3:21 p.m.