data-raw/LM_Refactor.md

This is an excellent set of guidelines for code hygiene, modularity, and maintainability. It's exactly what's needed to ensure fmrireg remains robust and developer-friendly as it grows. The "TL;DR for Developers" is a perfect summary.

Let's integrate these principles into a revised, comprehensive proposal and ticketed sprint. The existing "Phase 1, 2, 3" structure will be maintained, but the new ARCH tickets will be prioritized as foundational.

Project: Integrated Robust & AR(p) Modeling with Architectural Refinement (Version 4.0)

Goal: Deliver a robust, efficient, user-friendly, and maintainable implementation of fMRI linear modeling in fmrireg. This version focuses on integrating Iteratively Reweighted Least Squares (IRLS) with Autoregressive (AR(p)) modeling, underpinned by a modular and clean codebase.

Core Design & Architectural Principles:

  1. Primary Fitting Sequence ("Whiten then Robustly Weight"):
    • Optional: Regress out extra_nuisance regressors.
    • Initial OLS/GLS to estimate AR parameters (phi_hat).
    • AR Pre-whitening of data (Y) and design (X).
    • IRLS on the whitened data (Y_w, X_w).
    • Optional: Re-estimate phi_hat and perform a final weighted GLS.
  2. Modularity (Slice by Responsibility):
    • R/fmri_lm_config.R: Configuration object (fmri_lm_config) creation and validation.
    • R/fmri_lm_context.R: GLM context object (glm_context) definition.
    • R/fmri_lm_solver.R: Core GLM solver (solve_glm_core) for OLS/WLS.
    • R/fmri_ar_modeling.R: AR parameter estimation (estimate_ar_parameters) and data whitening (ar_whiten_transform).
    • R/fmri_robust_fitting.R: IRLS engine (robust_iterative_fitter) using solve_glm_core and ar_whiten_transform.
    • R/fmri_lm_orchestrators.R: runwise_fitter and chunkwise_fitter orchestrating the steps.
    • R/fmrilm.R: Top-level fmri_lm and fmri_lm_fit functions.
  3. Minimized Surface Area: Use fmri_lm_config for options and glm_context for data transfer between modules.
  4. Single Source of Truth for Math: Centralize core matrix operations in solve_glm_core.
  5. CI Guardrails: Implement lintr, styler, and code size checks.
  6. Progressive Disclosure in Docs: Clear separation of user API and internal engine documentation.
  7. Encapsulated Configuration: fmri_lm_config object.
  8. Semantic Tests: Small, focused tests per module and integration tests.
  9. Centralized Error Handling: Utility functions for common validation/error messages.
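
The "Whiten then Robustly Weight" sequence from principle 1 can be sketched in base R as follows. This is a minimal illustration under simplifying assumptions (a single voxel, a single run, AR(1) only, Huber weights, no nuisance regression, no final re-estimation of phi_hat). Every function with a `_sketch` suffix is a hypothetical stand-in for illustration and is not the package's actual `solve_glm_core`, `ar_whiten_transform`, or `robust_iterative_fitter`.

```r
# Sketch only: AR(1), one run, one voxel. All *_sketch names are hypothetical.
set.seed(1)
n <- 200
X <- cbind(1, sin(seq_len(n) / 10))
beta_true <- c(2, 1.5)

# Simulate AR(1) noise (phi = 0.5) and add three gross outliers.
e <- as.numeric(stats::filter(rnorm(n), filter = 0.5, method = "recursive"))
y <- as.numeric(X %*% beta_true) + e
outliers <- c(20, 90, 150)
y[outliers] <- y[outliers] + 15

# Single OLS/WLS solver shared by every later step.
solve_wls_sketch <- function(X, y, w = rep(1, nrow(X))) {
  sw <- sqrt(w)
  fit <- stats::lm.fit(X * sw, y * sw)
  beta <- unname(fit$coefficients)
  list(beta = beta, resid = y - as.numeric(X %*% beta))
}

# AR(1) whitening with exact scaling of the first observation.
ar1_whiten_sketch <- function(M, phi) {
  M <- as.matrix(M)
  n <- nrow(M)
  out <- M
  out[1, ] <- sqrt(1 - phi^2) * M[1, ]
  out[-1, ] <- M[-1, , drop = FALSE] - phi * M[-n, , drop = FALSE]
  out
}

# IRLS with Huber weights and a MAD scale estimate.
irls_huber_sketch <- function(X, y, k = 1.345, max_iter = 50, tol = 1e-8) {
  fit <- solve_wls_sketch(X, y)
  for (i in seq_len(max_iter)) {
    u <- abs(fit$resid) / stats::mad(fit$resid)
    w <- ifelse(u <= k, 1, k / u)
    new_fit <- solve_wls_sketch(X, y, w)
    converged <- max(abs(new_fit$beta - fit$beta)) < tol
    fit <- new_fit
    if (converged) break
  }
  c(fit, list(weights = w))
}

# 1. Initial OLS fit, used only to estimate phi_hat from its residuals.
ols <- solve_wls_sketch(X, y)
r <- ols$resid
phi_hat <- sum(r[-1] * r[-n]) / sum(r^2)

# 2. Pre-whiten Y and X with the same transform.
Xw <- ar1_whiten_sketch(X, phi_hat)
yw <- as.numeric(ar1_whiten_sketch(y, phi_hat))

# 3. IRLS on the whitened data.
robust <- irls_huber_sketch(Xw, yw)
round(robust$weights[outliers], 2)  # outlier rows are strongly downweighted
```

Note that the initial phi_hat here is computed from OLS residuals that still contain the outliers, which tends to pull the estimate toward zero. That is one plausible motivation for the optional re-estimation step at the end of the sequence.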

API Changes (fmri_lm):

Immediate "Must Fix" Items (Pre-Sprint):

Ticketed Sprint: Integrated Robust & AR(p) Modeling with Architectural Refinement

Phase 0: Architectural Foundation (Blockers for subsequent work)

Phase 1: Fast Path OLS/GLS and Robust-Only (No Combined AR+Robust Yet)

Phase 2: Combined AR + Robust Fast Paths & Advanced AR

Phase 3: Final Touches, Documentation & Testing

This revised plan heavily emphasizes the architectural changes first (Phase 0), then builds the fast paths incrementally (Robust-only, then AR-only, then combined AR+Robust). The chunkwise AR+Robust path remains the most intricate. The API is simplified by grouping options into fmri_lm_control. The "must-fix" items are critical prerequisites.
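
As an illustration of grouping options into a single validated configuration object, the sketch below builds a small config list from two option groups. The constructor name `lm_config_sketch`, the option names, and the defaults are all assumptions made for this example; they are not the actual signature of fmri_lm_control or the real fields of fmri_lm_config.

```r
# Hypothetical constructor: option names and defaults are illustrative only.
lm_config_sketch <- function(robust = list(), ar = list()) {
  robust_defaults <- list(type = "none", k_huber = 1.345, max_iter = 2L)
  ar_defaults <- list(struct = "iid", p = 0L, voxelwise = FALSE)

  # Reject misspelled option names early instead of silently ignoring them.
  check_names <- function(given, defaults, label) {
    unknown <- setdiff(names(given), names(defaults))
    if (length(unknown) > 0) {
      stop("Unknown ", label, " option(s): ",
           paste(unknown, collapse = ", "), call. = FALSE)
    }
  }
  check_names(robust, robust_defaults, "robust")
  check_names(ar, ar_defaults, "ar")

  # Fill in defaults for anything the caller did not set.
  robust <- utils::modifyList(robust_defaults, robust)
  ar <- utils::modifyList(ar_defaults, ar)

  # Validate values in one place so downstream modules can trust the object.
  robust$type <- match.arg(robust$type, c("none", "huber", "bisquare"))
  ar$struct <- match.arg(ar$struct, c("iid", "ar1", "arp"))
  stopifnot(is.numeric(robust$k_huber), robust$k_huber > 0,
            is.numeric(ar$p), ar$p >= 0)

  structure(list(robust = robust, ar = ar), class = "lm_config_sketch")
}

cfg <- lm_config_sketch(robust = list(type = "huber"),
                        ar = list(struct = "ar1", p = 1L))
str(unclass(cfg))
```

With this shape, internal modules take one `cfg` argument rather than a long list of loosely related parameters, and all validation errors surface at construction time.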

Phase 4: Voxelwise AR Contrast Support

Problem Statement: The current voxelwise AR implementation in the slow path (SPRINT3-05R) does not compute contrasts. It returns an empty contrast list with a comment "would need proper handling of contrasts". This is a critical gap that prevents users from performing hypothesis testing when using voxelwise AR modeling.

Technical Challenges:

  1. Each voxel has different AR parameters, leading to different whitening transformations
  2. The (X'X)^-1 matrix differs for each voxel after whitening
  3. Standard errors must account for voxel-specific whitening
  4. Memory efficiency is crucial when storing per-voxel covariance matrices
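
To make these challenges concrete, the following sketch computes one contrast for two simulated voxels whose AR(1) coefficients differ. It assumes AR(1) noise, treats each voxel's phi as already known, and uses hypothetical helper names (`ar1_whiten_sketch`, `voxel_contrast_sketch`); it is not the package's slow-path implementation.

```r
# Sketch only: AR(1) per voxel; function names are hypothetical.
ar1_whiten_sketch <- function(M, phi) {
  M <- as.matrix(M)
  n <- nrow(M)
  out <- M
  out[1, ] <- sqrt(1 - phi^2) * M[1, ]
  out[-1, ] <- M[-1, , drop = FALSE] - phi * M[-n, , drop = FALSE]
  out
}

voxel_contrast_sketch <- function(y, X, phi, cvec) {
  Xw <- ar1_whiten_sketch(X, phi)
  yw <- ar1_whiten_sketch(y, phi)
  XtXinv <- solve(crossprod(Xw))            # voxel-specific (X_w'X_w)^-1
  beta <- XtXinv %*% crossprod(Xw, yw)
  resid <- yw - Xw %*% beta
  sigma2 <- sum(resid^2) / (nrow(Xw) - ncol(Xw))
  est <- sum(cvec * beta)
  se <- sqrt(sigma2 * drop(crossprod(cvec, XtXinv %*% cvec)))
  c(estimate = est, se = se, t = est / se)
}

set.seed(42)
n <- 100
X <- cbind(intercept = 1, task = rep(c(0, 1), each = 10, length.out = n))
phis <- c(0.1, 0.6)                          # one AR(1) coefficient per voxel
Y <- sapply(phis, function(phi) {
  e <- as.numeric(stats::filter(rnorm(n), filter = phi, method = "recursive"))
  as.numeric(X %*% c(1, 2)) + e
})
cvec <- c(0, 1)                              # contrast on the task regressor

res <- vapply(seq_along(phis),
              function(v) voxel_contrast_sketch(Y[, v], X, phis[v], cvec),
              c(estimate = 0, se = 0, t = 0))
res

# The same design yields a different (X_w'X_w)^-1 for each voxel.
xtxinv_by_voxel <- lapply(phis, function(phi) {
  solve(crossprod(ar1_whiten_sketch(X, phi)))
})
```

On the memory side, storing a full p x p matrix per voxel in double precision costs 8 * p^2 * V bytes. With the purely illustrative values p = 20 and V = 100,000, that is 8 * 400 * 100,000 bytes, or about 320 MB.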

Proposed Solution:

Alternative Approach (if memory is critical):

Instead of storing XtXinv for each voxel, we could:

  1. Compute a "reference" XtXinv using average AR parameters
  2. Store only the deviation of each voxel's XtXinv from reference
  3. Use perturbation theory to approximate voxel-specific standard errors

This would trade some accuracy for substantial memory savings.
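
One possible reading of this idea, for the AR(1) case only, is a first-order expansion of XtXinv around the reference coefficient. Writing M(phi) = X_w'X_w, the expansion is M(phi)^-1 ≈ M(phi_ref)^-1 - (phi - phi_ref) * M(phi_ref)^-1 M'(phi_ref) M(phi_ref)^-1, so only two p x p matrices are kept globally and each voxel contributes just its scalar phi. The sketch below is an assumption about how the approach could look, with hypothetical function names, and is not a design the source commits to.

```r
# Sketch only: first-order perturbation of (X_w'X_w)^-1 in phi, AR(1) case.
ar1_whiten_sketch <- function(M, phi) {
  M <- as.matrix(M)
  n <- nrow(M)
  out <- M
  out[1, ] <- sqrt(1 - phi^2) * M[1, ]
  out[-1, ] <- M[-1, , drop = FALSE] - phi * M[-n, , drop = FALSE]
  out
}
xtx_whitened_sketch <- function(X, phi) crossprod(ar1_whiten_sketch(X, phi))

n <- 120
X <- cbind(1, rep(c(0, 1), each = 10, length.out = n), seq_len(n) / n)
phi_vox <- c(0.28, 0.30, 0.35, 0.45)         # illustrative per-voxel values
phi_ref <- mean(phi_vox)

# Reference inverse and its derivative with respect to phi.
M_ref_inv <- solve(xtx_whitened_sketch(X, phi_ref))
h <- 1e-5
dM <- (xtx_whitened_sketch(X, phi_ref + h) -
       xtx_whitened_sketch(X, phi_ref - h)) / (2 * h)
D <- -M_ref_inv %*% dM %*% M_ref_inv         # d/dphi of the inverse at phi_ref

approx_xtxinv_sketch <- function(phi) M_ref_inv + (phi - phi_ref) * D

# Relative error of the approximated contrast variance factor c'(X_w'X_w)^-1 c.
cvec <- c(0, 1, 0)
rel_err <- vapply(phi_vox, function(phi) {
  exact <- solve(xtx_whitened_sketch(X, phi))
  v_exact <- drop(crossprod(cvec, exact %*% cvec))
  v_approx <- drop(crossprod(cvec, approx_xtxinv_sketch(phi) %*% cvec))
  abs(v_approx - v_exact) / v_exact
}, numeric(1))
rel_err
```

Because the expansion is first order, the error grows roughly with the square of the distance between a voxel's phi and phi_ref. Voxels far from the reference are therefore where the accuracy trade-off is paid.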

Phase 5: Code Modularization and Cleanup

Problem Statement: The fmrilm.R file has grown to over 2000 lines with mixed responsibilities, duplicated code, and strategy implementations that are nearly 1000 lines each. This violates the modularity principles and makes the code hard to maintain.

Expected Outcome:

  • fmrilm.R reduced from 2000+ lines to ~400 lines (just core API)
  • Clear module structure:
    • fmri_model_utils.R (~200 lines)
    • fmri_lm_methods.R (~300 lines)
    • fmri_lm_strategies.R (~400 lines)
    • fmri_lm_runwise.R (~300 lines)
    • fmri_lm_chunkwise.R (~400 lines)
    • fmri_lm_internal.R (~150 lines)
  • Easier to maintain, test, and extend
  • Follows single-responsibility principle



bbuchsbaum/fmrireg documentation built on June 10, 2025, 8:18 p.m.