longCombat: Harmonize Multi-batch Longitudinal Data

Description

The longCombat function implements longitudinal ComBat harmonization for multi-batch longitudinal data. Longitudinal ComBat uses an empirical Bayes method to harmonize the means and variances of the residuals across batches within a linear mixed effects model framework. Detailed methods are described in the manuscript at https://www.biorxiv.org/content/10.1101/868810v4. This function is a modification of the ComBat function code from the sva package (https://bioconductor.org/packages/release/bioc/html/sva.html) and of combat.R (https://github.com/Jfortin1/ComBatHarmonization). Data should be in "long" format, with one row per subject/timepoint. Requires the lme4 package.
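Because the function expects "long" data (one row per subject/timepoint), a dataset stored in "wide" format (one row per subject, one column per visit) must be reshaped first. The sketch below uses base R's reshape(); the data frame and its column names (subid, batch, vol.1, vol.2) are hypothetical. If the batch also changes across visits, its per-visit columns would need to be listed under varying as well.

```r
# Hypothetical wide-format data: one row per subject, one column per visit
wide <- data.frame(
  subid = c("s1", "s2", "s3"),
  batch = factor(c("A", "B", "A")),
  vol.1 = c(10.2, 11.5, 9.8),   # feature measured at visit 1
  vol.2 = c(10.0, 11.9, 9.6)    # feature measured at visit 2
)

# Convert to long format: one row per subject/visit
long <- reshape(
  wide,
  direction = "long",
  varying   = c("vol.1", "vol.2"),  # columns holding the repeated measure
  sep       = ".",                  # split "vol.1" into name "vol" and time 1
  idvar     = "subid",
  timevar   = "time"
)

# Sort by subject, then visit, and drop the row names reshape() creates
long <- long[order(long$subid, long$time), ]
rownames(long) <- NULL
print(long)
```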

Usage

longCombat(
  idvar,
  timevar,
  batchvar,
  features,
  formula,
  ranef,
  data,
  niter = 30,
  method = "REML",
  verbose = TRUE
)
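A minimal sketch of a call on a small simulated dataset. The data, column names (subid, time, age, batch, feat1, feat2), and effect sizes are hypothetical; the call uses only the arguments listed under Usage and runs only if the package is installed.

```r
# Simulate a small hypothetical long-format dataset: 20 subjects x 3 visits
set.seed(1)
n_sub <- 20
n_vis <- 3
dat <- data.frame(
  subid = rep(sprintf("s%02d", seq_len(n_sub)), each = n_vis),
  time  = rep(seq(0, n_vis - 1), times = n_sub),
  age   = rep(round(runif(n_sub, 60, 80)), each = n_vis),
  batch = factor(rep(rep(c("scanner1", "scanner2"), length.out = n_sub),
                     each = n_vis))
)

# Subject-level random intercepts plus a batch-dependent shift and scale
subj  <- rep(rnorm(n_sub), each = n_vis)
shift <- ifelse(dat$batch == "scanner2", 1.5, 0)
scl   <- ifelse(dat$batch == "scanner2", 2, 1)
dat$feat1 <- 5 + 0.1 * dat$time + subj + shift + rnorm(nrow(dat), sd = scl)
dat$feat2 <- 3 - 0.2 * dat$time + subj + shift + rnorm(nrow(dat), sd = scl)

# Harmonize only if the package is available, e.g. after
# remotes::install_github("jcbeer/longCombat")
if (requireNamespace("longCombat", quietly = TRUE)) {
  res <- longCombat::longCombat(
    idvar    = "subid",
    timevar  = "time",
    batchvar = "batch",
    features = c("feat1", "feat2"),
    formula  = "age + time",
    ranef    = "(1|subid)",
    data     = dat
  )
  print(head(res$data_combat))
}
```

With formula = "age + time" and ranef = "(1|subid)", each feature is modeled with fixed effects for age and time and a random intercept per subject; the batch is supplied separately through batchvar rather than in formula.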

Arguments

idvar

character string giving the name of the ID variable. The ID variable can be a factor, numeric, or character.

timevar

character string giving the name of the numeric variable that distinguishes within-subject repeated measures, e.g., time, age, or visit.

batchvar

character string giving the name of the batch variable. The batch variable should be a factor.

features

character vector giving the names of the numeric feature variables, or a numeric vector of the corresponding column indices.

formula

character string giving all fixed effects on the right-hand side of the formula for the linear mixed effects model, in the notation used by lme4. It should include covariates, time, and any interactions. For example, "age + sex + diagnosis*time" fits a model with fixed effects for age, sex, diagnosis, time, and the diagnosis:time interaction. The formula should NOT include batchvar and should NOT include random effects.

ranef

character string giving the formula for the random effects in the notation used by lme4. For example, "(1|subid)" fits a random intercept for each unique value of the ID variable subid, and "(1 + time|subid)" fits a random intercept and a random slope on time for each unique subid.

data

data frame containing the variables above. Rows are observations (one per subject/timepoint); columns are variables.

niter

number of iterations for the empirical Bayes step. The estimates usually converge in fewer than 30 iterations. Default is 30.

method

method for estimating sigma in the standardization step (character string): 'REML' (default; more conservative Type I error control) or 'MSR' (more powerful; less conservative Type I error control).

verbose

logical (TRUE or FALSE); if TRUE, prints messages. Default is TRUE.
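The formula and ranef strings use standard R/lme4 model syntax. As a sketch, base R's terms() shows how the fixed-effects example "age + sex + diagnosis*time" expands, and a hypothetical helper, check_fixed_formula(), tests the two constraints stated for formula (no batch variable, no random-effect bars). The pasted full formula at the end is only an illustration; it is not taken from the package's internals.

```r
# Fixed-effects and random-effects strings, as in the argument examples
fixed <- "age + sex + diagnosis*time"
ranef <- "(1 + time|subid)"

# Expand the fixed part: a*b means a + b + a:b in R model syntax
tt <- terms(as.formula(paste("~", fixed)))
labels <- attr(tt, "term.labels")
print(labels)
# "age" "sex" "diagnosis" "time" "diagnosis:time"

# Hypothetical helper: the fixed-effects string must not mention the
# batch variable and must not contain random-effect terms ("|")
check_fixed_formula <- function(fixed, batchvar) {
  vars <- all.vars(as.formula(paste("~", fixed)))
  !(batchvar %in% vars) && !grepl("|", fixed, fixed = TRUE)
}
print(check_fixed_formula(fixed, "batch"))                 # TRUE
print(check_fixed_formula("age + batch + time", "batch"))  # FALSE

# Illustration only: one feature's combined lme4-style formula
full <- as.formula(paste("feat1 ~", fixed, "+", ranef))
print(all.vars(full))
```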

Value

The function returns a list including the following components:

data_combat

data frame with columns idvar, timevar, and the ComBat-harmonized values of each feature

gammahat

data frame containing the mean of the standardized data for each batch (row) and feature (column)

delta2hat

data frame containing the variance of the standardized data for each batch (row) and feature (column)

gammastarhat

data frame containing the empirical Bayes estimates of the additive batch effects

delta2starhat

data frame containing the empirical Bayes estimates of the multiplicative batch effects
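To illustrate how these components relate, here is a minimal sketch of the standard ComBat empirical Bayes step from the sva code this function is based on. It is not the package's own implementation. It assumes the data are already standardized (rows = observations, columns = features); the per-batch means and variances play the roles of gammahat and delta2hat, and their shrunken versions play the roles of gammastarhat and delta2starhat. The names eb_shrink() and eb_adjust() are hypothetical, the loop runs a fixed number of iterations to mirror niter, and any back-transformation to the original scale is omitted.

```r
# Hypothetical helper: per-batch location/scale estimates and their
# empirical Bayes (shrunken) versions, following the sva ComBat recipe
eb_shrink <- function(z, batch, niter = 30) {
  batch <- as.factor(batch)
  lv <- levels(batch)
  empty <- matrix(NA_real_, length(lv), ncol(z),
                  dimnames = list(lv, colnames(z)))
  gammahat <- delta2hat <- gammastar <- delta2star <- empty
  for (lev in lv) {
    zb <- z[batch == lev, , drop = FALSE]
    n <- nrow(zb)
    g_hat <- colMeans(zb)        # per-feature mean in this batch
    d_hat <- apply(zb, 2, var)   # per-feature variance in this batch
    # Method-of-moments hyperparameters pooled across features
    g_bar <- mean(g_hat)
    t2 <- var(g_hat)
    m <- mean(d_hat)
    s2 <- var(d_hat)
    a_prior <- (2 * s2 + m^2) / s2
    b_prior <- (m * s2 + m^3) / s2
    # Alternate the conditional posterior-mean updates niter times
    g_new <- g_hat
    d_new <- d_hat
    for (it in seq_len(niter)) {
      g_new <- (t2 * n * g_hat + d_new * g_bar) / (t2 * n + d_new)
      ss <- colSums(sweep(zb, 2, g_new)^2)
      d_new <- (0.5 * ss + b_prior) / (n / 2 + a_prior - 1)
    }
    gammahat[lev, ] <- g_hat
    delta2hat[lev, ] <- d_hat
    gammastar[lev, ] <- g_new
    delta2star[lev, ] <- d_new
  }
  list(gammahat = gammahat, delta2hat = delta2hat,
       gammastarhat = gammastar, delta2starhat = delta2star)
}

# Remove the shrunken additive effect and divide by the shrunken scale
eb_adjust <- function(z, batch, eb) {
  batch <- as.factor(batch)
  out <- z
  for (lev in levels(batch)) {
    rows <- batch == lev
    centered <- sweep(z[rows, , drop = FALSE], 2, eb$gammastarhat[lev, ])
    out[rows, ] <- sweep(centered, 2, sqrt(eb$delta2starhat[lev, ]), "/")
  }
  out
}

# Usage on simulated standardized data where batch B is shifted and scaled
set.seed(42)
batch <- factor(rep(c("A", "B"), each = 50))
z <- matrix(rnorm(100 * 5), nrow = 100,
            dimnames = list(NULL, paste0("feat", 1:5)))
z[batch == "B", ] <- 1 + 2 * z[batch == "B", ]
eb <- eb_shrink(z, batch)
z_adj <- eb_adjust(z, batch, eb)
print(round(eb$gammastarhat, 2))
```

Each shrunken additive effect is a weighted average of the batch's raw estimate and the across-feature mean, so features with noisy raw estimates borrow strength from the others.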


jcbeer/longCombat documentation built on June 26, 2022, 6:47 p.m.