
# ComBat harmonization in R

## Software status

| Resource:   | Travis CI    |
| ----------- | ------------ |
| Platform:   | Linux        |
| R CMD check | Build status |

## Table of contents

- 1. Installation
- 2. Multi-Site Harmonization
- 3. Visualization

## 1. Installation

neuroCombat can be installed in R by typing the following commands:

wzxhzdk:0

## 2. Multi-Site Harmonization

ComBat estimates scanner-specific location and scale parameters for each feature separately, and pools information across features using empirical Bayes to improve the estimation of those parameters in studies with small sample sizes.

### 2.1 Full ComBat with empirical Bayes

The `neuroCombat` function is the main function. It requires two mandatory arguments:

- a data matrix (p x n) `dat`, in which the p rows are features and the n columns are participants.
- a numeric or character vector `batch` of length n indicating the site/scanner/study id.

For illustration purposes, let's simulate an imaging dataset with n=10 participants, acquired on 2 scanners with 5 participants each, and p=10000 voxels per scan.

wzxhzdk:1

We use the function `neuroCombat` to harmonize the data across the 2 scanners:

wzxhzdk:2

By default, this uses parametric adjustments. The following command must be used for non-parametric adjustments:

wzxhzdk:3

The harmonized matrix is stored in `data.harmonized$dat.combat`. The `data.harmonized` object also contains the different parameters estimated by ComBat:

- `gamma.hat` and `delta.hat`: estimated location and scale (L/S) parameters before empirical Bayes.
- `gamma.star` and `delta.star`: empirical Bayes estimates of the L/S parameters.
- `gamma.bar`, `t2`, `a.prior` and `b.prior`: estimated parameters of the prior distributions.

`neuroCombat` also accepts an optional argument, `mod`, which is a matrix containing biological covariates, including the outcome of interest. This is recommended to ensure that biological variability is preserved in the harmonization process. For instance, for a study with age and disease covariates,

wzxhzdk:4

we first create a model matrix for these two biological covariates using the `model.matrix` function:

wzxhzdk:5

The matrix `mod` is an n x 3 matrix, containing an intercept, age, and a dummy variable for the second level of the disease variable (the first level is taken as the baseline group).

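The inputs described above can be sketched in base R as follows. This is only an illustration of the expected shapes: the covariate values, the factor labels `control`/`patient`, and the random seed are assumptions made for the example, not values taken from the package or its documentation.

```r
set.seed(42)
n <- 10       # participants
p <- 10000    # voxels (features)

# Site labels: 5 participants on each of 2 scanners
batch <- rep(c(1, 2), each = 5)

# Simulated p x n data matrix: rows are features, columns are participants
dat <- matrix(runif(p * n), nrow = p, ncol = n)

# Hypothetical biological covariates, one value per participant
age <- round(runif(n, min = 60, max = 85))
disease <- factor(rep(c("control", "patient"), times = 5))

# Design matrix: intercept, age, and a dummy for the second disease level
mod <- model.matrix(~ age + disease)

dim(mod)        # 10 3
colnames(mod)   # "(Intercept)" "age" "diseasepatient"
```

The first factor level (`control` here) is absorbed into the intercept, which is why only one disease column appears in `mod`.
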
Note that including an intercept in the model matrix will not change the results of the algorithm; ComBat automatically removes the intercept from the model matrix when fitting the models. We now harmonize the data:

wzxhzdk:6

### 2.2 ComBat without empirical Bayes

Sometimes, it is preferable not to pool information across features, for instance if:

- (1) The number of features is substantially smaller than the number of participants (p << n), or
- (2) The prior distributions used in ComBat do not fit the data well, or
- (3) The site effects are only present for a small subset of features.

An example of (2) is a study with site/scanner effects that are highly heterogeneous across features, for instance when scanner effects differ between white matter (WM) and grey matter (GM) voxels. To run the ComBat model without empirical Bayes, which boils down to fitting a location/scale (L/S) model for each feature separately, the option `eb=FALSE` can be used:

wzxhzdk:7

### 2.3 ComBat with mean site effects adjustment only (no variance adjustment)

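As a rough illustration of what a mean-only adjustment means, the base R sketch below shifts each site's per-feature mean onto the pooled mean and leaves the within-site variance untouched. It is not the neuroCombat implementation: it ignores covariates and uses no empirical Bayes, and the helper name `center_sites` and the simulated offset are hypothetical.

```r
# Hypothetical helper: remove per-site mean offsets, feature by feature.
# Not the neuroCombat implementation; no covariates, no empirical Bayes.
center_sites <- function(dat, batch) {
  batch <- as.factor(batch)
  grand_mean <- rowMeans(dat)   # pooled mean of each feature
  out <- dat
  for (b in levels(batch)) {
    idx <- which(batch == b)
    site_mean <- rowMeans(dat[, idx, drop = FALSE])
    # Shift this site's columns so its feature means match the pooled means
    out[, idx] <- dat[, idx, drop = FALSE] - site_mean + grand_mean
  }
  out
}

set.seed(1)
p <- 100
n <- 10
batch <- rep(c(1, 2), each = 5)
dat <- matrix(rnorm(p * n), nrow = p, ncol = n)
dat[, batch == 2] <- dat[, batch == 2] + 3   # simulated additive scanner offset

adj <- center_sites(dat, batch)

# Difference between the two sites' feature means after adjustment
site_gap <- rowMeans(adj[, batch == 1]) - rowMeans(adj[, batch == 2])
max(abs(site_gap))   # numerically zero: site means now agree
```

Because only a constant is subtracted within each site, the per-feature variance inside each site is identical before and after the adjustment.
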
## 3. Visualization

Coming soon.


Jfortin1/neuroCombat_Rpackage documentation built on Nov. 24, 2024, 9:23 a.m.