knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
The need for larger samples in human subjects research have led to a trend of aggregating data across multiple locations (sites). This trend is especially prevalent in neuroimaging research. However, while the larger samples promoted greater power to detect significant associations as well as better generalizability of results, multiple-site study designs also introduce heterogeneity in acquisition and processing, which might consequently impact study findings.
ComBat is used as the major harmonization technique in neuroimaging and the ComBat Family further extends the original ComBat methodology to enable flexible covariate modeling, leveraging efficient R implementations of regression models. However, it remains a challenge to evaluate potential batch effects as well as the performance of harmonization. ComBatFamQC
provides a useful visualization tool through Rshiny for interactive batch effect diagnostics before and after harmonization. To streamline the harmonization process and improve efficiency, transparency, and reproducibility, ComBatFamQC
also provides default harmonization methods by integrating the ComBatFamily
package.
The ComBatFamQC
visualization includes three key functions:
visual_prep
: provides all relevant statistical test results for batch effect visualization and evaluation.combat_harm
: provides default harmonization methods from the ComBatFamily package.comfam_shiny
: generate interactive visualization through Rshiny.The ComBatFamQC
includes the following harmonizationmethods:
Install the ComBatFamQC
package and read in the data set to be harmonized. ADNI data is used in the vignette for illustration. To be noticed, the data set should include at least the following columns:
library(ComBatFamQC) library(dplyr) data(adni)
Three types of regression models can be considered for batch effect evaluation:
lm
: Linear regression model, which assumes that the relationship between the variables is linear.gam
: Generalized additive model, which models the relationship between the dependent variable and certain independent variable as a smooth, non-linear function, typically using splines.lmer
: Linear mixed-effects model, which extends the linear regression model to account for both fixed effects and random effects. It is commonly used in longitudinal datasets.(Note: For Windows users, make sure to set cores = 1
in visual_prep
function. The MDMR test can be time-consuming, especially in large datasets. Users have the option to disable the MDMR test by setting mdmr = FALSE
.)
features <- colnames(adni)[c(43:104)] covariates <- c("timedays", "AGE", "SEX", "DIAGNOSIS") interaction <- c("timedays,DIAGNOSIS") batch <- "manufac" result_orig <- visual_prep(type = "lm", features = features, batch = batch, covariates = covariates, interaction = interaction, smooth = NULL, random = NULL, df = adni) comfam_shiny(result_orig)
result_gam <- visual_prep(type = "gam", features = features, batch = batch, covariates = covariates, interaction = interaction, smooth_int_type = "linear", smooth = "AGE", df = adni) comfam_shiny(result_gam)
result_lmer <- visual_prep(type = "lmer", features = features, batch = batch, covariates = covariates, interaction = interaction, smooth = NULL, random = "subid", df = adni) comfam_shiny(result_lmer)
There are two export options: 1) generate a Quarto report (requires Quarto to be installed), and 2) generate a combined Excel file.
#library(quarto) temp_dir <- tempfile() dir.create(temp_dir) diag_save(path = temp_dir, result = result_lmer, use_quarto = TRUE)
diag_save(path = temp_dir, result = result_lmer, use_quarto = FALSE)
There are two types of harmonization scenarios users can choose from:
Specify parameters carefully based on the harmonization method to be applied.
features <- colnames(adni)[c(43:104)] covariates <- c("timedays", "AGE", "SEX", "DIAGNOSIS") interaction <- c("timedays,DIAGNOSIS") batch <- "manufac" ## Harmonize using evaluation results as the inputs combat_model <- combat_harm(result = result_orig, type = "lm", interaction = interaction, smooth = NULL, random = NULL, df = adni) ## Harmonize through specifying features, batch, covariates and df arguments combat_model_copy <- combat_harm(type = "lm", features = features, batch = batch, covariates = covariates, interaction = interaction, smooth = NULL, random = NULL, df = adni) ## Expect to get the same harmonization results identical(combat_model$harmonized_df, combat_model_copy$harmonized_df) # save harmonized data write.csv(combat_model$harmonized_df, file.path(temp_dir, "harmonized.csv")) # save combat model saveRDS(combat_model$combat.object, file.path(temp_dir, "combat_model.rds")) # Clean up the temporary file unlink(temp_dir, recursive = TRUE)
## Harmonize using evaluation results as the inputs combat_model_lmer <- combat_harm(result = result_lmer, type = "lmer", interaction = interaction, smooth = NULL, random = "subid", df = adni) ## Harmonize through specifying features, batch, covariates and df arguments combat_model_lmer_copy <- combat_harm(type = "lmer", features = features, batch = batch, covariates = covariates, interaction = interaction, smooth = NULL, random = "subid", df = adni) ## Expect to get the same harmonization results identical(combat_model_lmer$harmonized_df, combat_model_lmer_copy$harmonized_df)
## Harmonize using evaluation results as the inputs combat_model_gam <- combat_harm(result = result_gam, type = "gam", interaction = interaction, smooth = "AGE", smooth_int_type = "linear", df = adni) ## Harmonize through specifying features, batch, covariates and df arguments combat_model_gam_copy <- combat_harm(type = "gam", features = features, batch = batch, covariates = covariates, interaction = interaction, smooth = "AGE", smooth_int_type = "linear", df = adni) ## Expect to get the same harmonization results identical(combat_model_gam$harmonized_df, combat_model_gam_copy$harmonized_df)
## Harmonize using evaluation results as the inputs covbat_model <- combat_harm(result = result_gam, type = "gam", interaction = interaction, smooth = "AGE", smooth_int_type = "linear", df = adni, family = "covfam") ## Harmonize through specifying features, batch, covariates and df arguments covbat_model_copy <- combat_harm(type = "gam", features = features, batch = batch, covariates = covariates, interaction = interaction, smooth_int_type = "linear", smooth = "AGE", df = adni, family = "covfam") ## Expect to get the same harmonization results identical(covbat_model$harmonized_df, covbat_model_copy$harmonized_df)
Specify predict
parameter to be TRUE and object
parameter to be saved ComBat model.
saved_model <- combat_model_gam$combat.object harm_predict <- combat_harm(df = adni %>% head(1000), predict = TRUE, object = saved_model)
Specify reference
parameter to be saved reference data. To be noticed, the reference data should have identical columns as the new data and the new data should contain reference data as its sub sample.
# harmonize reference data reference_site <- adni %>% group_by(site) %>% summarize(count = n()) %>% arrange(desc(count)) %>% pull(site) %>% head(30) reference_df <- adni %>% filter(site %in% reference_site) features <- colnames(reference_df)[c(43:104)] covariates <- c("timedays", "AGE", "SEX", "DIAGNOSIS") interaction <- c("timedays,DIAGNOSIS") batch <- "site" ref_model <- combat_harm(type = "lmer", features = features, batch = batch, covariates = covariates, interaction = interaction, smooth = NULL, random = "subid", df = reference_df) # harmonize new data to the reference data harm_new <- combat_harm(type = "lmer", features = features, batch = batch, covariates = covariates, interaction = interaction, smooth = NULL, random = "subid", df = adni, reference = ref_model$harmonized_df)
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.