ColocBoost provides a flexible interface for individual-level colocalization analysis and accepts several input formats. We recommend using individual-level genotype and phenotype data when available, because it improves both sensitivity and precision compared with summary-statistics-based approaches.
This vignette demonstrates how to perform multi-trait colocalization analysis using individual-level data in ColocBoost, specifically focusing on the Ind_5traits dataset included in the package.
library(colocboost)
The Ind_5traits Dataset

The Ind_5traits dataset contains 5 simulated phenotypes alongside corresponding genotype matrices.
The dataset is specifically designed to evaluate and demonstrate the capabilities of ColocBoost in multi-trait colocalization analysis with individual-level data.
- X: A list of genotype matrices for different outcomes.
- Y: A list of phenotype vectors for different outcomes.
- true_effect_variants: True effect variant indices for each trait.

The dataset features two causal variants with indices 194 and 589.
This structure creates a realistic scenario where multiple traits are influenced by different but overlapping sets of genetic variants.
    # Load the dataset
    data(Ind_5traits)
    names(Ind_5traits)
    Ind_5traits$true_effect_variants
Because of the file size limit for CRAN releases, the packaged dataset is a subset of the simulated data. The full dataset is available in the colocboost paper repo.
The preferred format for colocalization analysis in ColocBoost using individual-level data is one where genotype ($X$) and phenotype ($Y$) data are properly matched.

- X and Y are organized as lists, matched by trait index:
  - (X[[1]], Y[[1]]) contains individual-level data for trait 1,
  - (X[[2]], Y[[2]]) contains individual-level data for trait 2,
  - and so on for the remaining traits.

The colocboost function takes the genotypes X and phenotypes Y from the dataset:
    # Extract genotype (X) and phenotype (Y) data
    X <- Ind_5traits$X
    Y <- Ind_5traits$Y

    # Run colocboost with matched data
    res <- colocboost(X = X, Y = Y)

    # Identified CoS
    res$cos_details$cos$cos_index

    # Plot the results
    colocboost_plot(res)
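Before running the analysis, it can help to confirm that the two lists really are matched by trait index. The following is a minimal sketch in base R on toy data; `check_matched()`, `X_toy`, and `Y_toy` are hypothetical names introduced for illustration and are not part of colocboost.

```r
# Toy stand-ins for matched inputs (hypothetical sizes, not the Ind_5traits data)
set.seed(1)
n <- 20
p <- 5
X_toy <- list(matrix(rnorm(n * p), n, p), matrix(rnorm(n * p), n, p))
Y_toy <- list(matrix(rnorm(n), n, 1), matrix(rnorm(n), n, 1))

# check_matched() is a hypothetical helper, not a colocboost function
check_matched <- function(X, Y) {
  stopifnot(is.list(X), is.list(Y), length(X) == length(Y))
  # TRUE for trait i when X[[i]] and Y[[i]] cover the same number of individuals
  vapply(seq_along(Y), function(i) nrow(X[[i]]) == NROW(Y[[i]]), logical(1))
}

matched <- check_matched(X_toy, Y_toy)
matched
```

The same helper could be called as `check_matched(Ind_5traits$X, Ind_5traits$Y)`, assuming each element of X is a matrix with individuals in rows.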
For comprehensive tutorials on result interpretation and advanced visualization techniques, please visit our tutorials portal at Visualization of ColocBoost Results and Interpret ColocBoost Output.
When studying multiple traits with a common genotype matrix, such as gene expression in different tissues or cell types, ColocBoost accepts a single genotype matrix with multiple phenotypes. This is particularly useful when the same individuals are measured for every trait, since the genotype matrix does not need to be duplicated.
- X is a single matrix containing genotype data for all individuals.
- Y can be i) a matrix with $N \times L$ dimension; ii) a list of phenotype vectors for $L$ traits.

    # Use the first genotype matrix as the single shared matrix
    X_single <- X[[1]]

    # Run colocboost
    res <- colocboost(X = X_single, Y = Y)

    # Identified CoS
    res$cos_details$cos$cos_index
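The two accepted shapes of Y carry the same information, so one can be converted into the other. The sketch below uses only base R on toy data; `Y_mat`, `Y_list`, and `Y_back` are hypothetical names, and it assumes individuals are in rows and traits in columns.

```r
# Toy N x L phenotype matrix (hypothetical data)
set.seed(2)
n <- 10
L <- 3
Y_mat <- matrix(rnorm(n * L), nrow = n, ncol = L,
                dimnames = list(paste0("id", seq_len(n)),
                                paste0("trait", seq_len(L))))

# Form ii): split the N x L matrix into a list of L single-column matrices
Y_list <- lapply(seq_len(ncol(Y_mat)), function(j) Y_mat[, j, drop = FALSE])

# Back to form i): bind the columns together again
Y_back <- do.call(cbind, Y_list)
```

Using `drop = FALSE` keeps each phenotype as a one-column matrix, so row names (individual IDs) are preserved through the conversion.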
When the genotype matrix includes a superset of the individuals across different phenotypes, the input format is:

- X is a matrix of genotype data for all individuals.
- Y is a list of phenotype vectors for different traits.
- X and Y should be provided with individual IDs in the same format so that individuals can be matched.
- X should contain all individuals present in the phenotype vectors (optional).

    # Create phenotypes with different samples: remove 50 samples from trait 1 and trait 3
    X_superset <- X[[1]]
    Y_remove <- Y
    Y_remove[[1]] <- Y[[1]][-sample(1:length(Y[[1]]), 50), , drop = FALSE]
    Y_remove[[3]] <- Y[[3]][-sample(1:length(Y[[3]]), 50), , drop = FALSE]

    # Run colocboost
    res <- colocboost(X = X_superset, Y = Y_remove)

    # Identified CoS
    res$cos_details$cos$cos_index
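To illustrate what matching individuals by ID means, here is a minimal base R sketch on toy data. It assumes the IDs are stored as row names, which is an assumption of this example; it shows the general idea of aligning a superset genotype matrix to a phenotype and is not a description of what colocboost does internally. All object names are hypothetical.

```r
# Toy superset genotype matrix and a phenotype for the same individuals
set.seed(3)
n <- 12
p <- 4
ids <- paste0("id", seq_len(n))
X_sup <- matrix(rnorm(n * p), n, p, dimnames = list(ids, NULL))
y_full <- matrix(rnorm(n), n, 1, dimnames = list(ids, NULL))

# Drop 3 individuals from the phenotype; row names stay on the remaining rows
y_sub <- y_full[-sample(seq_len(n), 3), , drop = FALSE]

# Keep only the individuals present in both, in the phenotype's order
keep <- intersect(rownames(y_sub), rownames(X_sup))
X_aligned <- X_sup[keep, , drop = FALSE]
y_aligned <- y_sub[keep, , drop = FALSE]
```

After alignment, `X_aligned` and `y_aligned` have identical row names, so row i of each refers to the same individual.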
When studying multiple traits with arbitrary genotype matrices for different traits, we also provide the interface for arbitrary genotype matrices with multiple phenotypes. This particularly benefits meta-analysis across heterogeneous datasets where, for different subsets of traits, genotype data comes from different genotyping platforms or sequencing technologies.
- X = list(X1, X3) is a list of genotype matrices.
- Y = list(Y1, Y2, Y3, Y4, Y5) is a list of phenotype vectors, where traits 1 and 2 are matched to the 1st genotype matrix X1, and traits 3, 4, and 5 are matched to the 2nd genotype matrix X3.
- dict_YX is a dictionary matrix that maps the index of Y to the index of X.

    # Create a simple dictionary for demonstration purposes
    X_arbitrary <- X[c(1, 3)]
    dict_YX <- cbind(c(1:5), c(1, 1, 2, 2, 2))

    # Display the dictionary
    dict_YX

    # Run colocboost
    res <- colocboost(X = X_arbitrary, Y = Y, dict_YX = dict_YX)

    # Identified CoS
    res$cos_details$cos$cos_index
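The dictionary is a two-column lookup table: the first column holds the trait (Y) index and the second the index of the genotype matrix it uses. The sketch below reads such a table in base R. The column labels and the helper `x_index_for_trait()` are hypothetical additions for readability and are not part of colocboost.

```r
# Same mapping as in the example: traits 1-2 use matrix 1, traits 3-5 use matrix 2
dict_YX <- cbind(1:5, c(1, 1, 2, 2, 2))
colnames(dict_YX) <- c("Y_index", "X_index")  # labels added for this sketch only

# Hypothetical helper: which genotype matrix index serves a given trait?
x_index_for_trait <- function(dict, trait) {
  dict[dict[, "Y_index"] == trait, "X_index"]
}

# Traits grouped by the genotype matrix they share
traits_by_X <- split(dict_YX[, "Y_index"], dict_YX[, "X_index"])

x_index_for_trait(dict_YX, 4)
traits_by_X
```

Grouping the traits this way is a quick check that every trait appears exactly once and points at an existing element of the genotype list.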