knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
This document shows how to analyze genetic data with the Digital Twin Test (DTT) using the digitaltwins
package.
library(digitaltwins)
First, we create a synthetic data set which has two parts:
#simulation parmaters n_ext <- 1000 #number of external observations n_trio <- 250 #number of trios p <- 500 #number of observed genetic variants chrome_width <- 50 #length of chromosome d <- rep(p / chrome_width, p) #genetic distance between sites k <- 5 #number of causal variants
We will consider a single chromosome of length r chrome_width
Mb (roughly the length of chromosome 22),
and take r p
genetic variants equally spaced across the chromosome. For simplicitly, we assume recombination occurs uniformly at random.
We begin with a matrix set of simulated haplotypes included in the digitaltwins
package.
data("haps_matrix")
We take the first r 2*n_ext
rows of haps_matrix
as the haplotypes of the r n_ext
unrelated individuals (each individual has two corresponding haplotypes).
external_haps <- haps_matrix[1:(2*n_ext), ] dim(external_haps)
Next, we use the remaining r 4 * n_trio
rows as the haplotypes of the parents for the r n_trio
trios.
parent_haps <- haps_matrix[(2*n_ext + 1):(2*n_ext + 4 * n_trio), ] dim(parent_haps)
We create the table of ancestries for the offspring haplotypes, and simulate
the offspring haplotypes with the digitaltwins::generate_offspring
function.
anc <- matrix(1:(4*n_trio), nrow = 2*n_trio, byrow = TRUE) #index of ancestors of haplotypes print(dim(anc)) head(anc)
set.seed(300) # use the "generate_offspring" function for each row of the ancestry table offspring_haps <- mapply( function(i, j) {digitaltwins::generate_offspring(parent_haps[i, ], parent_haps[i, ], d = d)}, anc[, 1], anc[, 2]) dim(offspring_haps)
Lastly, we will simulate a response variable from a sparse linear regression model with r k
causal variants.
The digitaltwins::haps_to_gen
function
converts the matrix of haplotypes to a matrix of genotypes by adding rows 1 and 2, rows 3 and 4, etc.
#create the regression coefficients beta <- rep(0, p) causal_variants <- sample(1:p, k) beta[causal_variants] <- 1 #sample the response variable from a sparse linear model Y_ext <- digitaltwins::haps_to_gen(external_haps) %*% beta + rnorm(n_ext) Y_offspring <- digitaltwins::haps_to_gen(offspring_haps) %*% beta + rnorm(n_trio)
We now have a population of unrelated haplotypes and parent-offspring trios, and so we will next turn to the Digital Twin Test.
The first step of the Digital Twin Test is model fitting. The model fitting is done on the
external data set of size r n_ext
. This can be done using any
fitting software that yields a linear predictor. We will use the glmnet
package for sparse linear regression, tuning the model with cross-validation.
library(glmnet)
lasso_fit <- cv.glmnet(digitaltwins::haps_to_gen(external_haps), Y_ext) beta_hat <- coef(lasso_fit) length(beta_hat) #Fitted linear predictor. First entry is an intercept.
Next, we perform inference using the trio data. While our test uses a linear predictor, we emphasize that the validity of the test does not rely on the correctness of our model whatsoever. A better model fit will lead to higher power, but inference remains valid no matter the quality of the model fit.
We will take a group that contains a causal variant.
test_region <- 1:100 sum(causal_variants %in% test_region) #number of causal variants in the test region
p_value <- digitaltwins::linear_crt(offspring_haps, parent_haps, anc, Y_offspring, matrix(beta_hat, ncol = 1), group = test_region, d = d, family = "gaussian") #null p-value p_value
We find that the p-value for this region is significant.
If we instead test a region that does not contain a causal variant, we will find a p-value that is not siginificant.
test_region <- 400:500 sum(causal_variants %in% test_region) #number of causal variants in the test region
p_value <- digitaltwins::linear_crt(offspring_haps, parent_haps, anc, Y_offspring, matrix(beta_hat, ncol = 1), group = test_region, d = d, family = "gaussian") #non-null p-value p_value
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.