Description Usage Arguments Details Value Author(s) Examples
The HIRE function provides parameter estimates in the HIRE model and the p-values for the association of CpG sites with one phenotype in each individual cell type.
1 |
Ometh |
The observed methylation matrix (with continuous values between 0 and 1), where one row corresponds to a CpG site and one column represents a sample. |
X |
The observed phenotypes, where one row corresponds to a phenotype and one column represents a sample. |
num_celltype |
The cell type number (an integer) that needs to be specified by the users. |
tol |
The relative tolerance used to determine when the HIRE stops. When the ratio of the log observed-data likelihood difference to the log observed-data likelihood at last iteration in the absolute value is smaller than |
num_iter |
The maximum number of iterations (an integer). The default is 1000. |
alpha |
One threshold parameter in the Bonferroni correction to claim a significant cell-type-specific CpG site (a float point between 0 and 1, usually set less than 0.05). The default is 0.01. |
HIRE used the generalized EM algorithm, so the log observed-data likelihood value increases after each iteration.
HIRE returns a list with seven components.
P_t |
the cellular composition matrix, where one row corresponds to a cell type and one column represents a sample. |
mu_t |
the baseline cell-type-specific methylation profiles, where one row is a CpG site and one column is a cell type. |
beta_t |
the phenotype coefficient array with dimensionality being m (the CpG number) by K (the cell type number) by q (the phenotype number). The first dimension corresponds to CpG sites, the second dimension to cell types, and the third dimension to phenotypes. |
sig_sqErr_t |
the error variance vector withlength equal to the CpG site number. |
sig_sqTiss_t |
the cell-type-specific variance matrix, where one row is a CpG site and one column represents a cell type. |
pBIC |
the penalized BIC value, where the number of phenotype coefficients being zero depends on the parameter |
pvalues |
the p-value matrix, whose dimension is m (the CpG site number) by K*q (the cell type number times the phenotype number). In the p-values matrix, one row is a CpG site. The first K columns correspond to the p-value matrix of the phenotype 1, the second K columns corresponds to the p-value matrix of the phenotype 1, and so forth. |
Xiangyu Luo
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 | ################################################################################################
#Generate the EWAS data
################################################################################################
set.seed(05222018)
#define a function to draw samples from a Dirichlet distribution
rDirichlet <- function(alpha_vec){
num <- length(alpha_vec)
temp <- rgamma(num, shape = alpha_vec, rate = 1)
return(temp / sum(temp))
}
n <- 180 #number of samples
n1 <- 60 #number of controls
n2 <- 120 #number of cases
#################################################################################################
# K=3
#################################################################################################
m <- 2000 #number of CpG sites
K <- 3 #underlying cell type number
#methylation profiles
#assume cell type 1 and cell type 2 are from the same lineage
#cell type 1
methy1 <- rbeta(m,3,6)
#cell type 2
methy2 <- methy1 + rnorm(m, sd=0.01)
ind <- sample(seq_len(m), m/5)
methy2[ind] <- rbeta(length(ind),3,6)
#cell type 3
methy3 <- rbeta(m,3,6)
mu <- cbind(methy1, methy2, methy3)
#number of covariates
p <- 2
#covariates / phenotype
X <- rbind(c(rep(0, n1),rep(1, n2)), runif(n, min=20, max=50))
#set risk-CpG sites under each cell type for each phenotype
beta <- array(0, dim=c(m,K,p))
#control vs case
m_common <- 10
max_signal <- 0.15
min_signal <- 0.07
signs <- sample(c(-1,1), m_common*K, replace=TRUE)
beta[seq_len(m_common),seq_len(K),1] <- signs * runif(m_common*K, min=min_signal, max=max_signal)
m_seperate <- 10
signs <- sample(c(-1,1), m_seperate*2, replace=TRUE)
beta[m_common+(seq_len(m_seperate)),seq_len(2),1] <- signs *
runif(m_seperate*2, min=min_signal, max=max_signal)
signs <- sample(c(-1,1), m_seperate, replace=TRUE)
beta[m_common+m_seperate+(seq_len(m_seperate)),K,1] <- signs *
runif(m_seperate, min=min_signal, max=max_signal)
#age
base <- 20
m_common <- 10
max_signal <- 0.015
min_signal <- 0.007
signs <- sample(c(-1,1), m_common*K, replace=TRUE)
beta[base+seq_len(m_common),seq_len(K),2] <- signs *
runif(m_common*K, min=min_signal, max=max_signal)
m_seperate <- 10
signs <- sample(c(-1,1), m_seperate*2, replace=TRUE)
beta[base+m_common+seq_len(m_seperate),seq_len(2),2] <- signs *
runif(m_seperate*2, min=min_signal, max=max_signal)
signs <- sample(c(-1,1), m_seperate, replace=TRUE)
beta[base+m_common+m_seperate+seq_len(m_seperate),seq_len(K),2] <- signs *
runif(m_seperate, min=min_signal, max=max_signal)
#generate the cellular compositions
P <- vapply(seq_len(n), function(i){
if(X[1,i]==0){ #if control
rDirichlet(c(4,4, 2+X[2,i]/10))
}else{
rDirichlet(c(4,4, 5+X[2,i]/10))
}
}, FUN.VALUE = rep(-1, 3))
#generate the observed methylation profiles
Ometh <- NULL
for(i in seq_len(n)){
utmp <- t(vapply(seq_len(m), function(j){
tmp1 <- colSums(X[ ,i] * t(beta[j, , ]))
rnorm(K,mean=mu[j, ]+tmp1,sd=0.01)
}, FUN.VALUE = rep(-1, K)))
tmp2 <- colSums(P[ ,i] * t(utmp))
Ometh <- cbind(Ometh, tmp2 + rnorm(m, sd = 0.01))
}
sum(Ometh > 1)
Ometh[Ometh > 1] <- 1
sum(Ometh < 0)
Ometh[Ometh < 0] <- 0
################################################################################################
#Apply HIRE to the simulated EWAS data
################################################################################################
#return list by HIRE
ret_list <- HIRE(Ometh, X, num_celltype=K)
#case vs control
#Visualize the association pattern with the case/control status in the first 100 CpG sites
riskCpGpattern(ret_list$pvalues[seq_len(100), c(2,1,3)],
main_title="Detected association pattern\n with disease status", hc_row_ind = FALSE)
#c(2,1,3) was used because of the label switching
#age
#Visualize the association pattern with the age in the first 100 CpG sites
riskCpGpattern(ret_list$pvalues[seq_len(100), K+c(2,1,3)],
main_title="Detected association pattern\n with age", hc_row_ind = FALSE)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.