Description Details Author(s) References Examples
Here we propose a model to systematically analyze the roles of architectural proteins and functional elements in blocking long-range contacts between loci. The proposed model does not rely on topologically associating domain (TAD) mapping from Hi-C data. Instead of testing the enrichment or influence of protein binding at TAD borders, the model directly estimates the blocking effect of proteins on long-range contacts between flanking loci, making the model intuitive and biologically meaningful.
# To install dependencies:
source("https://bioconductor.org/biocLite.R")
biocLite("IRanges","GenomicRanges","GenomeInfoDb","rtracklayer","HiTC")
install.packages(c("methods","MASS","Matrix","glmnet","S4Vectors"))
# To install HiCblock package:
install.packages("HiCblock")
# To use the package, there are two steps:
- To assess the blocking effects of features such as architectural proteins, one should first annotate Hi-C matrix with bias data (GC-content, mappability, fragment length) and with feature data (for instance, ChIP-seq data) with the function HiCblockProcData(). Note that if the Hi-C matrix has already been corrected for biases, then no bias data need to be used.
- Then, one should compute the negative binomial or Poisson lasso regression with the function HiCblockModel() using the output from HiCblockProcData().
The blocking effect of a protein (or motif) is the associated beta from the regression.
# Choice between Poisson lasso over binomial negative regressions:
Because of Hi-C count overdispersion, we used negative binomial regression as the most appropriate specification of the generalized linear model. However, Poisson regression with lasso shrinkage can also be used. We believe that the choice between both depends mainly on the number of variables to analyze. On the one hand, if there are a few candidate variables (less than 10), it is interesting to estimate beta parameters together with corresponding p-values to assess significance using negative binomial regression. On the other hand, if there are a large number of variables (10 or more), it is more convenient to use Poisson lasso regression in order to select the key variables and to account for correlations among the variables (frequent in ChIP-seq and motif occurrence data).
Raphael Mourad
Maintainer: Raphael Mourad, raphael.mourad@ibcg.biotoul.fr
Raphael Mourad and Olivier Cuvier. TAD-free analysis of architectural proteins and insulators. Nucleic Acids Research, Volume 46, Issue 5, 16 March 2018, Pages e27.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 | # Load data
# The Hi-C matrix is at 20kb resolution (low resolution only for example)
data(dataExample)
genomicFeatureList.GR=dataExample$GenomicFeatureList.GR
annotNames=dataExample$AnnotNames
HTCList=dataExample$HTC
distInter=c(100e3,140e3)
IBP=c("BEAF32","dCTCF","dTFIIIC","GAF","SuHw")
# Annotate Hi-C data with genomic features
HRPD=HiCblockProcData(genomicFeatureList.GR, annotNames, HTCList, distInter,verbose=TRUE)
# Model 1
modelBlock1=as.formula(paste0("Count~logDist+len+GC+map+I(-BEAF32)"))
HRM_Block1=HiCblockModel(HRPD,modelBlock1,"BEAF32",regressionMode="NB")
print(HRM_Block1)
# Output from model 1
# Blocking effect (beta) of BEAF-32 is 0.21
#Call:
#glm.nb(formula = model, data = dataGLM, init.theta = 2.956475677,
# link = log)
#Deviance Residuals:
# Min 1Q Median 3Q Max
#-3.9346 -0.9018 -0.2606 0.4158 5.0515
#Coefficients:
# Estimate Std. Error z value Pr(>|z|)
#(Intercept) -4.420963 1.102393 -4.010 6.06e-05 ***
#logDist -1.226667 0.073070 -16.788 < 2e-16 ***
#len 0.662864 0.027635 23.987 < 2e-16 ***
#GC -0.627353 0.087941 -7.134 9.76e-13 ***
#map 1.258398 0.054150 23.239 < 2e-16 ***
#I(-BEAF32) 0.206394 0.007735 26.683 < 2e-16 ***
#---
#Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#(Dispersion parameter for Negative Binomial(2.9565) family taken to be 1)
# Null deviance: 5377.2 on 3375 degrees of freedom
#Residual deviance: 3575.1 on 3370 degrees of freedom
#AIC: 46123
#Number of Fisher Scoring iterations: 1
# Theta: 2.9565
# Std. Err.: 0.0693
# 2 x log-likelihood: -46109.2770
# Model 2
# Blocking effect (beta) of BEAF-32 is 0.18
if(FALSE){
vars2=paste(paste0("I(-",IBP,")"),collapse='+')
modelBlock2=as.formula(paste0("Count~logDist+len+GC+map+",vars2))
HRM_Block2=HiCblockModel(HRPD,modelBlock2,IBP,regressionMode="NB")
print(HRM_Block2)
}
# Output from model 2
#Call:
#glm.nb(formula = model, data = dataGLM, init.theta = 3.138426428,
# link = log)
#Deviance Residuals:
# Min 1Q Median 3Q Max
#-3.7727 -0.8815 -0.2809 0.4282 4.3644
#Coefficients:
# Estimate Std. Error z value Pr(>|z|)
#(Intercept) -4.982498 1.074279 -4.638 3.52e-06 ***
#logDist -1.236704 0.070946 -17.432 < 2e-16 ***
#len 0.685146 0.027209 25.181 < 2e-16 ***
#GC -0.560153 0.086482 -6.477 9.35e-11 ***
#map 1.310488 0.053692 24.407 < 2e-16 ***
#I(-BEAF32) 0.176687 0.008283 21.332 < 2e-16 ***
#I(-dCTCF) 0.028798 0.013665 2.107 0.03508 *
#I(-dTFIIIC) 0.621412 0.077602 8.008 1.17e-15 ***
#I(-GAF) 0.047998 0.007878 6.092 1.11e-09 ***
#I(-SuHw) 0.072596 0.026326 2.758 0.00582 **
#---
#Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#(Dispersion parameter for Negative Binomial(3.1384) family taken to be 1)
# Null deviance: 5702.9 on 3375 degrees of freedom
#Residual deviance: 3564.8 on 3366 degrees of freedom
#AIC: 45911
#Number of Fisher Scoring iterations: 1
# Theta: 3.1384
# Std. Err.: 0.0738
# 2 x log-likelihood: -45889.1960
# Model 3
# Blocking effect (beta) of BEAF-32 is 0.22
if(FALSE){
HRM_Block3=HiCblockModel(HRPD,NULL,IBP,regressionMode="PoissonLasso")
print(HRM_Block3)
}
# Output from model 3
# Variable Coefficient
#logDist logDist -1.04649
#len len 0.56508
#GC GC -0.43621
#map map 0.95623
#BEAF32 BEAF32 0.22077
#dCTCF dCTCF 0.01680
#dTFIIIC dTFIIIC 0.73750
#GAF GAF 0.03878
#SuHw SuHw 0.12617
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.