HiCblock-package: Systematic Analysis of Architectural Proteins and Functional...

Description Details Author(s) References Examples

Description

Here we propose a model to systematically analyze the roles of architectural proteins and functional elements in blocking long-range contacts between loci. The proposed model does not rely on topologically associating domain (TAD) mapping from Hi-C data. Instead of testing the enrichment or influence of protein binding at TAD borders, the model directly estimates the blocking effect of proteins on long-range contacts between flanking loci, making the model intuitive and biologically meaningful.

Details

# To install dependencies:
source("https://bioconductor.org/biocLite.R")
biocLite("IRanges","GenomicRanges","GenomeInfoDb","rtracklayer","HiTC")
install.packages(c("methods","MASS","Matrix","glmnet","S4Vectors"))

# To install HiCblock package:
install.packages("HiCblock")

# To use the package, there are two steps:
- To assess the blocking effects of features such as architectural proteins, one should first annotate Hi-C matrix with bias data (GC-content, mappability, fragment length) and with feature data (for instance, ChIP-seq data) with the function HiCblockProcData(). Note that if the Hi-C matrix has already been corrected for biases, then no bias data need to be used.
- Then, one should compute the negative binomial or Poisson lasso regression with the function HiCblockModel() using the output from HiCblockProcData().
The blocking effect of a protein (or motif) is the associated beta from the regression.

# Choice between Poisson lasso over binomial negative regressions:
Because of Hi-C count overdispersion, we used negative binomial regression as the most appropriate specification of the generalized linear model. However, Poisson regression with lasso shrinkage can also be used. We believe that the choice between both depends mainly on the number of variables to analyze. On the one hand, if there are a few candidate variables (less than 10), it is interesting to estimate beta parameters together with corresponding p-values to assess significance using negative binomial regression. On the other hand, if there are a large number of variables (10 or more), it is more convenient to use Poisson lasso regression in order to select the key variables and to account for correlations among the variables (frequent in ChIP-seq and motif occurrence data).

Author(s)

Raphael Mourad

Maintainer: Raphael Mourad, [email protected]

References

Raphael Mourad and Olivier Cuvier. TAD-free analysis of architectural proteins and insulators. Nucleic Acids Research, Volume 46, Issue 5, 16 March 2018, Pages e27.

Examples

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
# Load data
# The Hi-C matrix is at 20kb resolution (low resolution only for example)
data(dataExample)
genomicFeatureList.GR=dataExample$GenomicFeatureList.GR
annotNames=dataExample$AnnotNames
HTCList=dataExample$HTC
distInter=c(100e3,140e3)
IBP=c("BEAF32","dCTCF","dTFIIIC","GAF","SuHw")

# Annotate Hi-C data with genomic features
HRPD=HiCblockProcData(genomicFeatureList.GR, annotNames, HTCList, distInter,verbose=TRUE)

# Model 1
modelBlock1=as.formula(paste0("Count~logDist+len+GC+map+I(-BEAF32)"))
HRM_Block1=HiCblockModel(HRPD,modelBlock1,"BEAF32",regressionMode="NB")
print(HRM_Block1)

# Output from model 1
# Blocking effect (beta) of BEAF-32 is 0.21
#Call:
#glm.nb(formula = model, data = dataGLM, init.theta = 2.956475677, 
#    link = log)

#Deviance Residuals: 
#    Min       1Q   Median       3Q      Max  
#-3.9346  -0.9018  -0.2606   0.4158   5.0515  

#Coefficients:
#             Estimate Std. Error z value Pr(>|z|)    
#(Intercept) -4.420963   1.102393  -4.010 6.06e-05 ***
#logDist     -1.226667   0.073070 -16.788  < 2e-16 ***
#len          0.662864   0.027635  23.987  < 2e-16 ***
#GC          -0.627353   0.087941  -7.134 9.76e-13 ***
#map          1.258398   0.054150  23.239  < 2e-16 ***
#I(-BEAF32)   0.206394   0.007735  26.683  < 2e-16 ***
#---
#Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#(Dispersion parameter for Negative Binomial(2.9565) family taken to be 1)

#    Null deviance: 5377.2  on 3375  degrees of freedom
#Residual deviance: 3575.1  on 3370  degrees of freedom
#AIC: 46123

#Number of Fisher Scoring iterations: 1


#              Theta:  2.9565 
#          Std. Err.:  0.0693 

# 2 x log-likelihood:  -46109.2770 



# Model 2
# Blocking effect (beta) of BEAF-32 is 0.18
vars2=paste(paste0("I(-",IBP,")"),collapse='+')
modelBlock2=as.formula(paste0("Count~logDist+len+GC+map+",vars2))
HRM_Block2=HiCblockModel(HRPD,modelBlock2,IBP,regressionMode="NB")
print(HRM_Block2)

# Output from model 2
#Call:
#glm.nb(formula = model, data = dataGLM, init.theta = 3.138426428, 
#    link = log)

#Deviance Residuals: 
#    Min       1Q   Median       3Q      Max  
#-3.7727  -0.8815  -0.2809   0.4282   4.3644  

#Coefficients:
#             Estimate Std. Error z value Pr(>|z|)    
#(Intercept) -4.982498   1.074279  -4.638 3.52e-06 ***
#logDist     -1.236704   0.070946 -17.432  < 2e-16 ***
#len          0.685146   0.027209  25.181  < 2e-16 ***
#GC          -0.560153   0.086482  -6.477 9.35e-11 ***
#map          1.310488   0.053692  24.407  < 2e-16 ***
#I(-BEAF32)   0.176687   0.008283  21.332  < 2e-16 ***
#I(-dCTCF)    0.028798   0.013665   2.107  0.03508 *  
#I(-dTFIIIC)  0.621412   0.077602   8.008 1.17e-15 ***
#I(-GAF)      0.047998   0.007878   6.092 1.11e-09 ***
#I(-SuHw)     0.072596   0.026326   2.758  0.00582 ** 
#---
#Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#(Dispersion parameter for Negative Binomial(3.1384) family taken to be 1)

#    Null deviance: 5702.9  on 3375  degrees of freedom
#Residual deviance: 3564.8  on 3366  degrees of freedom
#AIC: 45911

#Number of Fisher Scoring iterations: 1


#              Theta:  3.1384 
#          Std. Err.:  0.0738 

# 2 x log-likelihood:  -45889.1960 



# Model 3
# Blocking effect (beta) of BEAF-32 is 0.22
HRM_Block3=HiCblockModel(HRPD,NULL,IBP,regressionMode="PoissonLasso")
print(HRM_Block3)

# Output from model 3
#        Variable Coefficient
#logDist  logDist    -1.04649
#len          len     0.56508
#GC            GC    -0.43621
#map          map     0.95623
#BEAF32    BEAF32     0.22077
#dCTCF      dCTCF     0.01680
#dTFIIIC  dTFIIIC     0.73750
#GAF          GAF     0.03878
#SuHw        SuHw     0.12617

HiCblock documentation built on April 19, 2018, 1:04 a.m.