Parp1 Specificity

There has been some question about whether the signal observed in this data set is Parp1 specific or general nucleosome binding. This vignette attempts to answer that question using the histone modification marks as indicators of general nucleosome binding in the matched cell types, and compared to the Parp1 associated nucleosomes.

Proposed Analysis

Load Data

library(ggplot2)
library(GenomicFiles)
library(BSgenome.Hsapiens.UCSC.hg19)
library(fmcorrelationbreastcaparp1)
library(fmdatabreastcaparp1)
data("histone_marks") # mcf-7 histone modifications
data("parp1_ln4_unique") # mcf-7 parp1 reads

Tile Genome & Coverage

genome_tiles <- tileGenome(seqinfo(Hsapiens), tilewidth = 500, cut.last.tile.in.chrom = TRUE)
min_sum <- 25
parp1_cov <- coverage(parp1_ln4_unique, weight = "n_count")
genome_tiles <- binned_function(genome_tiles, parp1_cov, "sum", "parp1")
mcols(genome_tiles)$parp1[mcols(genome_tiles)$parp1 < min_sum] <- 0
mcols(genome_tiles)$parp1[mcols(genome_tiles)$parp1 >= min_sum] <- 1
mcols(genome_tiles)$histmod <- 0
histone_tiles <- genome_tiles
for (i_name in names(histone_marks)){
  histone_cov <- coverage(histone_marks[[i_name]], weight = "mcols.signal")
  histone_tiles <- binned_function(histone_tiles, histone_cov, "sum", i_name)
  mcols(genome_tiles)$histmod[mcols(histone_tiles)[, i_name] > 0] <- 1 
}

Sample & Count

Now we will take a bunch of samplings from the tiles, and then count how many have nucleosomes and how many have parp1.

n_sample <- 1000
out_counts <- lapply(seq(1, 1000), function(x){
  use_loc <- sample(length(genome_tiles), n_sample)
  tmp <- colSums(as.matrix(mcols(genome_tiles[use_loc])))
  data.frame(counts = tmp, which = names(tmp))
})
out_counts <- do.call(rbind, out_counts)

Plot!

ggplot(out_counts, aes(x = counts, fill = which)) + geom_histogram(alpha = 0.8, binwidth = 1, position = "identity")

Genome Wide Patterns

How many tiles have only Parp1 binding (expect few), vs how many have both (expect majority of Parp1), vs how many have nucleosome only?

presence <- mcols(genome_tiles)
parp_only <- sum(as.logical(presence$parp1) & !as.logical(presence$histmod))
histmod_only <- sum(!as.logical(presence$parp1) & as.logical(presence$histmod))
both <- sum(as.logical(presence$parp1) & as.logical(presence$histmod))

MNase Nucleosome Signal

Have downloaded the MNase nucleosome signal from ENCODE for GM12878 and K562. Will combine these to generate overall nucleosome signal. See get_bigwig_data.R for how this was generated.

data(nuc_signal)
genome_tiles <- subsetByOverlaps(genome_tiles, nuc_signal, type = "equal")

mcols(genome_tiles)[, "gm12878"] <- 0
mcols(genome_tiles)[mcols(nuc_signal)$nuc > 100, "gm12878"] <- 1

Now lets compare how much specificity or overlap there is among occupancy in the tiles. We will look for how many things in each combination using the vennCounts function from limma package to count overlaps.

library(limma)
presence2 <- as.matrix(mcols(genome_tiles))
nuc_overlap_counts <- vennCounts(presence2)
attr(nuc_overlap_counts, "class") <- NULL
knitr::kable(nuc_overlap_counts)

From this table we can see that both the Parp1 and histone modifications are subsets of the nucleosome locations, as we would expect.



rmflight/fmcorrelationbreastcaparp1 documentation built on May 27, 2019, 9:31 a.m.