There has been some question about whether the signal observed in this data set is Parp1 specific or general nucleosome binding. This vignette attempts to answer that question using the histone modification marks as indicators of general nucleosome binding in the matched cell types, and compared to the Parp1 associated nucleosomes.
library(ggplot2) library(GenomicFiles) library(BSgenome.Hsapiens.UCSC.hg19) library(fmcorrelationbreastcaparp1) library(fmdatabreastcaparp1) data("histone_marks") # mcf-7 histone modifications data("parp1_ln4_unique") # mcf-7 parp1 reads
genome_tiles <- tileGenome(seqinfo(Hsapiens), tilewidth = 500, cut.last.tile.in.chrom = TRUE)
min_sum <- 25 parp1_cov <- coverage(parp1_ln4_unique, weight = "n_count") genome_tiles <- binned_function(genome_tiles, parp1_cov, "sum", "parp1") mcols(genome_tiles)$parp1[mcols(genome_tiles)$parp1 < min_sum] <- 0 mcols(genome_tiles)$parp1[mcols(genome_tiles)$parp1 >= min_sum] <- 1
mcols(genome_tiles)$histmod <- 0 histone_tiles <- genome_tiles for (i_name in names(histone_marks)){ histone_cov <- coverage(histone_marks[[i_name]], weight = "mcols.signal") histone_tiles <- binned_function(histone_tiles, histone_cov, "sum", i_name) mcols(genome_tiles)$histmod[mcols(histone_tiles)[, i_name] > 0] <- 1 }
Now we will take a bunch of samplings from the tiles, and then count how many have nucleosomes and how many have parp1.
n_sample <- 1000 out_counts <- lapply(seq(1, 1000), function(x){ use_loc <- sample(length(genome_tiles), n_sample) tmp <- colSums(as.matrix(mcols(genome_tiles[use_loc]))) data.frame(counts = tmp, which = names(tmp)) }) out_counts <- do.call(rbind, out_counts)
ggplot(out_counts, aes(x = counts, fill = which)) + geom_histogram(alpha = 0.8, binwidth = 1, position = "identity")
How many tiles have only Parp1 binding (expect few), vs how many have both (expect majority of Parp1), vs how many have nucleosome only?
presence <- mcols(genome_tiles) parp_only <- sum(as.logical(presence$parp1) & !as.logical(presence$histmod)) histmod_only <- sum(!as.logical(presence$parp1) & as.logical(presence$histmod)) both <- sum(as.logical(presence$parp1) & as.logical(presence$histmod))
r parp_only
r histmod_only
r both
Have downloaded the MNase nucleosome signal from ENCODE for GM12878 and K562. Will combine these to generate overall nucleosome signal. See get_bigwig_data.R
for how this was generated.
data(nuc_signal) genome_tiles <- subsetByOverlaps(genome_tiles, nuc_signal, type = "equal") mcols(genome_tiles)[, "gm12878"] <- 0 mcols(genome_tiles)[mcols(nuc_signal)$nuc > 100, "gm12878"] <- 1
Now lets compare how much specificity or overlap there is among occupancy in the tiles. We will look for how many things in each combination using the vennCounts
function from limma
package to count overlaps.
library(limma) presence2 <- as.matrix(mcols(genome_tiles)) nuc_overlap_counts <- vennCounts(presence2) attr(nuc_overlap_counts, "class") <- NULL
knitr::kable(nuc_overlap_counts)
From this table we can see that both the Parp1 and histone modifications are subsets of the nucleosome locations, as we would expect.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.