In rmflight/fmcorrelationbreastcaparp1: Functions for calculating the correlations in PARP1 nucleosome data in breast cancer cell lines

Parp1 Specificity

There has been some question about whether the signal observed in this data set is Parp1 specific or general nucleosome binding. This vignette attempts to answer that question using the histone modification marks as indicators of general nucleosome binding in the matched cell types, and compared to the Parp1 associated nucleosomes.

Proposed Analysis

Tile genome into equally sized tiles
Define cutoff of signal to denote presence of nucleosomes
Using all histone marks as indicators of general nucleosome binding, find tiles with nucleosome binding
Using parp1 reads, find tiles with parp1 associated nucleosomes
Sample 1000 random tiles, count general and parp1 nucleosomes
Repeat 1000 times
Compare distributions of general and parp1 counts

Load Data

library(ggplot2)
library(GenomicFiles)
library(BSgenome.Hsapiens.UCSC.hg19)
library(fmcorrelationbreastcaparp1)
library(fmdatabreastcaparp1)
data("histone_marks") # mcf-7 histone modifications
data("parp1_ln4_unique") # mcf-7 parp1 reads

Tile Genome & Coverage

genome_tiles <- tileGenome(seqinfo(Hsapiens), tilewidth = 500, cut.last.tile.in.chrom = TRUE)

min_sum <- 25
parp1_cov <- coverage(parp1_ln4_unique, weight = "n_count")
genome_tiles <- binned_function(genome_tiles, parp1_cov, "sum", "parp1")
mcols(genome_tiles)$parp1[mcols(genome_tiles)$parp1 < min_sum] <- 0
mcols(genome_tiles)$parp1[mcols(genome_tiles)$parp1 >= min_sum] <- 1

mcols(genome_tiles)$histmod <- 0
histone_tiles <- genome_tiles
for (i_name in names(histone_marks)){
  histone_cov <- coverage(histone_marks[[i_name]], weight = "mcols.signal")
  histone_tiles <- binned_function(histone_tiles, histone_cov, "sum", i_name)
  mcols(genome_tiles)$histmod[mcols(histone_tiles)[, i_name] > 0] <- 1 
}

Sample & Count

Now we will take a bunch of samplings from the tiles, and then count how many have nucleosomes and how many have parp1.

n_sample <- 1000
out_counts <- lapply(seq(1, 1000), function(x){
  use_loc <- sample(length(genome_tiles), n_sample)
  tmp <- colSums(as.matrix(mcols(genome_tiles[use_loc])))
  data.frame(counts = tmp, which = names(tmp))
})
out_counts <- do.call(rbind, out_counts)

Plot!

ggplot(out_counts, aes(x = counts, fill = which)) + geom_histogram(alpha = 0.8, binwidth = 1, position = "identity")

Genome Wide Patterns

How many tiles have only Parp1 binding (expect few), vs how many have both (expect majority of Parp1), vs how many have nucleosome only?

presence <- mcols(genome_tiles)
parp_only <- sum(as.logical(presence$parp1) & !as.logical(presence$histmod))
histmod_only <- sum(!as.logical(presence$parp1) & as.logical(presence$histmod))
both <- sum(as.logical(presence$parp1) & as.logical(presence$histmod))

Parp only: r parp_only
histone modification only: r histmod_only
both: r both

MNase Nucleosome Signal

Have downloaded the MNase nucleosome signal from ENCODE for GM12878 and K562. Will combine these to generate overall nucleosome signal. See get_bigwig_data.R for how this was generated.

data(nuc_signal)
genome_tiles <- subsetByOverlaps(genome_tiles, nuc_signal, type = "equal")

mcols(genome_tiles)[, "gm12878"] <- 0
mcols(genome_tiles)[mcols(nuc_signal)$nuc > 100, "gm12878"] <- 1

Now lets compare how much specificity or overlap there is among occupancy in the tiles. We will look for how many things in each combination using the vennCounts function from limma package to count overlaps.

library(limma)
presence2 <- as.matrix(mcols(genome_tiles))
nuc_overlap_counts <- vennCounts(presence2)
attr(nuc_overlap_counts, "class") <- NULL

knitr::kable(nuc_overlap_counts)

From this table we can see that both the Parp1 and histone modifications are subsets of the nucleosome locations, as we would expect.

rmflight/fmcorrelationbreastcaparp1 documentation built on May 27, 2019, 9:31 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

rmflight/fmcorrelationbreastcaparp1
Functions for calculating the correlations in PARP1 nucleosome data in breast cancer cell lines

In rmflight/fmcorrelationbreastcaparp1: Functions for calculating the correlations in PARP1 nucleosome data in breast cancer cell lines

Parp1 Specificity

Proposed Analysis

Load Data

Tile Genome & Coverage

Sample & Count

Plot!

Genome Wide Patterns

MNase Nucleosome Signal

R Package Documentation

Browse R Packages

We want your feedback!

rmflight/fmcorrelationbreastcaparp1 Functions for calculating the correlations in PARP1 nucleosome data in breast cancer cell lines

In rmflight/fmcorrelationbreastcaparp1: Functions for calculating the correlations in PARP1 nucleosome data in breast cancer cell lines

Parp1 Specificity

Proposed Analysis

Load Data

Tile Genome & Coverage

Sample & Count

Plot!

Genome Wide Patterns

MNase Nucleosome Signal

R Package Documentation

Browse R Packages

We want your feedback!

rmflight/fmcorrelationbreastcaparp1
Functions for calculating the correlations in PARP1 nucleosome data in breast cancer cell lines