knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Using a BED6 file of processed ChIP-seq peaks, ChIPAnalyzer can find G-quadruplexes in the vicinity of those peaks and visualize their positions. First, we get a list of reports on G-quadruplexes, generated by the pqsfinder() function of the pqsfinder package:

library(ChIPAnalyzer)

# locate the example BED file shipped with the package
bedFile <- system.file("extdata", "MAZ_high_score.bed", package = "ChIPAnalyzer")

# get a list of reports for G-quadruplexes around peaks
mazReports <- findQuads(bedPath = bedFile, seqWidth = 500, assemblyVersion = "hg19")
(mazReports)

Here bedPath specifies the path to a BED file with at least 6 columns, where each row represents a peak. The 6th column (strand) must be a plus or a minus, or a period if the strand is not specified. In the example file, the strand is not specified. The seqWidth parameter sets the length of the sequence around the centre of each peak that is searched for G-quadruplexes. The assemblyVersion parameter sets the assembly that sequences are retrieved from, and must be the same assembly that was used to align the ChIP-seq peaks. Currently supported values of assemblyVersion are "hg19", "hg38", "mm9", and "mm10". Next, we will analyze these reports to generate a matrix representation of G-quadruplex positions across all peaks. This takes a while for files with a large number of peaks. For the example file, this step should take roughly 5 minutes.
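The input requirements described above can be checked by hand before running the search. The following is a minimal sketch in base R, not part of ChIPAnalyzer: the toy coordinates are made up, and peakWindow() is a hypothetical helper that assumes the window is simply seqWidth bases centred on the midpoint of each peak (the package may compute the window differently).

```r
# Toy BED6 content; the coordinates are invented for illustration only.
bedText <- c(
  "chr1\t1000\t1200\tpeak1\t900\t.",
  "chr2\t5000\t5300\tpeak2\t850\t+",
  "chr3\t7000\t7100\tpeak3\t700\t-"
)
bed <- read.table(
  text = bedText, sep = "\t",
  col.names = c("chrom", "start", "end", "name", "score", "strand"),
  colClasses = c("character", "integer", "integer",
                 "character", "numeric", "character")
)

# A BED6 file needs at least 6 columns, and strand must be "+", "-" or ".".
stopifnot(ncol(bed) >= 6)
stopifnot(all(bed$strand %in% c("+", "-", ".")))

# Hypothetical helper (an assumption, not the package's code):
# a window of seqWidth bases centred on the midpoint of each peak.
peakWindow <- function(start, end, seqWidth) {
  centre <- (start + end) %/% 2
  windowStart <- centre - seqWidth %/% 2
  data.frame(windowStart = windowStart,
             windowEnd = windowStart + seqWidth - 1)
}

windows <- peakWindow(bed$start, bed$end, seqWidth = 500)
windows
```

Every window in this sketch spans exactly seqWidth positions, which is why the matrix built in the next step has seqWidth columns.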

#get a binary matrix representing G-quadruplex positions
mazMatrix <- getQuadMatrix(quadReports = mazReports)
(mazMatrix)

Here we generate a binary matrix with a number of columns equal to the seqWidth argument passed to findQuads() earlier, and a number of rows equal to the number of peaks in the BED file. If the sequence around a peak has a quadruplex, the values in that peak's row are 1 at the positions (column numbers) of the sequence occupied by the quadruplex, and 0 elsewhere. Only the highest-scoring G-quadruplex for each sequence is examined in this step. Next, we will calculate the percentage of peaks that have a G-quadruplex occupying each position in their sequence.
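To make the matrix layout concrete, here is a toy sketch in base R. It is not the implementation of getQuadMatrix(); the quads data frame is hypothetical and stands in for the top-scoring quadruplex of each peak, given as a start position and a width within the window.

```r
seqWidth <- 10  # deliberately tiny so the matrix is easy to read

# Hypothetical top-scoring quadruplex per peak: start position and width
# within the window; NA means no quadruplex was found for that peak.
quads <- data.frame(start = c(3, NA, 6), width = c(4, NA, 5))

# One row per peak, one column per position in the window.
quadMatrix <- matrix(0L, nrow = nrow(quads), ncol = seqWidth)
for (i in seq_len(nrow(quads))) {
  if (!is.na(quads$start[i])) {
    # Positions occupied by the quadruplex are set to 1.
    covered <- seq(quads$start[i], length.out = quads$width[i])
    quadMatrix[i, covered] <- 1L
  }
}
quadMatrix
```

The second row stays all zeros because that peak has no quadruplex, which matches the description of the matrix above.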

#get the percentage of peaks with a G-quadruplex at each position
mazPercentages <- getQuadCoveragePercentage(quadMatrix = mazMatrix)
(mazPercentages)

In the step above, we calculate, for each position in the seqWidth-long sequence around the centre of each peak, the percentage of peaks that have a quadruplex at that position. The getQuadCoveragePercentage() function returns a vector of percentages of length seqWidth. Next, we will plot these percentages.
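Conceptually, this calculation is a column-wise mean of the binary matrix scaled to a percentage. The sketch below shows that idea on a small hand-written matrix; it assumes this is equivalent to what getQuadCoveragePercentage() returns, which the package source would need to confirm.

```r
# Toy binary matrix: 4 peaks (rows) by 6 positions (columns).
quadMatrix <- rbind(
  c(0, 0, 1, 1, 1, 0),
  c(0, 0, 0, 0, 0, 0),
  c(0, 1, 1, 1, 0, 0),
  c(0, 0, 1, 1, 1, 1)
)

# Fraction of peaks covered at each position, expressed as a percentage.
coverage <- colMeans(quadMatrix) * 100
coverage
```

For this toy input the third and fourth positions are covered in 3 of 4 peaks, so both come out at 75 percent.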

#plot the percentage of peaks with a G-quadruplex at each position
plotQuadPosition(featurePercentages = mazPercentages, title = "MAZ G-quad Positions")

Given a vector of percentages in the featurePercentages argument, plotQuadPosition() plots them as a line graph with the title given by the title argument.
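A comparable line graph can be drawn with base R graphics, which may help when customizing the figure. This is only a sketch under the assumption that the x-axis is the position within the window; it does not reproduce the styling or axis conventions of plotQuadPosition(), and the percentages are toy values.

```r
# Toy percentages, one value per position in the window.
percentages <- c(0, 25, 75, 75, 50, 25)
positions <- seq_along(percentages)

# Line graph of coverage percentage against position.
plot(
  positions, percentages,
  type = "l",
  main = "Toy G-quad Positions",
  xlab = "Position in window",
  ylab = "Peaks with a G-quadruplex (%)",
  ylim = c(0, 100)
)
```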

References

Lawrence M, Huber W, Pagès H, Aboyoun P, Carlson M, et al. (2013) Software for Computing and Annotating Genomic Ranges. PLoS Comput Biol 9(8): e1003118. doi:10.1371/journal.pcbi.1003118

The Bioconductor Dev Team (2014). BSgenome.Hsapiens.UCSC.hg19: Full genome sequences for Homo sapiens (UCSC version hg19). R package version 1.4.0.

The Bioconductor Dev Team (2014). BSgenome.Mmusculus.UCSC.mm10: Full genome sequences for Mus musculus (UCSC version mm10). R package version 1.4.0.

The Bioconductor Dev Team (2014). BSgenome.Mmusculus.UCSC.mm9: Full genome sequences for Mus musculus (UCSC version mm9). R package version 1.4.0.

The Bioconductor Dev Team (2015). BSgenome.Hsapiens.UCSC.hg38: Full genome sequences for Homo sapiens (UCSC version hg38). R package version 1.4.1.

Hon J, Martinek T, Zendulka J, Lexa M. (2017) pqsfinder: an exhaustive and imperfection-tolerant search tool for potential quadruplex-forming sequences in R. Bioinformatics. 33(21), 3373-3379. https://doi.org/10.1093/bioinformatics/btx413

Partridge, E. C., Chhetri, S. B., Mendenhall, E. M., & Myers, R. M. (2019, November 27). GEO Accession viewer. Retrieved November 14, 2020, from https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE104247



RyDe4/ChIPanalyzer documentation built on Sept. 1, 2023, 9:18 a.m.