Description Usage Arguments Details Value Author(s) References See Also Examples
Detect narrow peaks from enrichment-score profiles (ChIP-seq peak regions).
1 2 | narrowpeaks(inputReg, scoresInfo, lmin = 0, nbf = 50, rpenalty= 0,
nderiv= 0, npcomp = 5, pv = 80, pmaxscor = 0.0, ms = 0)
|
inputReg |
Output of the function |
scoresInfo |
Output |
lmin |
Minimum length of an enriched region from the WIG file to be processed. Integer value. |
nbf |
Number of order-4 B-spline basis functions that will represent the shape of each candidate transcription factor binding site. Integer value. |
rpenalty |
Smoothing parameter for derivative penalization. Positive numeric value. |
nderiv |
Order of derivative penalization, if |
npcomp |
Number of functional principal components. Integer value
greater than or equal to |
pv |
Minimum percentage of variation to take into account during the analysis. Numeric value in the range 0-100 (see the vignette and Mateos, Madrigal, et al. (2015)). |
pmaxscor |
Cutoff for trimming of scoring function. Numeric value in the range 0-100. |
ms |
Peaks closer to each other than |
This function produces shortened sites from a list of candidate
transcription factor binding sites of arbitrary extension and shape. First,
the enrichment signal from each candidate site is represented by a smoothed
function constructed using a linear combination of order-4 B-spline basis
functions. The data values are fitted using either least squares (if
rpenalty = 0), or penalized residuals sum of squares (spline smoothing
if rpenalty > 0).
Then, a functional principal component analysis
for npcomp
eigenfunctions is performed (Ramsay and Silverman, 2005),
giving as a result a set of probe scores (principal component scores) which
sum of squares is reported in elementMetadata(broadPeaks)[,"fpcaScore"]
.
The higher the value of fpcaScore
, the higher the variance that
candidate peak accounts for within the original data. Details on the usage
of semi-metrics in functional PCA is described in Ferraty and Vieu, 2006.
After that, we impose the condition that total scoring function for each
reported narrow peak must be at least pmaxscor
per cent of the maximum
value. Max value is calculated from a set of scoring functions using only the
eigenfunctions required to achieve pv
percent of variance. A new set
of scores is computed using trimmed versions of the eigenfunctions (see
Vignette), and the root square is stored in
elementMetadata(narrowPeaks)[,"trimmedScore"]
.
A list containing the following elements:
fdaprofiles |
A functional data object encapsulating the enrichment
profiles (see fda package. To plot the data use
|
broadPeaks |
Description of the peaks prior to trimming. A
|
narrowPeaks |
Description of the peaks after trimming. A |
reqcomp |
Number of functional principal components used. Integer value. |
pvar |
Total proportion of variance accounted for by the |
Pedro Madrigal, dnaseiseq@gmail.com
Mateos JL, Madrigal P, et al. (2015) Combinatorial activities of SHORT VEGETATIVE PHASE and FLOWERING LOCUS C define distinct modes of flowering regulation in Arabidopsis. Genome Biology 16: 31.
Bailey T, Krajewski P, Ladunga I, Lefebvre C, Li Q, Liu T, Madrigal P, Taslim C, Zhang J (2013) Practical Guidelines for the Comprehensive Analysis of ChIP-seq data. PLOS Comput Biol. 9 (11): e1003326.
Muino JM, Kaufmann K, van Ham RC, Angenent GC, Krajewski P (2011) ChIP-seq analysis in R
(CSAR): An R package for the statistical detection of protein-bound genomic regions.
Plant Methods 7:11.
Ramsay, J.O. and Silverman, B.W. (2005) Functional Data Analysis. New York:
Springer.
Ferraty, F. and Vieu, P. (2006) Nonparametric Functional Data Analysis.
New York: Springer.
wig2CSARScore
, NarrowPeaks-package
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 | owd <- setwd(tempdir())
##For this example we will use a subset of the AP1 ChIP-seq data (Kaufmann et
##al., 2010)
##The data is obtained after analysis using the CSAR package available in
##Bioconductor
data("NarrowPeaks-dataset")
writeLines(wigfile_test, con="wigfile.wig")
##Write binary files with the WIG signal values for each chromosome
##independently and obtain regions of read-enrichment with score values greater
##than 't', allowing a gap of 'g'. Data correspond to enriched regions found up
##to 105Kb in the Arabidopsis thaliana genome
wigScores <- wig2CSARScore(wigfilename="wigfile.wig", nbchr = 1,
chrle=c(30427671))
gc(reset=TRUE)
library(CSAR)
candidates <- sigWin(experiment=wigScores$infoscores, t=1.0, g=30)
##Narrow down ChIPSeq enriched regions by functional PCA
shortpeaks <- narrowpeaks(inputReg=candidates,
scoresInfo=wigScores$infoscores, lmin=0, nbf=150, rpenalty=0,
nderiv=0, npcomp=2, pv=80, pmaxscor=3.0, ms=0)
###Export GRanges object with the peaks to annotation tracks in various
##formats. E.g.:
library(GenomicRanges)
names(elementMetadata(shortpeaks$broadPeaks))[3] <- "score"
names(elementMetadata(shortpeaks$narrowPeaks))[2] <- "score"
library(rtracklayer)
export.bedGraph(object=candidates, con="CSAR.bed")
export.bedGraph(object=shortpeaks$broadPeaks, con="broadPeaks.bed")
export.bedGraph(object=shortpeaks$narrowPeaks, con="narrowpeaks.bed")
setwd(owd)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.