narrowpeaks: Detect Narrow Peaks from Enrichment-Score Profiles


Description

Detect narrow peaks from enrichment-score profiles (ChIP-seq peak regions).

Usage

narrowpeaks(inputReg, scoresInfo, lmin = 0, nbf = 50, rpenalty = 0,
            nderiv = 0, npcomp = 5, pv = 80, pmaxscor = 0.0, ms = 0)

Arguments

inputReg

Output of the function sigWin in package CSAR.

scoresInfo

The infoscores output of the function wig2CSARScore, or the output of the function ChIPseqScore after data analysis with package CSAR.

lmin

Minimum length of an enriched region from the WIG file to be processed. Integer value.

nbf

Number of order-4 B-spline basis functions that will represent the shape of each candidate transcription factor binding site. Integer value.

rpenalty

Smoothing parameter for derivative penalization. Non-negative numeric value; with rpenalty = 0 (the default) no penalization is applied.

nderiv

Order of derivative penalization, if rpenalty>0. Integer value.

npcomp

Number of functional principal components. Integer value less than or equal to nbf.

pv

Minimum percentage of variation to take into account during the analysis. Numeric value in the range 0-100 (see the vignette and Mateos, Madrigal, et al. (2015)).

pmaxscor

Cutoff for trimming of scoring function. Numeric value in the range 0-100.

ms

Peaks closer to each other than ms nucleotides will be merged in the final list. Integer value.
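
As an illustration of the ms argument, the following base R sketch merges intervals that lie fewer than ms nucleotides apart. The helper merge_close, the definition of the gap between two peaks, and the toy coordinates are assumptions made for this example; they are not part of NarrowPeaks, and the package's own merging may treat boundary cases differently.

```r
# Hypothetical helper (not part of NarrowPeaks): merge intervals that are
# separated by fewer than 'ms' nucleotides.
merge_close <- function(start, end, ms = 0) {
  ord <- order(start)
  start <- start[ord]
  end <- end[ord]
  out_start <- start[1]
  out_end <- end[1]
  for (i in seq_along(start)[-1]) {
    k <- length(out_end)
    # Assumed definition: number of nucleotides lying between the two peaks
    gap <- start[i] - out_end[k] - 1
    if (gap < ms) {
      out_end[k] <- max(out_end[k], end[i])   # close enough: extend the peak
    } else {
      out_start <- c(out_start, start[i])     # far enough: open a new peak
      out_end <- c(out_end, end[i])
    }
  }
  data.frame(start = out_start, end = out_end)
}

# Toy coordinates: the first two peaks are 9 nucleotides apart
peaks <- merge_close(start = c(100, 160, 400), end = c(150, 200, 450), ms = 20)
print(peaks)
```

With ms = 20 the first two toy peaks are merged into one interval, while the third stays separate.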

Details

This function produces shortened sites from a list of candidate transcription factor binding sites of arbitrary extent and shape. First, the enrichment signal from each candidate site is represented by a smoothed function constructed as a linear combination of order-4 B-spline basis functions. The data values are fitted using either least squares (if rpenalty = 0) or a penalized residual sum of squares (spline smoothing, if rpenalty > 0).
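
The least-squares case (rpenalty = 0) can be sketched with the splines package that ships with R. This is only an illustration on simulated data, not the package's internal code: the variable names, the simulated signal and the choice of 20 basis functions are assumptions made for the example.

```r
library(splines)  # distributed with base R

set.seed(1)
pos <- 1:200                                   # positions within one toy region
# Simulated enrichment scores: a single bump plus noise (toy data)
signal <- 10 * exp(-((pos - 90) / 25)^2) + rnorm(length(pos), sd = 0.5)

nbf <- 20                                      # plays the role of argument 'nbf'
# Order-4 B-splines are cubic (degree 3); intercept = TRUE gives 'nbf' columns
basis <- bs(pos, df = nbf, degree = 3, intercept = TRUE)

# Ordinary least-squares fit of the basis coefficients (no roughness penalty)
fit <- lm(signal ~ basis - 1)
smoothed <- fitted(fit)                        # smoothed enrichment profile
```

A larger nbf follows the raw signal more closely; a smaller one gives a smoother profile.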
Then, a functional principal component analysis for npcomp eigenfunctions is performed (Ramsay and Silverman, 2005). This yields a set of probe scores (principal component scores) whose sum of squares is reported in elementMetadata(broadPeaks)[,"fpcaScore"]. The higher the value of fpcaScore, the more of the variance in the original data the candidate peak accounts for. Details on the use of semi-metrics in functional PCA are described in Ferraty and Vieu (2006).
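
The idea of summarizing each candidate by the sum of its squared principal component scores can be illustrated as follows. Note the assumptions: this sketch applies ordinary PCA (prcomp) to curves sampled on a common grid as a simple stand-in for functional PCA on the B-spline expansion, and the simulated curves and the name fpca_like are hypothetical.

```r
set.seed(2)
grid <- seq(0, 1, length.out = 100)       # common grid (curves rescaled to [0, 1])
n_curves <- 30

# Toy smoothed profiles: bumps of varying height and position, one per row
curves <- t(sapply(seq_len(n_curves), function(i) {
  runif(1, 2, 10) * exp(-((grid - runif(1, 0.4, 0.6)) / 0.1)^2)
}))

npcomp <- 5                               # plays the role of argument 'npcomp'
pca <- prcomp(curves, center = TRUE, scale. = FALSE)
scores <- pca$x[, seq_len(npcomp)]        # one row of scores per curve

# Per-curve sum of squared scores, analogous in spirit to fpcaScore
fpca_like <- rowSums(scores^2)
```

Curves with a larger fpca_like deviate more strongly from the mean profile along the leading components.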
After that, we impose the condition that the total scoring function for each reported narrow peak must be at least pmaxscor per cent of the maximum value. The maximum value is calculated from a set of scoring functions using only the eigenfunctions required to achieve pv per cent of the variance. A new set of scores is computed using trimmed versions of the eigenfunctions (see the vignette), and the square root is stored in elementMetadata(narrowPeaks)[,"trimmedScore"].
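
One way to picture the pmaxscor cutoff is as a position-wise threshold on a scoring function. The sketch below assumes that reading; the scoring values, the maximum and the variable names are hypothetical, and the package's actual scoring function is described in the vignette.

```r
# Hypothetical scoring function along one candidate region (toy values)
pos <- 1001:1020
score <- c(0, 0.1, 0.2, 0.5, 1.5, 4, 6, 9, 10, 8, 5, 2, 0.8, 0.3, 0.1,
           0, 0, 0, 0, 0)
max_score <- 10     # assumed maximum over the whole set of scoring functions
pmaxscor <- 30      # keep positions scoring at least 30 per cent of the maximum

keep <- score >= pmaxscor * max_score / 100

# Turn the logical mask into contiguous intervals (the trimmed peaks)
runs <- rle(keep)
run_end <- cumsum(runs$lengths)
run_start <- run_end - runs$lengths + 1
narrow <- data.frame(start = pos[run_start[runs$values]],
                     end   = pos[run_end[runs$values]])
print(narrow)
```

Raising pmaxscor shortens the reported intervals; with pmaxscor = 0 every position passes the threshold and no trimming takes place.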

Value

A list containing the following elements:

fdaprofiles

A functional data object encapsulating the enrichment profiles (see the fda package). To plot the data, use plot.fd(fdaprofiles).

broadPeaks

Description of the peaks prior to trimming. A GRanges object (see GenomicRanges package) with the information: seqnames (chromosome), ranges (start and end of the candidate site), strand (not used), max (maximum signal value for candidate site), average (mean signal value for candidate site), fpcaScore (sum of squares of the first reqcomp principal component scores for candidate site).

narrowPeaks

Description of the peaks after trimming. A GRanges object (see GenomicRanges package) with the information: seqnames (chromosome), ranges (start and end after trimming), strand (not used), broadPeak.subpeak, trimmedScore (see details), narrowedDownTo (length reduction relative to the candidate), merged (logical value).

reqcomp

Number of functional principal components used. Integer value.

pvar

Total percentage of variance accounted for by the reqcomp components used. Numeric value in the range 0-100 (always greater than or equal to the argument pv).
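
The relation between pv, reqcomp and pvar can be sketched with a few lines of base R. The percentages in varprop are made up for the example, and the sketch assumes that the available components together reach pv.

```r
# Hypothetical percentages of variance explained by npcomp = 5 components
varprop <- c(55, 20, 12, 8, 5)
pv <- 80                              # plays the role of argument 'pv'

cumvar <- cumsum(varprop)
reqcomp <- which(cumvar >= pv)[1]     # smallest number of components reaching pv
pvar <- cumvar[reqcomp]               # variance accounted for by those components
```

Because whole components are added one at a time, pvar usually overshoots pv rather than matching it exactly.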

Author(s)

Pedro Madrigal, dnaseiseq@gmail.com

References

Mateos JL, Madrigal P, et al. (2015) Combinatorial activities of SHORT VEGETATIVE PHASE and FLOWERING LOCUS C define distinct modes of flowering regulation in Arabidopsis. Genome Biology 16: 31.
Bailey T, Krajewski P, Ladunga I, Lefebvre C, Li Q, Liu T, Madrigal P, Taslim C, Zhang J (2013) Practical Guidelines for the Comprehensive Analysis of ChIP-seq data. PLOS Comput Biol. 9 (11): e1003326.
Muino JM, Kaufmann K, van Ham RC, Angenent GC, Krajewski P (2011) ChIP-seq analysis in R (CSAR): An R package for the statistical detection of protein-bound genomic regions. Plant Methods 7:11.
Ramsay, J.O. and Silverman, B.W. (2005) Functional Data Analysis. New York: Springer.
Ferraty, F. and Vieu, P. (2006) Nonparametric Functional Data Analysis. New York: Springer.

See Also

wig2CSARScore, NarrowPeaks-package

Examples

## Work in a temporary directory; the original one is restored at the end
owd <- setwd(tempdir())

library(NarrowPeaks)
library(CSAR)
library(GenomicRanges)
library(rtracklayer)

## This example uses a subset of the AP1 ChIP-seq data (Kaufmann et al., 2010),
## obtained after analysis with the CSAR package available in Bioconductor
data("NarrowPeaks-dataset")
writeLines(wigfile_test, con = "wigfile.wig")

## Write binary files with the WIG signal values for each chromosome
## independently and obtain regions of read enrichment with score values
## greater than 't', allowing a gap of 'g'. The data correspond to enriched
## regions found up to 105 kb in the Arabidopsis thaliana genome
wigScores <- wig2CSARScore(wigfilename = "wigfile.wig", nbchr = 1,
                           chrle = c(30427671))
gc(reset = TRUE)
candidates <- sigWin(experiment = wigScores$infoscores, t = 1.0, g = 30)

## Narrow down ChIP-seq enriched regions by functional PCA
shortpeaks <- narrowpeaks(inputReg = candidates,
                          scoresInfo = wigScores$infoscores, lmin = 0,
                          nbf = 150, rpenalty = 0, nderiv = 0, npcomp = 2,
                          pv = 80, pmaxscor = 3.0, ms = 0)

## Export the GRanges objects with the peaks to annotation tracks.
## The export uses a metadata column named "score", so rename
## fpcaScore (3rd column of broadPeaks) and trimmedScore (2nd of narrowPeaks)
names(elementMetadata(shortpeaks$broadPeaks))[3] <- "score"
names(elementMetadata(shortpeaks$narrowPeaks))[2] <- "score"
export.bedGraph(object = candidates, con = "CSAR.bed")
export.bedGraph(object = shortpeaks$broadPeaks, con = "broadPeaks.bed")
export.bedGraph(object = shortpeaks$narrowPeaks, con = "narrowpeaks.bed")

## Restore the original working directory
setwd(owd)

NarrowPeaks documentation built on April 28, 2020, 6:51 p.m.