smooth_peak-method: Spline smoothing of the peak


Description

Approximates the read counts associated with each peak by a suitable B-spline function, giving a smooth representation of the peaks. The first derivative of the spline is also computed. To obtain the smooth representation, each peak is extended and new start and end points are identified. See the vignette of the FunChIP package for a graphical illustration of the spline approximation.

Usage

## S4 method for signature 'GRanges'
smooth_peak(object, n.breaks = 100, subsample = TRUE, 
    subsample.data = 100, order = 4,  
    lambda = (10^(seq(-5,5, by = 0.5))),
    GCV.derivatives = TRUE , plot.GCV = FALSE, rescale = FALSE)

Arguments

object

GRanges object. It must contain the metadata column counts.

n.breaks

integer. Number of breaks, or knots, for the B-spline basis domain definition. Default is 100.

subsample

logical. If TRUE, only a random subset of the peaks (of size fixed by the parameter subsample.data) is used to identify the optimal value of lambda for the penalization via cross-validation. If subsample = FALSE, all the peaks of the GRanges object are used. To limit running time, keeping the default subsample = TRUE is recommended.

subsample.data

integer. Number of peaks used for the cross-validation (if subsample is TRUE). Default is 100. If subsample = FALSE, all peaks are used and subsample.data is ignored.

order

integer. Order of the B-spline basis used for the smoothing. The order is one higher than the degree of the spline. Default is 4 (cubic splines).
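The relation between n.breaks and order can be illustrated with the splines package that ships with R. This is only a sketch on a hypothetical grid x, and it assumes that the boundary breakpoints are repeated order times; it is not necessarily how FunChIP builds its basis internally. Under that assumption the basis has n.breaks + order - 2 functions.

```r
library(splines)  # ships with base R

order    <- 4       # cubic B-splines (degree 3)
n.breaks <- 100     # number of breakpoints, endpoints included
x        <- 0:499   # hypothetical abscissa grid (positions along a peak)

# Equally spaced breakpoints; the endpoints are pinned to the data range
# to avoid floating-point drift.
breaks <- seq(min(x), max(x), length.out = n.breaks)
breaks[c(1, n.breaks)] <- range(x)

# Repeat the boundary breakpoints so that each appears `order` times.
knots <- c(rep(breaks[1], order - 1), breaks,
           rep(breaks[n.breaks], order - 1))

# One row per grid point, one column per basis function.
B <- splineDesign(knots, x, ord = order)
dim(B)
```

With these values the matrix has 500 rows and 100 + 4 - 2 = 102 columns.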

lambda

vector (or single value). Candidate values of the smoothing parameter λ among which the final value is chosen. If a single value is provided, it is used directly for the smoothing. Default is 10^(seq(-5, 5, by = 0.5)), which covers a sufficiently wide range of values. See Details below.
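For orientation, the default grid can be inspected directly in base R; nothing in this short sketch is specific to FunChIP.

```r
# Default candidate values for the smoothing parameter:
# half-decade steps on a log10 scale.
lambda <- 10^seq(-5, 5, by = 0.5)

length(lambda)   # number of candidate values
range(lambda)    # smallest and largest candidate
```

The grid contains 21 values, from 1e-05 to 1e+05.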

GCV.derivatives

logical. If TRUE, the Generalized Cross-Validation index (GCV) on the derivatives is used as the criterion to identify λ; otherwise the GCV is computed on the data. Default is TRUE.

plot.GCV

logical. If TRUE, the plot of the GCV of the data and derivatives is shown as a function of λ. Default value is FALSE.

rescale

logical. If TRUE, scaled peaks are also provided: from the spline approximation of each peak a new curve is defined by scaling both the abscissa grid and the values of the spline. All the scaled peaks share a common grid whose width equals the minimum width of the original splines, and each has area equal to 1. Default is FALSE.
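The idea of rescaling can be sketched in base R as follows. The helper rescale_curve is hypothetical and not part of FunChIP. The use of linear interpolation and of the trapezoidal rule are assumptions made for illustration, so the package may compute the scaled curves differently.

```r
# Hypothetical helper (not a FunChIP function): resample a curve onto a
# grid of `new.width` points and normalise it to unit area.
rescale_curve <- function(y, new.width) {
  old.grid <- seq(0, 1, length.out = length(y))
  new.grid <- seq(0, 1, length.out = new.width)
  # Linear interpolation onto the common grid (rule = 2 guards the endpoints).
  y.new <- approx(old.grid, y, xout = new.grid, rule = 2)$y
  # Trapezoidal area on the unit-spaced common grid.
  area <- sum((head(y.new, -1) + tail(y.new, -1)) / 2)
  y.new / area
}

# Two synthetic "smoothed peaks" of different width and height.
curves <- list(50 * dnorm(seq(-3, 3, length.out = 300)),
               20 * dnorm(seq(-3, 3, length.out = 180)))

min.width <- min(lengths(curves))   # common width: the shortest curve
scaled <- lapply(curves, rescale_curve, new.width = min.width)
```

After rescaling, both curves live on the same 180-point grid and have unit area, so only their shapes differ.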

Details

The method builds a piecewise polynomial s of fixed order approximating the data (B-spline expansion; Ramsay and Silverman, 2005). Given the pointwise-defined function f: (x, f(x)), smooth_peak returns the evaluation of s on the x grid, s(x), where s minimizes, for a fixed λ,

ERR(λ) = || f - s ||^2_{L^2} + λ || s'' ||^2_{L^2}

where s'' is the second derivative of s and || s ||^2_{L^2} is the squared L^2 norm of s, i.e. the integral of s^2 over its domain.
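For a fixed λ, the minimization of the penalized criterion has a closed form once s is written in a B-spline basis. The sketch below illustrates this with the splines package on synthetic data. All names (x, f, B, R, coef) are introduced here for illustration. The discrete sum of squares stands in for the L^2 data term, and the roughness penalty is approximated by midpoint quadrature. It is not the code used inside FunChIP.

```r
library(splines)

set.seed(1)
x <- 0:199                                                      # synthetic grid
f <- 40 * exp(-((x - 100) / 30)^2) + rnorm(length(x), sd = 2)   # noisy "counts"

order <- 4; n.breaks <- 20; lambda <- 10
breaks <- seq(min(x), max(x), length.out = n.breaks)
breaks[c(1, n.breaks)] <- range(x)
knots <- c(rep(breaks[1], order - 1), breaks,
           rep(breaks[n.breaks], order - 1))

# Basis functions evaluated on the data grid.
B <- splineDesign(knots, x, ord = order)

# Roughness penalty R[i, j] ~ integral of B_i''(t) B_j''(t) dt,
# approximated by midpoint quadrature with step h.
h     <- 0.05
t.mid <- seq(min(x) + h / 2, max(x) - h / 2, by = h)
B2    <- splineDesign(knots, t.mid, ord = order,
                      derivs = rep(2, length(t.mid)))
R     <- crossprod(B2) * h

# Minimiser of sum((f - B c)^2) + lambda * c' R c.
coef <- solve(crossprod(B) + lambda * R, crossprod(B, f))
s    <- drop(B %*% coef)   # smoothed values s(x)
```

Larger values of lambda shrink the curvature of s, while lambda = 0 reduces to an ordinary least-squares fit on the basis.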

The choice of λ is crucial for the definition of the spline, and it can be selected by minimizing the Generalized Cross-Validation index

GCV(λ) = (n SSE)/(n-df(λ))^2

where n is the number of data points and SSE is the error, computed as

SSE = || f - s ||^2_{L^2}

if GCV.derivatives = FALSE, or as

SSE = || grad(f) - s' ||^2_{L^2}

if GCV.derivatives = TRUE, and df(λ) is the number of degrees of freedom of the basis expansion, computed automatically from s. For further details on the cross-validation procedure and on the computation of the number of degrees of freedom, see Ramsay and Silverman (2005).
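The selection of λ by GCV can be sketched on a single synthetic curve as follows. Several choices here are assumptions made for illustration and may differ from the FunChIP implementation: df(λ) is taken as the trace of the hat matrix, grad(f) is approximated by finite differences, and the L^2 norms are replaced by sums over the grid.

```r
library(splines)

set.seed(1)
x <- 0:199
f <- 40 * exp(-((x - 100) / 30)^2) + rnorm(length(x), sd = 2)
n <- length(x)

order <- 4; n.breaks <- 20
breaks <- seq(min(x), max(x), length.out = n.breaks)
breaks[c(1, n.breaks)] <- range(x)
knots <- c(rep(breaks[1], order - 1), breaks,
           rep(breaks[n.breaks], order - 1))

B  <- splineDesign(knots, x, ord = order)                       # s
B1 <- splineDesign(knots, x, ord = order, derivs = rep(1, n))   # s'

# Roughness penalty by midpoint quadrature of the second derivatives.
h     <- 0.05
t.mid <- seq(min(x) + h / 2, max(x) - h / 2, by = h)
B2    <- splineDesign(knots, t.mid, ord = order,
                      derivs = rep(2, length(t.mid)))
R     <- crossprod(B2) * h

# Finite differences as a stand-in for grad(f) (an assumption).
grad.f <- c(f[2] - f[1], (f[3:n] - f[1:(n - 2)]) / 2, f[n] - f[n - 1])

gcv <- function(lambda, derivatives = TRUE) {
  A    <- solve(crossprod(B) + lambda * R, t(B))   # maps data to coefficients
  coef <- A %*% f
  df   <- sum(diag(B %*% A))                       # trace of the hat matrix
  sse  <- if (derivatives) sum((grad.f - B1 %*% coef)^2)
          else             sum((f - B %*% coef)^2)
  n * sse / (n - df)^2
}

lambda.grid <- 10^seq(-5, 5, by = 0.5)
gcv.values  <- vapply(lambda.grid, gcv, numeric(1))
lambda.best <- lambda.grid[which.min(gcv.values)]
```

Calling plot(log10(lambda.grid), gcv.values, type = "b") gives a curve comparable in spirit to the one shown when plot.GCV = TRUE.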

If plot.GCV is TRUE, the GCV index is plotted as a function of λ; the plot can be used to identify the optimal value of the parameter. If the curve is still decreasing at the largest λ considered, consider extending the range of λ to larger values so that the minimum of the curve is included.

Value

the GRanges object with new metadata columns:

If rescale is TRUE two more metadata columns are added:

Author(s)

Alice Parodi, Marco J. Morelli, Laura M. Sangalli, Piercesare Secchi, Simone Vantini

References

Ramsay, J.O., Silverman, B.W., 2005. Functional Data Analysis, 2nd ed. Springer, New York, NY.

Examples

# load the package and the data
library(FunChIP)
data(peaks)

# Compute the spline approximation of the peaks
# stored in a GRanges object with the metadata
# column counts, as obtained from the pileup_peak method.

# GCV is computed on the derivatives (the default);
# lambda and subsample.data are set explicitly.

peaks.spline <- smooth_peak(peaks.data, lambda = 10^(-4:6),
                            subsample.data = 50, GCV.derivatives = TRUE)

# Same call, additionally returning the rescaled peaks.
peaks.spline.scaled <- smooth_peak(peaks.data, lambda = 10^(-4:6),
                            subsample.data = 50, GCV.derivatives = TRUE,
                            rescale = TRUE)

FunChIP documentation built on Nov. 8, 2020, 4:50 p.m.