summit_from_vector: Determine summit from numeric vector

summit_from_vector    R Documentation

Description

Determine summit from numeric vector

Usage

summit_from_vector(x, spar = 0.5, edge_buffer = 0, return_height = TRUE, ...)

Arguments

x

numeric vector from which a summit will be determined.

spar

numeric or NULL, passed to smooth.spline() as its smoothing parameter. The default spar=0.5 appears to give reasonable and consistent smoothing for genome coverage data, which tends to have long stretches of flat coverage that are overfitted when spar=NULL.

edge_buffer

integer number of values at the leading and trailing edge of x to be ignored when determining the summit. This argument is experimental, and is intended to prevent the very beginning or end of a region from being the "summit" when there may be an internal peak that is preferred. Note that when (edge_buffer*2) > length(x) the entire region is ignored, in which case the middle position is returned.
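The edge handling described above can be sketched as follows. This is an illustration only, not the platjam source: the helper name pick_with_edge_buffer() is hypothetical, it assumes the buffer is applied to values that have already been smoothed, and it assumes the fallback is round(length(y)/2).

```r
# Hypothetical helper; assumed logic, not the platjam implementation.
pick_with_edge_buffer <- function(y, edge_buffer = 0) {
  n <- length(y)
  idx <- seq_len(n)
  # Positions that survive after trimming both edges
  keep <- idx > edge_buffer & idx <= (n - edge_buffer)
  if (edge_buffer * 2 > n || !any(keep)) {
    # Whole region ignored: fall back to the middle position
    # (the !any(keep) guard for an exactly-consumed region is an assumption)
    return(round(n / 2))
  }
  # Mask the edges so they can never be the maximum
  y[!keep] <- -Inf
  which.max(y)
}

y <- c(10, 2, 3, 4, 8, 1, 9)
pick_with_edge_buffer(y, edge_buffer = 0)  # 1: the leading edge wins
pick_with_edge_buffer(y, edge_buffer = 1)  # 5: internal peak preferred
pick_with_edge_buffer(y, edge_buffer = 4)  # 4: buffer covers everything
```

With no buffer the first value is the maximum; a buffer of 1 moves the result to the internal peak, which is the situation the argument is intended for.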

...

additional arguments are passed to smooth.spline().

Details

This function takes a numeric vector, intended to be data that represents some signal across a range where that signal is above noise; it calls smooth.spline() to generate a smooth curve across the region, then returns the x position with the max smoothed spline signal.
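A minimal sketch of the approach described above, using only stats::smooth.spline() from base R. The helper sketch_summit() and the simulated coverage are hypothetical and serve only to illustrate the idea; the actual function may differ in its details.

```r
# Hypothetical helper illustrating the approach; not the platjam source.
sketch_summit <- function(x, spar = 0.5) {
  # Fit a smoothing spline over index positions 1..length(x)
  fit <- stats::smooth.spline(x = seq_along(x), y = x, spar = spar)
  # which.max() returns the first index holding the maximum
  i <- which.max(fit$y)
  c(summit = fit$x[i], summit_height = fit$y[i])
}

# Simulated coverage: a bump centered at position 60 plus mild noise
set.seed(1)
pos <- 1:100
cov <- 20 * exp(-((pos - 60)^2) / (2 * 8^2)) + rnorm(100, sd = 0.5)

res <- sketch_summit(cov)
print(res)
```

For this simulated signal the reported summit should land close to the bump center, because smoothing suppresses single-position noise spikes before the maximum is taken.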

The original intent is to take genome sequence coverage across an enriched region (a "peak") and determine the peak summit. It should work well for each row of a coverage matrix, provided the coverage matrix is wide enough that the highest signal is located inside the range analyzed.
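The row-wise use mentioned above might look like the following sketch. The coverage matrix is simulated, and the anonymous function is a stand-in built on stats::smooth.spline(); with platjam installed, it could presumably be replaced by summit_from_vector itself.

```r
# Simulated coverage matrix: one row per region, one column per position.
set.seed(42)
pos <- 1:80
centers <- c(20, 40, 65)  # assumed true peak centers, for illustration
cov_matrix <- t(sapply(centers, function(ctr) {
  15 * exp(-((pos - ctr)^2) / (2 * 6^2)) + rnorm(length(pos), sd = 0.3)
}))

# Apply a spline-based summit call to each row (stand-in for the real function)
summits <- apply(cov_matrix, 1, function(x) {
  fit <- stats::smooth.spline(seq_along(x), x, spar = 0.5)
  which.max(fit$y)
})
print(summits)
```

Each simulated peak sits well inside the 80-position window, which matches the requirement that the matrix be wide enough to contain the highest signal.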

An alternative is to import bigWig coverage data for a set of regions of interest defined by a GRanges object. A useful function is splicejam::getGRcoverageFromBw(), which can load coverage from one or more bigWig files and returns a GRanges object with one column per bigWig file loaded. Then iterate over each coverage vector to determine its summit.

Value

integer vector with two values:

  • "summit" with the index position of the highest point on the smoothed spline curve. If x has one uniform numeric value across the entire range, it returns the midpoint defined by round(length(x)/2). If there are two maximum values, the first position is returned.

  • "summit_height" numeric value with the spline height at the summit position.
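The two special cases in the "summit" entry can be illustrated with base R alone. This sketch only mirrors the documented rules (midpoint for uniform input, first position on ties); it does not call the function itself.

```r
# Uniform input: the documented fallback is round(length(x)/2)
flat <- rep(5, 10)
flat_summit <- round(length(flat) / 2)
flat_summit  # 5

# Tied maxima: which.max() reports the first position holding the maximum
tied <- c(1, 4, 2, 4)
tied_summit <- which.max(tied)
tied_summit  # 2
```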

See Also

Other jam utility functions: cardinality(), color_complement(), convert_PD_df_to_SE(), convert_imputed_assays_to_na(), curate_se_colData(), curate_to_df_by_pattern(), design2layout(), get_numeric_transform(), handle_df_args(), merge_proteomics_se(), nmat_summary(), nmatlist_summary(), rmd_tab_iterator(), rowNormScale()


jmw86069/platjam documentation built on Sept. 26, 2024, 3:31 p.m.