summit_from_vector: Determine summit from numeric vector
In jmw86069/platjam: Platform Jam, biological platform importers.

Description

Determine the summit of a numeric vector: the index position with the highest value on a smoothed spline curve fitted to the vector.

Usage

summit_from_vector(x, spar = 0.5, edge_buffer = 0, return_height = TRUE, ...)

Arguments

`x`	`numeric` vector from which a summit will be determined.
`spar`	`numeric` or `NULL`, passed to `smooth.spline()` as the smoothing parameter. The default `spar=0.5` appears to give a reasonable and consistent level of smoothing for genome coverage data, whose long stretches of flat coverage tend to be overfitted when `spar=NULL`.
`edge_buffer`	`integer` number of values at the leading and trailing edge of `x` to be ignored when determining the summit. This argument is experimental, and is intended to prevent the very beginning or end of a region from being called the "summit" when an internal peak would be preferred. Note that when `(edge_buffer*2) > length(x)` the buffers cover the entire region, in which case the middle position is returned.
`...`	additional arguments are passed to `smooth.spline()`.
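
The effect of `edge_buffer` can be illustrated with a minimal sketch. This is not the platjam source: the helper `summit_with_buffer()` is hypothetical, it works on raw values rather than a smoothed curve, and it uses `>=` for the fallback test so that no empty range is ever searched, which is an assumption that differs slightly from the `>` condition described above.

```r
# Hypothetical illustration of an edge buffer; not the platjam implementation.
summit_with_buffer <- function(y, edge_buffer = 0) {
   n <- length(y)
   # If the two buffers cover the whole vector, fall back to the middle position
   if (edge_buffer * 2 >= n) {
      return(round(n / 2))
   }
   # Positions that remain after trimming both edges
   keep <- seq(from = edge_buffer + 1, to = n - edge_buffer)
   # which.max() indexes within 'keep', so map back to the original position
   keep[which.max(y[keep])]
}

y <- c(9, 2, 3, 7, 4, 2, 1, 8)
s0 <- summit_with_buffer(y, edge_buffer = 0)  # edge value at position 1 wins
s1 <- summit_with_buffer(y, edge_buffer = 1)  # internal peak at position 4
s4 <- summit_with_buffer(y, edge_buffer = 4)  # everything ignored: midpoint
c(s0, s1, s4)
```

With no buffer the leading edge is reported; a buffer of one value on each side exposes the internal peak instead.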

Details

This function takes a numeric vector, intended to represent a signal across a range where that signal is above noise. It calls `smooth.spline()` to fit a smooth curve across the region, then returns the position along `x` where the smoothed spline signal is highest.
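
The smooth-then-maximize approach described above can be sketched with base R alone. The helper name `summit_sketch()` and the simulated data are hypothetical, and the actual function may differ in details such as edge handling and the form of its return value.

```r
# Hypothetical helper, not the platjam source: smooth, then take the maximum.
summit_sketch <- function(x, spar = 0.5) {
   # Fit a cubic smoothing spline over the index positions 1..length(x)
   fit <- stats::smooth.spline(x = seq_along(x), y = x, spar = spar)
   # fit$y holds the fitted values at the sorted unique x positions
   i <- which.max(fit$y)
   c(summit = fit$x[i], summit_height = fit$y[i])
}

# Simulated coverage: a bell-shaped peak centered at position 60, plus noise
set.seed(1)
pos <- 1:100
cov <- 50 * exp(-((pos - 60)^2) / (2 * 8^2)) + rnorm(100, sd = 1)
res <- summit_sketch(cov)
res
```

Because the maximum is taken on the fitted curve rather than the raw values, a single noisy spike is less likely to be reported as the summit.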

The original intent is to take genome sequence coverage across an enriched region (a "peak") and determine the peak summit. It should work well for each row of a coverage matrix, provided the coverage matrix is wide enough that the highest signal is located inside the range analyzed.
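
A row-wise application to a coverage matrix might look like the sketch below. To keep it runnable without platjam installed, it uses a hypothetical stand-in `row_summit()` that returns only the summit index; the matrix is simulated, with one region per row.

```r
# Hypothetical stand-in for the summit call, so the example runs without platjam
row_summit <- function(x, spar = 0.5) {
   fit <- stats::smooth.spline(seq_along(x), x, spar = spar)
   which.max(fit$y)
}

# Simulated coverage matrix: one region per row, 80 positions per region
set.seed(2)
pos <- 1:80
centers <- c(regionA = 25, regionB = 40, regionC = 55)
covmat <- t(sapply(centers, function(ctr) {
   40 * exp(-((pos - ctr)^2) / (2 * 6^2)) + rnorm(length(pos), sd = 1)
}))

# apply() over rows (MARGIN = 1) returns one summit index per region
summits <- apply(covmat, 1, row_summit)
summits
```

If `summit_from_vector()` were used in place of the stand-in, `apply()` would presumably return a matrix with one column per region, since each call returns two values.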

An alternative is to import bigWig coverage data for a set of regions of interest defined by a `GRanges` object. A useful function is `splicejam::getGRcoverageFromBw()`, which can load coverage from one or more bigWig files and returns a `GRanges` object with one column per bigWig file loaded. Then iterate over each coverage vector to determine its summit.

Value

integer vector with two values:

"summit" with the index position of the highest point on the smoothed spline curve. If `x` has one uniform numeric value across the entire range, it returns the midpoint defined by `round(length(x)/2)`. If there are two maximum values, the first position is returned.
"summit_height" numeric value with the spline height at the summit position.

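The tie and midpoint rules above can be checked with base R expressions. Whether the function uses `which.max()` internally is an assumption; it is shown here only because it has the same first-position behaviour for ties.

```r
# Ties: which.max() reports the first of several equal maxima
tie_pos <- which.max(c(1, 3, 3, 2))

# Midpoint expression round(length(x)/2) for an even and an odd length
mid_even <- round(10 / 2)
mid_odd <- round(5 / 2)   # R rounds half to even, so 2.5 becomes 2

c(tie_pos, mid_even, mid_odd)
```

For odd lengths the midpoint expression can land one position left of the true center, because `round()` in R rounds halves to the nearest even number.
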
See Also

Other jam utility functions: cardinality(), color_complement(), convert_PD_df_to_SE(), convert_imputed_assays_to_na(), curate_se_colData(), curate_to_df_by_pattern(), design2layout(), get_numeric_transform(), handle_df_args(), merge_proteomics_se(), nmat_summary(), nmatlist_summary(), rmd_tab_iterator(), rowNormScale()