dpseg: dpseg : linear segmentation by dynamic programming

Description Usage Arguments Details Value Dependencies Author(s) Examples

View source: R/dpseg.R

Description

dpseg splits a curve (x,y data) into linear segments by a straight forward dynamic programming recursion:

S_j = max(S_{i-jumps} + score(i,j) -P)

where score is a measure of the goodnes of the fit of a linear regression (equiv. to lm(y~x)) between data points i<j. The default scoring function is simply the negative variance of residuals of the linear regression (see arguments type and scoref). P is a break-point penality that implicitly regulates the number of segments (higher P: longer segments), and jumps==1 allows for disjoint segments. The arguments minl and maxl specify minimal (i≤ j-minl) and maximal (i≥ j-maxl) segment lengths, which allows to significantly decrease memory usage when expected segment lengths are known.

Usage

1
2
3
dpseg(x, y, maxl, jumps = FALSE, P = 0, minl = 3, S0 = 1,
  type = "var", scoref, verb = 1, move, store.values = TRUE,
  store.matrix = FALSE, add.lm = FALSE, recursion, backtrace, ...)

Arguments

x

x-values, not used if y is a scoring function matrix

y

y-values, or a pre-calculated scoring function matrix SCR_{i,j} (eg. from a previous run of dpseg). See section "Value" below for details on the structure SCR_{i,j}.

maxl

maximal segment length, i≥ j-maxl

jumps

allow for jumps between segments, if TRUE segment ends are 1 index left of the segment starts

P

break-point penalty, increase to get longer segments with lower scores (eg. higher residual variance)

minl

minimal segment length, i≤ j-minl

S0

initialization of S_0, choose high enough to avoid length 1 cutoffs at start

type

type of scoring function: available are "var" for "variance of residuals", "cor" for Pearson correlation, or "r2" for r-squared; see the package vignette("dpseg") for details.

scoref

alternative scoring function

verb

print progress messages

move

logical indicating whether move is required in backtracing, required for the alternative recursion S_i + score(i+1,j)

store.values

store scoring values (linear regression results)

store.matrix

store the fitscore matrix

add.lm

add a linear fit using R base lm for final segments; may save memory/speed if store.values==FALSE

recursion

internal recursion function to be used for segmentation; used for debugging, benchmarking and development, and required for putative novel scoring functions scoref

backtrace

internal function to be used for back-tracing; used for debugging, benchmarking and development, and may be required to test novel scoring functions scoref and/or recursion

...

further arguments to recursion

Details

See the vignette("dpseg") for the theory and details on the choice of scoring functions and selection of the penalty parameter P.

Value

Returns a list object of class dpseg (with print.dpseg plot.dpseg and predict.dpseg methods). The main result of the algorithm is a table (data.frame) of predicted segments in list object segments. The original data, run parameters and (optionally) additional data calculated and used by the algorithm are also returned.

segments:

main result table: a data.frame that lists the start and end x-values of the segments, the start and end indices (i,j) in the data vectors, the linear regression coefficients and goodness-of-fit measures for the segments (intercept, slope, r-squared, variance of residuals). If dpseg was called with a pre-calculated scoring matrix, the table only contains start and end indices i,j. If option add.lm=TRUE or the result object was sent through function addLm the table additionally contains results from R's lm, indicated by an ".lm" suffix.

S:

results of the recursion, ie. S_j in above equation.

imax:

vector j=1,…,n, storing the i_{max} that yielded S_j, ie., the sole input for the backtracing function.

values:

linear regression coefficients and measures for the segment ending at j and starting at i_{max}(j). Only present if store.valus=TRUE.

SCR:

scoring function matrix SCR_{i,j} = score(i,j) where positions j are the columns and i the rows; a banded matrix with non-NA values between i≤ j-minl and i≥ j-maxl. Note, that this matrix can be re-used in subsequent calls as dpseg(y=previous$SCR) which runs much faster and allows to efficiently scan for alternative parameters. Only present if store.matrix=TRUE.

fits:

result objects from lm. Only present if add.lm=TRUE.

traceback:

result of the call to the backtracing function: ends of the segments.

xy:

original x/y data (xy.coords).

removed:

index of NA/Inf values that were removed before running the alorithm.

parameters:

used parameters P, jumps, maxl and minl.

Dependencies

The package strictly depends only on RcppEigen. All other dependencies are usually present in a basic installation (stats, graphics, grDevices).

Author(s)

Rainer Machne machne@hhu.de, Peter F. Stadler studla@bioinf.uni-leipzig.de

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
## calculate linear segments in semi-log bacterial growth data
## NOTE: library loads bacterial growth curve data as data.frame oddata
segs <- dpseg(x=oddata$Time, y=log(oddata$A3), minl=5, P=0.0001, verb=1)

## inspect resulting segments
print(segs)

## plot results (also see the movie method)
plot(segs, delog=TRUE, log="y")

## predict method
plot(predict(segs), type="l")

dpseg documentation built on Aug. 17, 2020, 5:06 p.m.