dpseg: dpseg : linear segmentation by dynamic programming
In dpseg: Piecewise Linear Segmentation by Dynamic Programming

Description Usage Arguments Details Value Dependencies Author(s) Examples

dpseg splits a curve (x,y data) into linear segments by a straight forward dynamic programming recursion:

S_j = max(S_{i-jumps} + score(i,j) -P)

where score is a measure of the goodnes of the fit of a linear regression (equiv. to lm(y~x)) between data points i<j. The default scoring function is simply the negative variance of residuals of the linear regression (see arguments type and scoref). P is a break-point penality that implicitly regulates the number of segments (higher P: longer segments), and jumps==1 allows for disjoint segments. The arguments minl and maxl specify minimal (i≤ j-minl) and maximal (i≥ j-maxl) segment lengths, which allows to significantly decrease memory usage when expected segment lengths are known.

1
2
3

dpseg(x, y, maxl, jumps = FALSE, P = 0, minl = 3, S0 = 1,
  type = "var", scoref, verb = 1, move, store.values = TRUE,
  store.matrix = FALSE, add.lm = FALSE, recursion, backtrace, ...)

`x`	x-values, not used if `y` is a scoring function matrix
`y`	y-values, or a pre-calculated scoring function matrix SCR_{i,j} (eg. from a previous run of `dpseg`). See section "Value" below for details on the structure SCR_{i,j}.
`maxl`	maximal segment length, i≥ j-maxl
`jumps`	allow for jumps between segments, if `TRUE` segment ends are 1 index left of the segment starts
`P`	break-point penalty, increase to get longer segments with lower scores (eg. higher residual variance)
`minl`	minimal segment length, i≤ j-minl
`S0`	initialization of S_0, choose high enough to avoid length 1 cutoffs at start
`type`	type of scoring function: available are "var" for "variance of residuals", "cor" for Pearson correlation, or "r2" for r-squared; see the package `vignette("dpseg")` for details.
`scoref`	alternative scoring function
`verb`	print progress messages
`move`	logical indicating whether move is required in backtracing, required for the alternative recursion S_i + score(i+1,j)
`store.values`	store scoring values (linear regression results)
`store.matrix`	store the fitscore matrix
`add.lm`	add a linear fit using R base `lm` for final segments; may save memory/speed if `store.values==FALSE`
`recursion`	internal recursion function to be used for segmentation; used for debugging, benchmarking and development, and required for putative novel scoring functions `scoref`
`backtrace`	internal function to be used for back-tracing; used for debugging, benchmarking and development, and may be required to test novel scoring functions `scoref` and/or `recursion`
`...`	further arguments to `recursion`

See the vignette("dpseg") for the theory and details on the choice of scoring functions and selection of the penalty parameter P.

Returns a list object of class dpseg (with print.dpseg plot.dpseg and predict.dpseg methods). The main result of the algorithm is a table (data.frame) of predicted segments in list object segments. The original data, run parameters and (optionally) additional data calculated and used by the algorithm are also returned.

segments:: main result table: a data.frame that lists the start and end x-values of the segments, the start and end indices (i,j) in the data vectors, the linear regression coefficients and goodness-of-fit measures for the segments (intercept, slope, r-squared, variance of residuals). If dpseg was called with a pre-calculated scoring matrix, the table only contains start and end indices i,j. If option add.lm=TRUE or the result object was sent through function addLm the table additionally contains results from R's lm, indicated by an ".lm" suffix.
S:: results of the recursion, ie. S_j in above equation.
imax:: vector j=1,…,n, storing the i_{max} that yielded S_j, ie., the sole input for the backtracing function.
values:: linear regression coefficients and measures for the segment ending at j and starting at i_{max}(j). Only present if store.valus=TRUE.
SCR:: scoring function matrix SCR_{i,j} = score(i,j) where positions j are the columns and i the rows; a banded matrix with non-NA values between i≤ j-minl and i≥ j-maxl. Note, that this matrix can be re-used in subsequent calls as dpseg(y=previous$SCR) which runs much faster and allows to efficiently scan for alternative parameters. Only present if store.matrix=TRUE.
fits:: result objects from lm. Only present if add.lm=TRUE.
traceback:: result of the call to the backtracing function: ends of the segments.
xy:: original x/y data (xy.coords).
removed:: index of NA/Inf values that were removed before running the alorithm.
parameters:: used parameters P, jumps, maxl and minl.

The package strictly depends only on RcppEigen. All other dependencies are usually present in a basic installation (stats, graphics, grDevices).

Rainer Machne machne@hhu.de, Peter F. Stadler studla@bioinf.uni-leipzig.de

## calculate linear segments in semi-log bacterial growth data
## NOTE: library loads bacterial growth curve data as data.frame oddata
segs <- dpseg(x=oddata$Time, y=log(oddata$A3), minl=5, P=0.0001, verb=1)

## inspect resulting segments
print(segs)

## plot results (also see the movie method)
plot(segs, delog=TRUE, log="y")

## predict method
plot(predict(segs), type="l")