gfpop: Graph-Constrained Functional Pruning Optimal Partitioning...

View source: R/gfpop.R

gfpop    R Documentation

Graph-Constrained Functional Pruning Optimal Partitioning (gfpop)

Description

Functional pruning optimal partitioning with a graph structure that encodes constraints on consecutive segment parameters. The user must specify the graph to use (see the graph function) and a type of cost function. This is the main function of the gfpop package. Its result can be plotted with the S3 plot method for gfpop objects, gfpop::plot().

Usage

gfpop(data, mygraph, type = "mean", weights = NULL, testMode = FALSE)

Arguments

data

vector of data to segment. For simulation studies, data can be generated with the gfpop package function gfpop::dataGenerator()

mygraph

data frame of class "graph" that constrains the changepoint inference, see gfpop::graph()

type

a string defining the cost model to use: "mean", "variance", "poisson", "exp", "negbin"

weights

vector of weights (positive numbers), same size as data

testMode

boolean. FALSE by default. Used to debug the code

Details

The constrained optimization problem for n data points takes the following general form:

Q_n = min (with constraints) \sum_{t=1}^n (\gamma(e[t])(y[t], \mu[t]) + \beta(e[t]))

with data points y[t], parameters \mu[t], edges e[t], edge-dependent penalties \beta(e[t]) and cost functions \gamma. The cost function can take three different forms for parameter x and constants (A, B, C):

  • quadratic, with representation Ax^2 + Bx + C with x in R

  • log-linear, with representation Ax - B log(x) + C with x \ge 0

  • log-log, with representation - A log(x) - B log(1-x) + C with 0 \le x \le 1
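
For intuition, each of the three representations has a closed-form unconstrained minimizer, found by setting the derivative to zero. The derivation below is a standard calculus step rather than something stated by the package; it assumes A > 0 for the quadratic form and A, B > 0 for the two logarithmic forms, and it ignores graph constraints and robustness parameters.

```latex
\begin{aligned}
\frac{d}{dx}\left(Ax^2 + Bx + C\right) = 2Ax + B = 0
  &\implies x^* = -\frac{B}{2A}, \\
\frac{d}{dx}\left(Ax - B\log x + C\right) = A - \frac{B}{x} = 0
  &\implies x^* = \frac{B}{A}, \\
\frac{d}{dx}\left(-A\log x - B\log(1-x) + C\right) = -\frac{A}{x} + \frac{B}{1-x} = 0
  &\implies x^* = \frac{A}{A+B}.
\end{aligned}
```

Under these assumptions the minimizers fall inside the stated domains: x^* > 0 for the log-linear form and 0 < x^* < 1 for the log-log form.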

For each optimization problem, we consider a unique cost representation. However, the user can define robustness values (K and a) specific to each edge, making the cost function edge-dependent. We give the atomic form of each of the five available types (for one data point of value y with weight w):

  • "mean" : A = w, B = -2wy, C = wy^2

  • "variance" : A = wy^2, B = w, C = 0

  • "poisson" : A = w, B = wy, C = 0

  • "exp" : A = wy, B = w, C = 0

  • "negbin" : A = w, B = wy, C = 0
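
Because the atomic forms are additive, the cost of a whole segment can be obtained by summing the per-point coefficients. The base-R sketch below illustrates this; segment_coef() and segment_argmin() are hypothetical helper names introduced here, not functions of the gfpop package, and the sketch ignores graph constraints and the robustness values K and a.

```r
# Sum the atomic (A, B, C) coefficients listed above over one segment.
# segment_coef() is a hypothetical helper, not part of gfpop.
segment_coef <- function(y, w = rep(1, length(y)), type = "mean") {
  switch(type,
    mean     = c(A = sum(w),       B = -2 * sum(w * y), C = sum(w * y^2)),
    variance = c(A = sum(w * y^2), B = sum(w),          C = 0),
    poisson  = c(A = sum(w),       B = sum(w * y),      C = 0),
    exp      = c(A = sum(w * y),   B = sum(w),          C = 0),
    negbin   = c(A = sum(w),       B = sum(w * y),      C = 0),
    stop("unknown type: ", type)
  )
}

# Unconstrained minimizer of the summed cost (assumes positive A,
# and positive B for the logarithmic forms).
segment_argmin <- function(coef, type = "mean") {
  A <- coef[["A"]]
  B <- coef[["B"]]
  switch(type,
    mean = -B / (2 * A),                    # quadratic form
    variance = , poisson = , exp = B / A,   # log-linear form
    negbin = A / (A + B),                   # log-log form
    stop("unknown type: ", type)
  )
}

y <- c(2, 4, 3, 7)
w <- c(1, 1, 2, 1)
best_mean    <- segment_argmin(segment_coef(y, w, "mean"), "mean")
best_poisson <- segment_argmin(segment_coef(y, w, "poisson"), "poisson")
```

For both the "mean" and "poisson" types, the minimizer computed this way is the weighted mean of the segment, which can be checked against weighted.mean(y, w).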

Value

a gfpop object = (changepoints, states, forced, parameters, globalCost)

changepoints

is the vector of changepoints (we give the last element of each segment)

states

is the vector giving the state of each segment

forced

is the vector specifying whether the constraints of the graph are active (= TRUE) or not (= FALSE)

parameters

is the vector of successive parameters of each segment

globalCost

is a number equal to the total loss: the minimal cost for the optimization problem with all penalty values excluded
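
As an illustration of how the changepoints and parameters vectors fit together, the base-R sketch below expands them into one fitted value per data point. The list res is a hypothetical stand-in with made-up numbers, not actual gfpop() output, and the sketch assumes the fields of the returned object can be accessed with $.

```r
# Hypothetical result using the field names from the Value section;
# the numbers are invented for illustration only.
res <- list(
  changepoints = c(100, 300, 500),  # last index of each segment
  parameters   = c(1.0, 2.5, 0.5)   # one parameter per segment
)

# Segment lengths follow from consecutive changepoints.
seg_lengths <- diff(c(0, res$changepoints))

# Repeat each segment parameter over its segment.
fitted <- rep(res$parameters, times = seg_lengths)
```

Since each changepoint is the last element of its segment, the final changepoint equals the data length, so fitted has one entry per data point.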

See Also

  • gfpop::dataGenerator() to generate data for multiple change-point simulations

  • gfpop::graph() to create graphs complying with the gfpop function

  • gfpop::plot() to plot the gfpop object and visualize inferred changepoints and parameters

Examples

library(gfpop)
n <- 1000 # data length
### EXAMPLE 1 ### updown graph + poisson loss
 myData <- dataGenerator(n, c(0.1, 0.3, 0.5, 0.8, 1), c(1, 2, 1, 3, 1), type = "poisson")
 myGraph <- graph(penalty = 2 * sdDiff(myData)^2 * log(n), type = "updown")
 gfpop(data = myData, mygraph = myGraph, type = "poisson")

### EXAMPLE 2 ### relevant graph with min gap = 2 + poisson loss
 myData <- dataGenerator(n, c(0.1, 0.3, 0.5, 0.8, 1), c(1, 2, 3, 5, 3), type = "poisson")
 myGraph <- graph(type = "relevant", penalty = 2 * log(n), gap = 2)
 gfpop(data =  myData, mygraph = myGraph, type = "poisson")

### EXAMPLE 3 ### std graph with robust loss + variance loss
 myData <- dataGenerator(n, c(0.1, 0.3, 0.5, 0.8, 1), c(1, 5, 1, 5, 1), type = "variance")
 outliers <- 5 * rbinom(n, 1, 0.05) - 5 * rbinom(n, 1, 0.05)
### with robust parameter K
 myGraph <- graph(type = "std", penalty = 2 * log(n), K = 10)
 gfpop(data =  myData + outliers, mygraph = myGraph, type = "variance")
### no K
 myGraph <- graph(type = "std", penalty = 2 * log(n))
 gfpop(data =  myData, mygraph = myGraph, type = "variance")

### EXAMPLE 4 ###  3-segment graph with mean (Gaussian) loss
 myData <- dataGenerator(n, c(0.12, 0.31, 0.53, 0.88, 1), c(1, 2, 0, 1, 2), type = "mean")
 outliers <- 5 * rbinom(n, 1, 0.05) - 5 * rbinom(n, 1, 0.05)
 gfpop(data =  myData + outliers, mygraph = paperGraph(8, penalty = 2 * log(n)), type = "mean")

gfpop documentation built on April 1, 2023, 12:22 a.m.