This vignette documents the mechanism provided in postCP 2.0. The document will first explain the theoretical basis and it will move on to explain the usage of the package.

change-point problem

The postCP Package is based on the change point model $$ \boldsymbol y[I_k] \sim \boldsymbol X[I_k] \times \boldsymbol \beta^k $$ where $\textbf{y} \in \mathbb{R} ^{n\times d}$ is the response variable and $\textbf{X} \in \mathbb{R}^{n\times p}$ is the covariate matrix. The response variable is generated with $K>1$ segments according to any Generalized Linear Model (ex: Gaussian or binomial regressions).

$I_k = [cp_{k−1} + 1, cp_k ]$ is the index of the individuals of the k th segment, and with the convention that $cp_0 = 0$ and that $cp_K = n$

Posterior change-point distribution

For a given parameter $\boldsymbol \theta$, our objective is to compute the posterior change-point distribution: $$ \mathbb{P}(\textbf{cp} | \boldsymbol y,\boldsymbol X ; \boldsymbol \theta) $$ or, equivalently, the posterior segmentation distribution: $$ \mathbb{P}(\boldsymbol S | \boldsymbol y,\boldsymbol X ; \boldsymbol \theta) $$

The posterior distributions are calculated through a constrained HMM (Hidden Markov Model) coded in C++ with the aim of achieving efficiency in computations.

Let's look at an example,

require(postCP)
#prepare data
sigma=1.3
#Change point estimates
bp=c(7,10)
#Obtain data from longley dataset
data = longley
#Apply postcp function
res = postcp(Employed ~ GNP + Armed.Forces,family=gaussian(),data=data,bp=c(7,10),sigma)

#Plot the results
plot.postcp(res,main="Posterior Change Point Probability Distribution")

#Apply postcp function with maxFB=TRUE to obtain marginal distribution
res = postcp(Employed ~ GNP + Armed.Forces,family=gaussian(),data=data,bp=c(7,10),sigma,maxFB=TRUE)

#Plot the results
plot.postcp(res,main="Posterior Change Point Probability Distribution")

The following example shows a more noticeable change.

require(postCP)
data = longley
plot(data$Armed.Forces)
res = postcp(Armed.Forces ~ 1,family=gaussian(),data=data,bp=c(10),sigma=1)
plot.postcp(res,main="Posterior Change Point Probability Distribution")

Some simulated examples are presented below for verification.

## change in the mean
data = data.frame(signal=rnorm(150) + rep(c(1, 4), each=75), position=1:150)
plot(data$signal)
res = postcp(signal ~ 1 ,family=gaussian(),data=data,bp=c(40),sigma=1)
plot.postcp(res,main="Posterior Change Point Probability Distribution")

## change in the slope
position <- 1:150
data = data.frame(signal = rep(c(1, 2), c(50, 100))*position + rnorm(150), position=position)
plot(data$signal)
res = postcp(signal ~ position ,family=gaussian(),data=data,bp=c(30),sigma=1)
plot.postcp(res,main="Posterior Change Point Probability Distribution")

Acknowledgements

The package was revived as a Google Summer of Code project in 2016 by Malith Jayaweera. Please see the link below for project details.
postCP Package Improvement



malithj/postCP documentation built on May 21, 2019, 11:22 a.m.