Description Usage Arguments Details Value Author(s) References See Also Examples
bcp()
implements the Bayesian change point analysis methods given in Wang and Emerson (2015), of which the Barry and Hartigan (1993) product
partition model for the normal errors change point problem is a specific case. 1. Multivariate (or univariate) Bayesian change point analysis: We assume there exists an unknown partition of a data series y
into blocks such that the mean is constant within each block. In the multivariate
case, a common change point structure
is assumed; means are constant within each block of each sequence, but may differ across sequences
within a given block. Conditional on the partition, the model assumes that observations are independent, identically distributed normal, with constant means within blocks and
constant variance throughout each sequence.
2. Linear regression Bayesian change point analysis: As with the previous model, we assume the observations (x,y), where x may be multivariate, are partitioned into blocks, and that linear models are appropriate within each block.
If an adjacency structure is provided, the data are assumed to reside on nodes of a graph with the given adjacency structure; additional parameters are used in this graph change point model. If no adjacency structure is provided, the data are assumed to be sequential and the blocks are forced to be contiguous.
1 2 3 |
y |
a vector or matrix of numerical data (with no missing values). For the multivariate change point problems, each column corresponds to a series. |
x |
(optional) a matrix of numerical data (with no missing values) representing the predicting variables for a linear regression. |
id |
(optional) a vector of integers specifying the location ID for each observation in |
adj |
(optional) an adjacency list. Indexing the observations from 1 to n, the i-th element of the list is a vector of indices (offset by 1) for nodes that share an edge with node i. |
w0 |
(optional) a single numeric value in the multivariate case or a vector of values in the regression case; in both, the value(s), between 0 and 1, is/are the parameter(s) in the uniform prior(s) on the signal-to-noise ratio(s). If no value is specified, the default value of 0.2 is used, as recommended by Barry and Hartigan (1993). |
p0 |
(optional) a value between 0 and 1. For sequential data, it is the parameter of the prior on change point probabilities, U(0, |
d |
(optional) a positive number only used for linear regression change point models. Lower |
burnin |
the number of burnin iterations. |
mcmc |
the number of iterations after burnin. |
return.mcmc |
logical. If set to |
boundaryType |
(optional) only applicable for graph change point analysis. Values can be “node” (default) if we count nodes in the boundary length calculation, or “edge” if we count edges in the boundary length calculation. See Wang and Emerson (2015) for details. |
p1 |
(optional) only applicable for graph change point analysis. The proportion of Active Pixel Passes run that are the actual Active Pixel Passes specified in Barry and Hartigan (1994). |
freqAPP |
(optional) only applicable for graph change point analysis. A positive integer for the number of Active Pixel Passes run in each step of the MCMC algorithm. |
nreg |
(optional) only applicable for regression; related to parameter
|
The primary result is an estimate of the posterior mean (or its distribution if
return.mcmc
is TRUE
) at every location. Unlike a frequentist or
algorithmic approach to the problem, these estimates will not be constant within
regions, and no single partition is identified as best. Estimates of the
probability of a change point at any given location are provided, however.
The user may set .Random.seed
to control the MCMC iterations.
The functions summary.bcp
, print.bcp
, and plot.bcp
are
used to obtain summaries of the results; legacyplot
is included
from package versions prior to 3.0.0 and will only work for univariate change
point analyses.
Returns a list containing the following components:
a copy of the data.
TRUE
or FALSE
as specified by the user; see the arguments, above.
if return.mcmc=TRUE
, mcmc.means
contains the means for each iteration conditional on the state of the partition.
if return.mcmc=TRUE
, mcmc.rhos
contains the partitions after each iteration. A value of 1 indicates the end of a block.
a vector of the number of blocks after each iteration.
a vector or matrix of the estimated posterior means. In the regression case, the matrix includes posterior means for the response variable.
a vector or matrix of the estimated posterior variances. In the regression case, the estimated posterior variances of the response are provided.
a vector of the estimated posterior probabilities of changes at each location.
the number of burnin iterations.
the number of iterations after burnin.
see the arguments, above.
see the arguments, above.
Xiaofei Wang, Chandra Erdman, and John W. Emerson
J. Bai and P. Perron (2003), Computation and Analysis of Multiple Structural Change Models, Journal of Applied Econometrics, 18, 1-22. http://qed.econ.queensu.ca/jae/2003-v18.1/bai-perron/.
Daniel Barry and J. A. Hartigan (1993), A Bayesian Analysis for Change Point Problems, Journal of The American Statistical Association, 88, 309-19.
Daniel Barry and J. A. Hartigan (1994), A Product Partition Model for Image Restoration, New Directions in Statistical Data Analysis and Robustness, (Monte Verita : Proceedings of the Cento Stefano Franscini Ascona), Birkhauser, 9-23.
Chandra Erdman and John W. Emerson (2008), A Fast Bayesian Change Point Analysis for the Segmentation of Microarray Data, Bioinformatics, 24(19), 2143-2148. https://www.ncbi.nlm.nih.gov/pubmed/18667443.
Chandra Erdman and John W. Emerson (2007), bcp: An R Package for Performing a Bayesian Analysis of Change Point Problems. Journal of Statistical Software, 23(3), 1-13. http://www.jstatsoft.org/v23/i03/.
A. B. Olshen, E. S. Venkatraman, R. Lucito, M. Wigler (2004), Circular binary segmentation for the analysis of array-based DNA copy number data, Biostatistics, 5, 557-572. http://www.bioconductor.org/packages/release/bioc/html/DNAcopy.html.
Snijders et al. (2001), Assembly of microarrays for genome-wide measurement of DNA copy number, Nature Genetics, 29, 263-264.
Xiaofei Wang and John W. Emerson (2015). Bayesian Change Point Analysis of Linear Models on General Graphs, Working Paper.
Achim Zeileis, Friedrich Leisch, Kurt Hornik, Christian Kleiber (2002), strucchange: An R Package for Testing for Structural Change in Linear Regression Models, Journal of Statistical Software, 7(2), 1-38. http://www.jstatsoft.org/v07/i02/.
plot.bcp
, summary.bcp
, and print.bcp
for summaries of the results.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 | ##### univariate sequential data #####
# an easy problem with 2 true change points
set.seed(5)
x <- c(rnorm(50), rnorm(50, 5, 1), rnorm(50))
bcp.1a <- bcp(x)
plot(bcp.1a, main="Univariate Change Point Example")
legacyplot(bcp.1a)
# a hard problem with 1 true change point
set.seed(5)
x <- rep(c(0,1), each=50)
y <- x + rnorm(50, sd=1)
bcp.1b <- bcp(y)
plot(bcp.1b, main="Univariate Change Point Example")
##### multivariate sequential data #####
# an easy problem in k=3 dimensions
set.seed(5)
x <- rnorm(6, sd=3)
y <- rbind(cbind(rnorm(50, x[1]), rnorm(50, x[2]), rnorm(50, x[3])),
cbind(rnorm(50, x[4]), rnorm(50, x[5]), rnorm(50, x[6])))
bcp.2a <- bcp(y)
plot(bcp.2a, main="Multivariate (k=3) Change Point Example")
plot(bcp.2a, separated=TRUE, main="Multivariate (k=3) Change Point Example")
# a harder problem in k=5 dimensions
set.seed(5)
means1 <- rep(0, 5)
means2 <- rep(1, 5)
x <- rbind(matrix(rep(means1, each=50), nrow=50),
matrix(rep(means2, each=50), nrow=50))
y <- x + rnorm(length(x), sd=1)
bcp.2b <- bcp(cbind(y))
plot(bcp.2b, main="Multivariate (k=5) Change Point Example")
##### linear models with sequential data #####
# 1 true change point at location 50; the predicting variable x is not related to location
x <- rnorm(100)
b <- rep(c(3,-3), each=50)
y <- b*x + rnorm(100)
bcp.3a <- bcp(y, x)
# in the two plots that follow, the location IDs are used as the plot characters
par(mfrow=c(1,2))
plot(y ~ x, type="n", main="Linear Regression: Raw Data")
text(x, y, as.character(1:100), col=(b/3)+2)
plot(y ~ x, type="n", main="Linear Regression: Posterior Means")
text(x, bcp.3a$posterior.mean[,1], as.character(1:100), col=(b/3)+2)
plot(bcp.3a, main="Linear Regression Change Point Example")
# 1 true change point at location 50; the predicting variable x is equal to location
x <- 1:100
b <- rep(c(3,-3), each=50)
y <- b*x + rnorm(100, sd=50)
bcp.3b <- bcp(y, x)
plot(bcp.3b, main="Linear Regression Change Point Example")
##### univariate data on a grid #####
## Not run:
set.seed(5)
adj <- makeAdjGrid(20)
z <- rep(c(0, 2), each=200)
y <- z + rnorm(400, sd=1)
out <- bcp(y, adj=adj, burnin=500, mcmc=500)
if (require("ggplot2")) {
df <- data.frame(mean=z, data = y, post.means = out$posterior.mean[,1],
post.probs = out$posterior.prob,
i = rep(1:20, each=20), j = rep(1:20, times=20))
# visualize the means
g <- ggplot(df, aes(i,j)) +
geom_tile(aes(fill = mean), color='white') +
scale_fill_gradientn(limits=range(y), colours=c('white', 'steelblue'))+
ggtitle("True Means")
print(g)
# visualize the data
g <- ggplot(df, aes(i,j)) +
geom_tile(aes(fill = data), color='white') +
scale_fill_gradientn(limits=range(y), colours=c('white', 'steelblue'))+
ggtitle("Observed Data")
print(g)
# visualize the posterior means/probs
g <- ggplot(df, aes(i,j)) +
geom_tile(aes(fill = post.means), color='white') +
scale_fill_gradientn(limits=range(y), colours=c('white', 'steelblue'))+
ggtitle("Posterior Means")
print(g)
g <- ggplot(df, aes(i,j)) +
geom_tile(aes(fill = post.probs), color='white') +
scale_fill_gradientn(limits=c(0, 1), colours=c('white', 'steelblue'))+
ggtitle("Posterior Boundary Probabilities")
print(g)
}
## End(Not run)
## Not run:
##### multivariate data on a grid #####
set.seed(5)
x <- rnorm(6, sd=3)
y <- rbind(cbind(rnorm(50, x[1]), rnorm(50, x[2]), rnorm(50, x[3])),
cbind(rnorm(50, x[4]), rnorm(50, x[5]), rnorm(50, x[6])))
adj <- makeAdjGrid(10)
a <- bcp(y, adj=adj, p0=0.4, burnin=500, mcmc=500)
##### linear models on a grid #####
set.seed(5)
x <- rnorm(100)
b <- rep(c(3,-3), each=50)
y <- b*x + rnorm(100)
adj <- makeAdjGrid(10)
a <- bcp(y,x,adj=adj, p0=0.4, burnin=500, mcmc=500)
##### linear models on a grid using pseudo-APPs #####
x <- rnorm(100)
b <- rep(c(3,-3), each=50)
y <- b*x + rnorm(100)
adj <- makeAdjGrid(10)
a <- bcp(y,x,adj=adj, p0=0.4, burnin=500, mcmc=500, p1 = 0)
## End(Not run)
##### univariate data on a graph #####
## Not run:
demo(bcpgraph)
## End(Not run)
###### Real Data Examples ######
## Not run:
# Coriell chromosome 11: univariate sequential data
demo(coriell)
# U.S. ex-post interest rate: univariate sequential data
demo(RealInt)
# Lombard: univariate sequential data (with and without linear models)
demo(Lombard)
# Quebec rivers: multivariate sequential data
demo(QuebecRivers)
# New Haven housing: linear models on a graph
demo(NewHaven)
## End(Not run)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.