selgmented | R Documentation |
This function selects (and estimates) the number of breakpoints of the segmented relationship according to the BIC/AIC criterion or sequential hypothesis testing.
selgmented(olm, seg.Z, Kmax=2, type = c("score", "bic", "davies", "aic"),
alpha = 0.05, control = seg.control(), refit = FALSE, stop.if = 5,
return.fit = TRUE, bonferroni = FALSE, msg = TRUE, plot.ic = FALSE, th = NULL,
G = 1, check.dslope = TRUE)
olm |
A starting |
seg.Z |
A one-side formula for the segmented variable. Only one term can be included, and it can be omitted if |
Kmax |
The maximum number of breakpoints being tested. If |
type |
Which criterion should be used? Options |
alpha |
The fixed type I error probability when sequential hypothesis testing is carried out (i.e. |
control |
See |
refit |
If |
stop.if |
An integer. If, when trying models with an increasing (when |
return.fit |
If |
bonferroni |
If |
msg |
If |
plot.ic |
If |
th |
When a large number of breakpoints is being tested, it could happen that 2 estimated breakpoints are too close each other, and only one can be retained. Thus if the difference between two breakpoints is less or equal to |
G |
Number of sub-intervals to consider to search for the breakpoints when |
check.dslope |
Logical. Effective only if |
The function uses properly the functions segmented
, pscore.test
or davies.test
to select the 'optimal' number of breakpoints 0,1,...,Kmax
. If type='bic'
or 'aic'
, the procedure stops if the last stop.if
fits have increasing values of the information criterion. Moreover, a breakpoint is removed if too close to other, actually if the difference between two consecutive estimates is less then th
. Finally, if check.dslope=TRUE
, breakpoints whose corresponding slope difference estimate is ‘small’ (i.e. p
-value larger then alpha
or alpha/Kmax
) are also removed.
When G>1
the dataset is split into G
groups, and the search is carried out separately within each group. This approach is fruitful when there are many breakpoints not evenly spaced in the covariate range and/or concentrated in some sub-intervals. G=3
or 4
is recommended based on simulation evidence.
Note Kmax
is always tacitely reduced in order to have at least 1 residual df in the model with Kmax
changepoints. Namely, if n=20
, the maximal segmented model has 2*(Kmax + 1)
parameters, and therefore the largest Kmax
allowed is 8.
When type='score'
or 'davies'
, the function also returns the 'overall p-value' coming from combing the single p-values using the Fisher method. The pooled p-value, however, does not affect the final result which depends on the single p-values only.
The returned object depends on argument return.fit
. If FALSE
, the returned object is a list with some information on the compared models (i.e. the BIC values), otherwise a classical 'segmented'
object (see segmented
for details) with the component selection.psi
including the A/BIC values and
- if refit=TRUE
, psi.no.refit
that represents the breakpoint values before the last fit (with boot restarting)
- if G>1
, cutvalues
including the cutoffs values used to split the data.
If check.dslope=TRUE
, there is no guarantee that the final model has the lowest AIC/BIC. Namely the model with the best A/BIC could have ‘non-significant’ slope differences which will be removed (with the corresponding breakpoints) by the final model. Hence the possible plot (obtained via plot.ic=TRUE
) could be misleading. See Example 1 below.
Vito M. R. Muggeo
Muggeo V (2020) Selecting number of breakpoints in segmented regression: implementation in the R package segmented https://www.researchgate.net/publication/343737604
segmented
, pscore.test
, davies.test
set.seed(12)
xx<-1:100
zz<-runif(100)
yy<-2+1.5*pmax(xx-35,0)-1.5*pmax(xx-70,0)+15*pmax(zz-.5,0)+rnorm(100,0,2)
dati<-data.frame(x=xx,y=yy,z=zz)
out.lm<-lm(y~x,data=dati)
os <-selgmented(out.lm) #selection (Kmax=2) via the Score test (default)
os <-selgmented(out.lm, type="bic", Kmax=3) #BIC-based selection
## Not run:
########################################
#Example 1: selecting a large number of breakpoints
b <- c(-1,rep(c(1.5,-1.5),l=15))
psi <- seq(.1,.9,l=15)
n <- 2000
x <- 1:n/n
X <- cbind(x, outer(x,psi,function(x,y)pmax(x-y,0)))
mu <- drop(tcrossprod(X,t(b)))
set.seed(113)
y<- mu + rnorm(n)*.022
par(mfrow=c(1,2))
#select number of breakpoints via the BIC (and plot it)
o<-selgmented(y, Kmax=20, type='bic', plot.ic=TRUE, check.dslope = FALSE)
plot(o, res=TRUE, col=2, lwd=3)
points(o)
# select via the BIC + check on the slope differences (default)
o1 <-selgmented(y, Kmax=20, type='bic', plot.ic=TRUE) #check.dslope = TRUE by default
#note the plot of BIC is misleading.. But the number of psi is correct
plot(o1, add=TRUE, col=3)
points(o1, col=3, pch=3)
##################################################
#Example 2: a large number of breakpoints not evenly spaced.
b <- c(-1,rep(c(2,-2),l=10))
psi <- seq(.5,.9,l=10)
n <- 2000
x <- 1:n/n
X <- cbind(x, outer(x,psi,function(x,y)pmax(x-y,0)))
mu <- drop(tcrossprod(X,t(b)))
y<- mu + rnorm(n)*.02
#run selgmented with G>1. G=3 or 4 recommended.
#note G=1 does not return the right number of breaks
o1 <-selgmented(y, type="bic", Kmax=20, G=4)
## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.