segmented | R Documentation |
Fits regression models with segmented relationships between the response and one or more explanatory variables. Break-point estimates are provided.
segmented(obj, seg.Z, psi, npsi, fixed.psi=NULL, control = seg.control(),
model = TRUE, ...)
## Default S3 method:
segmented(obj, seg.Z, psi, npsi, fixed.psi=NULL, control = seg.control(),
model = TRUE, keep.class=FALSE, ...)
## S3 method for class 'lm'
segmented(obj, seg.Z, psi, npsi, fixed.psi=NULL, control = seg.control(),
model = TRUE, keep.class=FALSE, ...)
## S3 method for class 'glm'
segmented(obj, seg.Z, psi, npsi, fixed.psi=NULL, control = seg.control(),
model = TRUE, keep.class=FALSE, ...)
## S3 method for class 'Arima'
segmented(obj, seg.Z, psi, npsi, fixed.psi=NULL, control = seg.control(),
model = TRUE, keep.class=FALSE, ...)
## S3 method for class 'numeric'
segmented(obj, seg.Z, psi, npsi, fixed.psi=NULL, control = seg.control(),
model = TRUE, keep.class=FALSE, adjX=FALSE, weights=NULL, ...)
obj |
standard ‘linear’ model of class "lm", "glm" or "Arima", or potentially any regression
fit may be supplied since version 0.5-0 (see 'Details'). |
seg.Z |
the segmented variable(s), i.e. the continuous covariate(s) understood to have a piecewise-linear relationship with response. It is a formula with no response variable, such as |
psi |
starting values for the breakpoints to be estimated. If there is a single segmented variable specified in |
npsi |
A named vector or list meaning the number (and not locations) of breakpoints to be estimated. The starting values will be internally computed via the quantiles or equally spaced values, as specified in argument |
fixed.psi |
An optional named list meaning the breakpoints to be kept fixed during the estimation procedure. The names should be a subset of (or even the same) variables specified in |
control |
a list of parameters for controlling the fitting process.
See the documentation for |
model |
logical value indicating if the model.frame should be returned. |
keep.class |
logical value indicating if the final fit returned by |
... |
optional arguments (to be ignored safely). Notice specific arguments relevant to the original call (via |
adjX |
if |
weights |
the weights if |
Given a linear regression model usually of class "lm" or "glm" (or even a simple numeric/ts vector), segmented tries to estimate
a new regression model having broken-line relationships with the variables specified in seg.Z
.
A segmented (or broken-line) relationship is defined by the slope
parameters and the break-points where the linear relation changes. The number of breakpoints
of each segmented relationship is fixed via the psi
(or npsi
) argument, where initial
values for the break-points (or simply their number via npsi
) must be specified. The model
is estimated simultaneously yielding point estimates and relevant approximate
standard errors of all the model parameters, including the break-points.
Since version 0.2-9.0 segmented
implements the bootstrap restarting algorithm described in Wood (2001).
The bootstrap restarting is expected to escape the local optima of the objective function when the
segmented relationship is flat and the log likelihood can have multiple local optima.
Since version 0.5-0.0 the default method segmented.default
has been added to estimate segmented relationships in
general (besides "lm" and "glm" fits) regression models, such as Cox regression or quantile regression (for a single percentile).
The objective function to be minimized is the (minus) value extracted by the logLik
function or it may be passed on via
the fn.obj
argument in seg.control
. See example below. While the default method is expected to work with any regression
fit (where the usual coef()
, update()
, and logLik()
returns appropriate results), it is not recommended for
"lm" or "glm" fits (as segmented.default
is slower than the specific methods segmented.lm
and segmented.glm
), although
final results are the same. However the object returned by segmented.default
is not of class "segmented", as currently
the segmented methods are not guaranteed to work for ‘generic’ (i.e., besides "lm" and "glm") regression fits. The user
could try each "segmented" method on the returned object by calling it explicitly (e.g. via plot.segmented()
or confint.segmented()
wherein the regression coefficients and relevant covariance matrix have to be specified, see .coef
and .vcov
in plot.segmented()
, confint.segmented()
, slope()
).
segmented returns an object of class "segmented" which inherits
from the class of obj
, for instance "lm" or "glm".
An object of class "segmented" is a list containing the components of the
original object obj
with additionally the followings:
psi |
estimated break-points (sorted) and relevant (approximate) standard errors |
it |
number of iterations employed |
epsilon |
difference in the objective function when the algorithm stops |
model |
the model frame |
psi.history |
a list or a vector including the breakpoint estimates at each step |
seed |
the integer vector containing the seed just before the bootstrap resampling. Returned only if bootstrap restart is employed |
.. |
Other components are not of direct interest of the user |
At convergence, if the estimated breakpoints are too close each other or at the boundaries, the parameter point estimate could be returned, but without finite standard errors. To avoid that, segmented
revises the final breakpoint estimates to allow that at least min.nj
are within each interval of the segmented covariate. A warning message is printed if such adjustment is made. See min.nj
in seg.control
.
The algorithm will start if the it.max
argument returned by seg.control
is greater than zero. If it.max=0
segmented
will estimate a new linear model with
break-point(s) fixed at the values reported in psi
.Alternatively, it is also possible to set h=0
in seg.control()
. In this case, bootstrap restarting is unncessary, then to have breakpoints at mypsi
type
segmented(.., psi=mypsi, control=seg.control(h=0, n.boot=0, it.max=1))
In the returned fit object, ‘U.’ is put before the name of the segmented variable to mean the difference-in-slopes coefficient.
Methods specific to the class "segmented"
are
print.segmented
summary.segmented
print.summary.segmented
plot.segmented
lines.segmented
confint.segmented
vcov.segmented
predict.segmented
points.segmented
coef.segmented
Others are inherited from the class "lm"
or "glm"
depending on the
class of obj
.
Vito M. R. Muggeo, vito.muggeo@unipa.it
Muggeo, V.M.R. (2003) Estimating regression models with unknown break-points. Statistics in Medicine 22, 3055–3071.
Muggeo, V.M.R. (2008) Segmented: an R package to fit regression models with broken-line relationships. R News 8/1, 20–25.
segmented.glm
for segmented GLM and segreg
to fit the models via a formula interface. segmented.lme
fits random changepoints (segmented mixed) models.
set.seed(12)
xx<-1:100
zz<-runif(100)
yy<-2+1.5*pmax(xx-35,0)-1.5*pmax(xx-70,0)+15*pmax(zz-.5,0)+rnorm(100,0,2)
dati<-data.frame(x=xx,y=yy,z=zz)
out.lm<-lm(y~x,data=dati)
#the simplest example: the starting model includes just 1 covariate
#.. and 1 breakpoint has to be estimated for that
o<-segmented(out.lm) #1 breakpoint for x
#the single segmented variable is not in the starting model, and thus..
#... you need to specify it via seg.Z, but no starting value for psi
o<-segmented(out.lm, seg.Z=~z)
#note the leftmost slope is constrained to be zero (since out.lm does not include z)
#2 segmented variables, 1 breakpoint each (again no need to specify npsi or psi)
o<-segmented(out.lm,seg.Z=~z+x)
#1 segmented variable, but 2 breakpoints: you have to specify starting values (vector) for psi:
o<-segmented(out.lm,seg.Z=~x,psi=c(30,60), control=seg.control(display=FALSE))
#.. or you can specify just the *number* of breakpoints
#o<-segmented(out.lm,seg.Z=~x, npsi=2, control=seg.control(display=FALSE))
slope(o) #the slopes of the segmented relationship
#2 segmented variables: starting values requested via a named list
out.lm<-lm(y~z,data=dati)
o1<-update(o,seg.Z=~x+z,psi=list(x=c(30,60),z=.3))
#..or by specifying just the *number* of breakpoints
#o1<-update(o,seg.Z=~x+z, npsi=c(x=2,z=1))
#the default method leads to the same results (but it is slower)
#o1<-segmented.default(out.lm,seg.Z=~x+z,psi=list(x=c(30,60),z=.3))
#o1<-segmented.default(out.lm,seg.Z=~x+z,psi=list(x=c(30,60),z=.3),
# control=seg.control(fn.obj="sum(x$residuals^2)"))
#automatic procedure to estimate breakpoints in the covariate x (starting from K quantiles)
# Hint: increases number of iterations. Notice: bootstrap restart is not allowed!
# However see ?selgmented for a better approach
#o<-segmented.lm(out.lm,seg.Z=~x+z,psi=list(x=NA,z=.3),
# control=seg.control(fix.npsi=FALSE, n.boot=0, tol=1e-7, it.max = 50, K=5, display=TRUE))
#assess the progress of the breakpoint estimates throughout the iterations
## Not run:
par(mfrow=c(1,2))
draw.history(o, "x")
draw.history(o, "z")
## End(Not run)
#try to increase the number of iterations and re-assess the
#convergence diagnostics
# A simple segmented model with continuous responses and no linear covariates
# No need to fit the starting lm model:
segmented(yy, npsi=2) #NOTE: subsetting the vector works ( segmented(yy[-1],..) )
#only a single segmented covariate is allowed in seg.Z, and if seg.Z is unspecified,
# the segmented variable is taken as 1:n/n
# An example using the Arima method:
## Not run:
n<-50
idt <-1:n #the time index
mu<-50-idt +1.5*pmax(idt-30,0)
set.seed(6969)
y<-mu+arima.sim(list(ar=.5),n)*3.5
o<-arima(y, c(1,0,0), xreg=idt)
os1<-segmented(o, ~idt, control=seg.control(display=TRUE))
#note using the .coef argument is mandatory!
slope(os1, .coef=os1$coef)
plot(y)
plot(os1, add=TRUE, .coef=os1$coef, col=2)
## End(Not run)
################################################################
################################################################
######Four examples using the default method:
################################################################
################################################################
################################################################
#==> 1. Cox regression with a segmented relationship
################################################################
## Not run:
library(survival)
data(stanford2)
o<-coxph(Surv(time, status)~age, data=stanford2)
os<-segmented(o, ~age, psi=40) #estimate the breakpoint in the age effect
summary(os) #actually it means summary.coxph(os)
plot(os) #it does not work
plot.segmented(os) #call explicitly plot.segmented() to plot the fitted piecewise lines
################################################################
# ==> 2. Linear mixed model via the nlme package
################################################################
dati$g<-gl(10,10) #the cluster 'id' variable
library(nlme)
o<-lme(y~x+z, random=~1|g, data=dati)
os<-segmented.default(o, ~x+z, npsi=list(x=2, z=1))
#summarizing results (note the '.coef' argument)
slope(os, .coef=fixef(os))
plot.segmented(os, "x", .coef=fixef(os), conf.level=.95)
confint.segmented(os, "x", .coef=fixef(os))
dd<-data.frame(x=c(20,50),z=c(.2,.6), g=1:2)
predict.segmented(os, newdata=dd, .coef=fixef(os))
################################################################
# ==> 3. segmented quantile regression via the quantreg package
################################################################
library(quantreg)
data(Mammals)
y<-with(Mammals, log(speed))
x<-with(Mammals, log(weight))
o<-rq(y~x, tau=.9)
os<-segmented.default(o, ~x) #it does NOT work. It cannot compute the vcov matrix..
#Let's define the vcov.rq function.. (I don't know if it is the best option..)
vcov.rq<-function(x,...) {
V<-summary(x,cov=TRUE,se="nid",...)$cov
rownames(V)<-colnames(V)<-names(x$coef)
V}
os<-segmented.default(o, ~x) #now it does work
plot.segmented(os, res=TRUE, col=2, conf.level=.95)
################################################################
# ==> 4. segmented regression with the svyglm() (survey package)
################################################################
library(survey)
data(api)
dstrat<-svydesign(id=~1,strata=~stype, weights=~pw, data=apistrat, fpc=~fpc)
o<-svyglm(api00~ell, design=dstrat)
#specify as a string the objective function to be minimized. It can be obtained via svyvar()
fn.x<- 'as.numeric(svyvar(resid(x, "pearson"), x$survey.design, na.rm = TRUE))'
os<-segmented.default(o, ~ell, control=seg.control(fn.obj=fn.x, display=TRUE))
slope(os)
plot.segmented(os, res=TRUE, conf.level=.9, shade=TRUE)
## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.