fit a threshold regression model


This function can be used to fit a threshold regressio model based on the first-hitting-time of a boundary by the sample path of a Wiener diffusion process. It uses maximum likelihood estimation method for calculating regression coefficient estimates, asymptotic standard errors and p-values.





a formula object, with the response on the left of a ~ operator, and the independent variables on the right. The response must be a survival object as returned by the Surv function. On the right of the ~ operator, a | operator must be used: on the left of the | operator, users specify independent variables that will be used in the linear regression function for \ln{y_0} in the threshold regression model; on the right of the | operator, users specify independent variables that will be used in the linear regression function for μ in the threshold regression model. If users just want to use a constant \ln{y_0} or μ, he or she can put 0 or 1 as a placeholder on the left or right of the | operator, instead of listing the independent variables for \ln{y_0} or μ.


input dataset. Such dataset must be a survival dataset including at least the survival time variable and censoring variable. For the censoring variable, 1 should be used to indicate the subjects with failure observed, and 0 should be used to indicate the subjects that are right censored. The dataset can also include other independent variables that will be used in the threshold regression model.


Threshold regression is a recently developed regression methodology to analyze time to event data. For a review of this regression model, see Lee and Whitmore (2006, 2010). A unique feature of threshold regression is that the event time is considered as the time when an underlying stochastic process first hits a boundary threshold. In the context of survival data, for example, the event can be death. The death time of an individual is considered as the time when his/her latent health status decreases to the zero boundary.

In the threg package, a Wiener process Y(t) is used to model the latent health status process. An event is observed when Y(t) reaches 0 for the first time. Three parameters of the Wiener process are involved: μ, y_0 and σ. Parameter μ, called the drift of the Wiener process, is the rate per unit time at which the level of the sample path is changing. The sample path approaches the threshold if μ <0. Parameter y_0 is the initial value of the process and is taken as positive. Parameter σ represents the variability per unit time of the process (Lee and Whitmore 2006). The first hitting time (FHT) of a Wiener process with μ, y_0 and σ is an inverse Gaussian distribution with probability density function (p.d.f):

f(t|μ,{σ}^2,y_0)=\frac{y_0}{√{2π{σ^2}t^3}}\exp≤ft[-\frac{(y_0+μ t)^2}{2σ^2 t}\right],

where -∞ <μ <∞ , σ^2 >0, \mbox{ and } y_0>0. The p.d.f. is proper if μ ≤q 0. The cumulative distribution function of the FHT is:

F(t|μ,{σ}^2,y_0)=Φ≤ft[-\frac{(y_0+μ t)^2}{√{σ^2 t}}\right]+\exp ≤ft(-\frac{2y_0 μ}{σ^2}\right)Φ≤ft[\frac{μ t - y_0}{√{σ^2 t}}\right],

where Φ(\cdot) is the cumulative distribution function of the standard normal distribution. Note that if μ>0, the Wiener process may never hit the boundary at zero and hence there is a probability that the FHT is , that is, P(FHT=∞)=1-\exp(-2y_0μ/σ^2).

Since the health status process is usually latent (i.e., unobserved), an arbitrary unit can be used to measure such a process. Hence the variance parameter σ^2 of the process is set to 1 in the threg package to fix the measurement unit of the process. Then we can regress the other two process parameters, y_0 and μ on covariate data. We assume that μ and \ln(y_0) are linear in regression coefficients.

Suppose that the covariate vector is \bm{Z'}=(1, Z_1, \cdots, Z_k), where Z_1, \cdots, Z_k are covariates and the leading 1 in \bm{Z'} allows for a constant term in the regression relationship. Then \ln(y_0) can be linked to the covariates with the following regression form:

\ln(y_0)=γ_0+γ_1 Z_1+\cdots + γ_k Z_k=\bm{Z' γ}

and μ can be linked to the covariates with the following regression form:

μ=β_0+β_1 Z_1+\cdots + β_k Z_k=\bm{Z' β}

Vectors γ and β are regression coefficients for \ln(y_0) and μ, respectively, with γ'=(γ_0,\cdots,γ_k) and β'=(β_0,\cdots,β_k). Note that researchers can set some elements in γ or β to zero if they feel the corresponding covariates are not important in predicting \ln(y_0) or μ. For example, if covariate Z_1 in the vector Z' is considered not important to predict \ln(y_0), we can remove the Z_1 term by setting γ_1 to zero.


Tao Xiao

Maintainer: Tao Xiao <>


Lee, M-L. T., Whitmore, G. A. (2010) Proportional hazards and threshold regression: their theoretical and practical connections., Lifetime Data Analysis 16, 2: 196-214.

Lee, M-L. T., Whitmore, G. A. (2006) Threshold regression for survival analysis: modeling event times by a stochastic process, Statistical Science 21: 501-513.

Klein, J. P., Moeschberger, M. L. (2003) Survival Analysis: Techniques for Censored and Truncated Data. 2 edition. Springer-Verlag New York, Inc.

Garrett, J. M. (1997) Odds ratios and confidence intervals for logistic regression models with effect modification. Stata Technical Bulletin, 36, 1522. Reprinted in Stata Technical Bulletin Reprints, vol. 6, pp. 104-114. College Station, TX: Stata Press.

Xiao, T., Whitmore, G. A., He, Xin, Lee, M-L. T. (2015) The R Package threg to Implement Threshold Regression Models. Journal of Statistical Software, 66(8), 1-16. URL


#load the data "lkr"

#Transform the "treatment2" variable into factor variable "f.treatment2" .

#fit the threshold regression model on the factor variable "f.treatment2", 
fit<-threg(Surv(weeks, relapse)~ f.treatment2|f.treatment2,data = lkr)

#load the data "bmt"

#Transform the "group" and "fab" variables into factor variables 
#"" and "f.fab".

#fit a threshold regression model on the "bmt" dataset, by using "recipient_age" and 
#"f.fab" as the predictors for ln(y0), and "" and "f.fab" as predictors for mu.
fit<-threg(Surv(time, indicator)~ recipient_age+f.fab|, data = bmt)
comments powered by Disqus