# fit a threshold regression model

### Description

This function can be used to fit a threshold regressio model based on the first-hitting-time of a boundary by the sample path of a Wiener diffusion process. It uses maximum likelihood estimation method for calculating regression coefficient estimates, asymptotic standard errors and p-values.

### Usage

1 |

### Arguments

`formula` |
a formula object, with the response on the left of a ~ operator, and the independent variables on the right. The response must be a survival object as returned by the Surv function. On the right of the ~ operator, a | operator must be used: on the left of the | operator, users specify independent variables that will be used in the linear regression function for |

`data` |
input dataset. Such dataset must be a survival dataset including at least the survival time variable and censoring variable. For the censoring variable, 1 should be used to indicate the subjects with failure observed, and 0 should be used to indicate the subjects that are right censored. The dataset can also include other independent variables that will be used in the threshold regression model. |

### Details

Threshold regression is a recently developed regression methodology to analyze time to event data. For a review of this regression model, see Lee and Whitmore (2006, 2010). A unique feature of threshold regression is that the event time is considered as the time when an underlying stochastic process first hits a boundary threshold. In the context of survival data, for example, the event can be death. The death time of an individual is considered as the time when his/her latent health status decreases to the zero boundary.

In the `threg`

package, a Wiener process *Y(t)* is used to model the latent health status process. An event is observed when *Y(t)* reaches *0* for the first time. Three parameters of the Wiener process are involved: *μ*, *y_0* and *σ*. Parameter *μ*, called the drift of the Wiener process, is the rate per unit time at which the level of the sample path is changing. The sample path approaches the threshold if *μ <0*. Parameter *y_0* is the initial value of the process and is taken as positive. Parameter *σ* represents the variability per unit time of the process (Lee and Whitmore 2006). The first hitting time (FHT) of a Wiener process with *μ*, *y_0* and *σ* is an inverse Gaussian distribution with probability density function (p.d.f):

*f(t|μ,{σ}^2,y_0)=\frac{y_0}{√{2π{σ^2}t^3}}\exp≤ft[-\frac{(y_0+μ t)^2}{2σ^2 t}\right],*

where *-∞ <μ <∞ , σ^2 >0, \mbox{ and } y_0>0*. The p.d.f. is proper if *μ ≤q 0*. The cumulative distribution function of the FHT is:

*F(t|μ,{σ}^2,y_0)=Φ≤ft[-\frac{(y_0+μ t)^2}{√{σ^2 t}}\right]+\exp ≤ft(-\frac{2y_0 μ}{σ^2}\right)Φ≤ft[\frac{μ t - y_0}{√{σ^2 t}}\right],*

where *Φ(\cdot)* is the cumulative distribution function of the standard normal distribution. Note that if *μ>0*, the Wiener process may never hit the boundary at zero and hence there is a probability that the FHT is
*∞*, that is, *P(FHT=∞)=1-\exp(-2y_0μ/σ^2)*.

Since the health status process is usually latent (i.e., unobserved), an arbitrary unit can be used to measure such a process. Hence the variance parameter *σ^2* of the process is set to 1 in the `threg`

package to fix the measurement unit of the process. Then we can regress the other two process parameters, *y_0* and *μ* on covariate data. We assume that *μ* and *\ln(y_0)* are linear in regression coefficients.

Suppose that the covariate vector is *\bm{Z'}=(1, Z_1, \cdots, Z_k)*, where *Z_1, \cdots, Z_k* are covariates and the leading 1 in *\bm{Z'}* allows for a constant term in the regression relationship. Then *\ln(y_0)* can be linked to the covariates with the following regression form:

*\ln(y_0)=γ_0+γ_1 Z_1+\cdots + γ_k Z_k=\bm{Z' γ}*

and *μ* can be linked to the covariates with the following regression form:

*μ=β_0+β_1 Z_1+\cdots + β_k Z_k=\bm{Z' β}*

Vectors *γ* and *β* are regression coefficients for *\ln(y_0)* and *μ*, respectively, with *γ'=(γ_0,\cdots,γ_k)* and *β'=(β_0,\cdots,β_k)*. Note that researchers can set some elements in *γ* or *β* to zero if they feel the corresponding covariates are not important in predicting *\ln(y_0)* or *μ*. For example, if covariate *Z_1* in the vector *Z'* is considered not important to predict *\ln(y_0)*, we can remove the *Z_1* term by setting *γ_1* to zero.

### Author(s)

Tao Xiao

Maintainer: Tao Xiao <taoxiao1@gmail.com>

### References

Lee, M-L. T., Whitmore, G. A. (2010)
Proportional hazards and threshold regression: their theoretical and practical connections., *Lifetime Data Analysis
* 16, 2: 196-214.

Lee, M-L. T., Whitmore, G. A. (2006)
Threshold regression for survival analysis: modeling event times by a stochastic process, *Statistical Science
* 21: 501-513.

Klein, J. P., Moeschberger, M. L. (2003)
*Survival Analysis: Techniques for Censored and Truncated Data*. 2 edition. Springer-Verlag New York, Inc.

Garrett, J. M. (1997)
Odds ratios and confidence intervals for logistic regression
models with effect modification. *Stata Technical Bulletin*, 36, 1522. Reprinted in *Stata
Technical Bulletin Reprints*, vol. 6, pp. 104-114. College Station, TX: Stata Press.

Xiao, T., Whitmore, G. A., He, Xin, Lee, M-L. T. (2015)
The R Package threg to Implement Threshold Regression Models. *Journal of Statistical Software*, 66(8), 1-16. URL http://www.jstatsoft.org/v66/i08/.

### Examples

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 | ```
#load the data "lkr"
data("lkr")
#Transform the "treatment2" variable into factor variable "f.treatment2" .
lkr$f.treatment2=factor(lkr$treatment2)
#fit the threshold regression model on the factor variable "f.treatment2",
fit<-threg(Surv(weeks, relapse)~ f.treatment2|f.treatment2,data = lkr)
fit
#load the data "bmt"
data("bmt")
#Transform the "group" and "fab" variables into factor variables
#"f.group" and "f.fab".
bmt$f.group=factor(bmt$group)
bmt$f.fab=factor(bmt$fab)
#fit a threshold regression model on the "bmt" dataset, by using "recipient_age" and
#"f.fab" as the predictors for ln(y0), and "f.group" and "f.fab" as predictors for mu.
fit<-threg(Surv(time, indicator)~ recipient_age+f.fab|f.group+f.fab, data = bmt)
fit
``` |