Buckley-James regression for right-censored survival data with high-dimensional covariates. Implementations include L_2 boosting with componentwise linear least squares, componentwise P-splines, and regression trees. Other Buckley-James methods include elastic net, MCP, SCAD, MARS and ACOSSO (ACOSSO is not supported in the current version).

```
bujar(y, cens, x, valdata = NULL, degree = 1, learner = "linear.regression",
      center = TRUE, mimpu = NULL, iter.bj = 20, max.cycle = 5, nu = 0.1, mstop = 50,
      twin = FALSE, mstop2 = 100, tuning = TRUE, cv = FALSE, nfold = 5, method = "corrected",
      vimpint = TRUE, gamma = 3, lambda = NULL, whichlambda = NULL, lamb = 0, s = 0.5, nk = 4,
      wt.pow = 1, theta = NULL, rel.inf = FALSE, tol = .Machine$double.eps, n.cores = 2,
      rng = 123, trace = FALSE)
```

| Argument | Description |
| --- | --- |
| `y` | survival time |
| `cens` | censoring indicator; must be 0 or 1, with 0 = alive and 1 = dead |
| `x` | covariate matrix |
| `valdata` | test data; the first column must be the survival time, the second column the censoring indicator, and the remaining columns the covariates, laid out as in `x` |
| `degree` | degree of interaction for MARS, tree, or linear regression: `degree = 1` gives an additive model, `degree = 2` allows second-order interactions |
| `learner` | method used for BJ regression |
| `center` | whether to center the covariates |
| `mimpu` | initial estimate: if TRUE, mean imputation; if FALSE, imputation with the best marginal single-variable linear regression; if NULL, 0 |
| `iter.bj` | number of B-J iterations |
| `max.cycle` | maximum number of cycles allowed |
| `nu` | step-size boosting parameter |
| `mstop` | boosting tuning parameter. It can be one number or have the length |
| `twin` | logical; if TRUE, twin boosting |
| `mstop2` | twin boosting tuning parameter |
| `tuning` | logical; if TRUE, the tuning parameter is selected by cross-validation or AIC/BIC methods. Ignored if |
| `cv` | logical; if TRUE, cross-validation is used to select the tuning parameter; only used if |
| `nfold` | number of cross-validation folds |
| `method` | boosting tuning parameter selection method in AIC |
| `vimpint` | logical; if TRUE, compute variable importance and interaction measures for MARS if |
| `gamma` | gamma tuning parameter for MCP or SCAD |
| `lambda` | lambda tuning parameter for MCP or SCAD |
| `whichlambda` | which lambda to use for the MCP or SCAD lambda tuning parameter |
| `lamb` | elastic net lambda tuning parameter, only used if |
| `s` | the second enet tuning parameter, a fraction in (0, 1), only used if |
| `nk` | number of basis functions for |
| `wt.pow` | not used but kept for historical reasons, only for |
| `theta` | For |
| `rel.inf` | logical; if TRUE, variable importance and interaction importance measures are computed |
| `tol` | convergence criterion |
| `n.cores` | number of CPU cores to use. The cross-validation loop will attempt to send different CV folds to different cores. Used for |
| `rng` | a number used for random number generation in boosting trees |
| `trace` | logical; if TRUE, print interim computing results |
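The column layout required for `valdata` can be sketched in base R. This is only an illustration on simulated data: the variable names, the 15/5 split, and the choice to put the held-out response on the same log scale as `y` are assumptions, not part of the package.

```r
# Hypothetical toy data: 20 subjects, 3 covariates (all names are illustrative).
set.seed(1)
n <- 20
x <- matrix(rnorm(n * 3), n, 3, dimnames = list(NULL, c("x1", "x2", "x3")))
time <- rexp(n)
cens <- rbinom(n, 1, 0.7)            # 0 = alive, 1 = dead

# Hold out the last 5 subjects as test data.
test <- 16:20

# valdata layout: survival time first, censoring indicator second,
# then the covariates in the same order as in x.
valdata <- cbind(time = log(time[test]), cens = cens[test], x[test, ])

# Training pieces that would be passed as y, cens and x.
y.train    <- log(time[-test])
cens.train <- cens[-test]
x.train    <- x[-test, ]

colnames(valdata)
```

The training objects and `valdata` would then be supplied together, for example as `bujar(y = y.train, cens = cens.train, x = x.train, valdata = valdata)`.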

Buckley-James regression for right-censored survival data with high-dimensional covariates. Implementations include L_2 boosting with componentwise linear least squares, componentwise P-splines, and regression trees. Other Buckley-James methods include elastic net, SCAD and MCP. `learner="enet"` and `learner="enet2"` use two different implementations of LASSO. Some of these methods are discussed in Wang and Wang (2010) and the references therein. Also see the references below.
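For readers unfamiliar with the Buckley-James idea, the imputation step that all of these learners share can be sketched in base R. This is a minimal illustration, not the package's internal code: the helper name `bj_impute` is hypothetical, ties among residuals are assumed absent, the largest residual is treated as an event (a common convention), and ordinary least squares stands in for the learner.

```r
# Replace each censored response by fitted value + E(e | e > e_i), where the
# residual distribution is estimated by Kaplan-Meier. (Hypothetical helper.)
bj_impute <- function(y, cens, fitted) {
  e   <- y - fitted
  ord <- order(e)
  e.s <- e[ord]
  d.s <- cens[ord]
  n   <- length(e)
  d.s[n] <- 1                              # convention: largest residual is an event
  surv <- cumprod(1 - d.s / (n:1))         # KM survival just after each residual
  mass <- c(1, surv[-n]) - surv            # KM jump sizes (0 at censored residuals)
  ystar <- y
  for (i in which(cens == 0)) {
    later <- e.s > e[i]
    if (any(later) && sum(mass[later]) > 0)
      ystar[i] <- fitted[i] + sum(e.s[later] * mass[later]) / sum(mass[later])
  }
  ystar
}

# Tiny check by hand: the censored value 2 is imputed as 3.5.
bj_impute(c(1, 2, 3, 4), cens = c(1, 0, 1, 1), fitted = rep(0, 4))

# Alternate imputation and refitting on simulated data (lm as a stand-in learner).
set.seed(1)
n      <- 100
x      <- rnorm(n)
t.true <- 1 + 2 * x + rnorm(n)
c.time <- rnorm(n, mean = 2.5, sd = 2)     # censoring times
y      <- pmin(t.true, c.time)
cens   <- as.numeric(t.true <= c.time)     # 1 = event observed, 0 = censored
beta   <- c(0, 0)                          # start from a zero fit
for (it in 1:20) {
  fitted <- beta[1] + beta[2] * x
  ystar  <- bj_impute(y, cens, fitted)
  beta   <- coef(lm(ystar ~ x))
}
beta
```

In `bujar`, the least-squares refit in this loop corresponds to the method chosen through `learner`. Because the iteration may cycle rather than settle on a single solution, the loop here simply runs a fixed number of times.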

| Component | Description |
| --- | --- |
| `x` | original covariates |
| `y` | survival time |
| `cens` | censoring indicator |
| `ynew` | imputed y |
| `yhat` | estimated y from ynew |
| `pred.bj` | estimated y from the testing sample |
| `res.fit` | model fitted with the learner |
| `learner` | original learner used |
| `degree` | `degree = 1` for an additive model, `degree = 2` for second-order interactions |
| `mse` | MSE at each BJ iteration; only available in simulations or when `valdata` is provided |
| `mse.bj` | MSE from the training data at BJ termination |
| `mse.bj.val` | MSE with `valdata` |
| `mse.all` | a vector of MSE for uncensored data at the BJ iterations |
| `nz.bj.iter` | number of selected covariates at each BJ iteration |
| `nz.bj` | number of selected covariates at the claimed BJ termination |
| `xselect` | a vector with one entry per covariate, either 1 (covariate selected) or 0 (not selected) |
| `coef.bj` | estimated coefficients with a linear model |
| `vim` | a vector of variable importance with one entry per column of `x`, each between 0 and 100 |
| `interactions` | measure of the strength of interactions |
| `ybstdiff` | largest absolute difference of estimated y; useful for monitoring convergence |
| `ybstcon` | a vector with one convergence measure per BJ iteration |
| `cycleperiod` | number of cycles of BJ iteration |
| `cycle.coef.diff` | within a BJ cycle, the maximum difference of coefficients for BJ boosting |
| `nonconv` | logical; if TRUE, the procedure did not converge |
| `fnorm2` | value of the L_2 norm; can be useful to assess convergence |
| `mselect` | a vector with one element per BJ iteration, each the tuning parameter `mstop` |
| `contype` | 0: converged; 1: not converged but a cycle was found; 2: not converged and the maximum number of iterations was reached |
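When checking many fits, it can help to turn the `contype` code into a readable label. The helper below is a hypothetical convenience function, not part of the package; it only encodes the three codes listed above.

```r
# Hypothetical helper: map a contype code to the description given above.
describe_contype <- function(contype) {
  switch(as.character(contype),
         "0" = "converged",
         "1" = "not converged, but a cycle was found",
         "2" = "not converged, maximum number of iterations reached",
         stop("unknown contype: ", contype))   # default branch for any other code
}

describe_contype(0)
describe_contype(2)
```

With a fitted object, this would be called as `describe_contype(fit$contype)`, assuming the value is returned as a list component with that name.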

Zhu Wang

Zhu Wang and C.Y. Wang (2010), Buckley-James Boosting for Survival Analysis with High-Dimensional Biomarker Data. *Statistical Applications in Genetics and Molecular Biology*, **9**(1), Article 24.

Peter Bühlmann and Bin Yu (2003), Boosting with the L2 loss: regression and classification. *Journal of the American Statistical Association*, **98**, 324–339.

Peter Bühlmann (2006), Boosting for high-dimensional linear models. *The Annals of Statistics*, **34**(2), 559–583.

Peter Bühlmann and Torsten Hothorn (2007), Boosting algorithms: regularization, prediction and model fitting. *Statistical Science*, **22**(4), 477–505.

J. Friedman (1991), Multivariate Adaptive Regression Splines (with discussion). *Annals of Statistics*, **19**(1), 1–141.

J.H. Friedman, T. Hastie and R. Tibshirani (2000), Additive Logistic Regression: a Statistical View of Boosting. *Annals of Statistics*, **28**(2), 337–374.

C. Storlie, H. Bondell, B. Reich and H. H. Zhang (2009), Surface Estimation, Variable Selection, and the Nonparametric Oracle Property. *Statistica Sinica*, to appear.

Sijian Wang, Bin Nan, Ji Zhu, and David G. Beer (2008), Doubly penalized Buckley-James Method for Survival Data with High-Dimensional Covariates. *Biometrics*, **64**, 132–140.

H. Zou and T. Hastie (2005), Regularization and variable selection via the elastic net. *Journal of the Royal Statistical Society*, Series B, **67**, 301–320.

```
library("bujar")
data("wpbc", package = "TH.data")
# Keep status, time, and the first ten covariates
wpbc2 <- wpbc[, 1:12]
# Convert the status factor to a 0/1 censoring indicator
wpbc2$status <- as.numeric(wpbc2$status) - 1

# Default learner: componentwise linear least squares
fit <- bujar(y = log(wpbc2$time), cens = wpbc2$status, x = wpbc2[, -(1:2)])
print(fit)
coef(fit)
pr <- predict(fit)
plot(fit)
fit <- bujar(y = log(wpbc2$time), cens = wpbc2$status, x = wpbc2[, -(1:2)], tuning = TRUE)

## Not run:
# Componentwise P-splines
fit <- bujar(y = log(wpbc2$time), cens = wpbc2$status, x = wpbc2[, -(1:2)],
             learner = "pspline")
# Regression trees with second-order interactions
fit <- bujar(y = log(wpbc2$time), cens = wpbc2$status, x = wpbc2[, -(1:2)],
             learner = "tree", degree = 2)
### select tuning parameter for "enet"
tmp <- gcv.enet(y = log(wpbc2$time), cens = wpbc2$status, x = wpbc2[, -(1:2)])
fit <- bujar(y = log(wpbc2$time), cens = wpbc2$status, x = wpbc2[, -(1:2)],
             learner = "enet", lamb = tmp$lambda, s = tmp$s)
# MARS with second-order interactions
fit <- bujar(y = log(wpbc2$time), cens = wpbc2$status, x = wpbc2[, -(1:2)],
             learner = "mars", degree = 2)
summary(fit)
## End(Not run)
```
