Data to examine the correlation between the level of prostate-specific antigen and a number of clinical measures in men who were about to receive a radical prostatectomy.
A data frame with 97 observations on the following 10 variables.
lcavol: log cancer volume
lweight: log prostate weight
age: in years
lbph: log of the amount of benign prostatic hyperplasia
svi: seminal vesicle invasion
lcp: log of capsular penetration
gleason: a numeric vector
pgg45: percent of Gleason score 4 or 5
lpsa: response
train: a logical vector
The last column indicates which 67 observations were used as the "training set" and which 30 as the test set, as described on page 48 in the book.
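As a rough illustration of how a logical `train` column can be used to separate training and test rows, the sketch below builds a small synthetic stand-in. The data frame `toy`, its values, and the names `train_set` and `test_set` are made up for this example and are not the real prostate data.

```r
# Synthetic stand-in for illustration only; not the real prostate data.
set.seed(1)
toy <- data.frame(
  lcavol = rnorm(97),
  train  = rep(c(TRUE, FALSE), times = c(67, 30))
)
toy$lpsa <- 1 + 0.5 * toy$lcavol + rnorm(97, sd = 0.3)

# Split on the logical flag and drop the flag column afterwards.
train_set <- toy[toy$train, names(toy) != "train"]
test_set  <- toy[!toy$train, names(toy) != "train"]

dim(train_set)
dim(test_set)
```

Indexing with the logical vector directly avoids any ambiguity about how the condition is evaluated.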
There was an error in this dataset in earlier versions of the package, as indicated in a footnote on page 3 of the second edition of the book. As of version 2012.04-0 this was corrected.
Stamey, T., Kabalin, J., McNeal, J., Johnstone, I., Freiha, F., Redwine, E. and Yang, N. (1989) Prostate specific antigen in the diagnosis and treatment of adenocarcinoma of the prostate II. Radical prostatectomy treated patients, Journal of Urology 16: 1076–1083.
if(interactive())par(ask=TRUE)
str( prostate )
cor( prostate[,1:8] )
pairs( prostate[,1:9], col="violet" )
train <- subset( prostate, train==TRUE )[,1:9]
test <- subset( prostate, train==FALSE )[,1:9]
#
if( require(leaps)) {
# The book (page 56) uses only the training subset, so we do the same:
prostate.leaps <- regsubsets( lpsa ~ . , data=train, nbest=70, #all!
really.big=TRUE )
prostate.leaps.sum <- summary( prostate.leaps )
prostate.models <- prostate.leaps.sum$which
prostate.models.size <- as.numeric(attr(prostate.models, "dimnames")[[1]])
hist( prostate.models.size )
prostate.models.rss <- prostate.leaps.sum$rss
prostate.models.best.rss <-
tapply( prostate.models.rss, prostate.models.size, min )
prostate.models.best.rss
# Let us add results for the intercept-only model
prostate.dummy <- lm( lpsa ~ 1, data=train )
prostate.models.best.rss <- c(
sum(resid(prostate.dummy)^2),
prostate.models.best.rss)
# Making a plot:
plot( 0:8, prostate.models.best.rss, ylim=c(0, 100),
type="b", xlab="subset size", ylab="Residual Sum Square",
col="red2" )
points( prostate.models.size, prostate.models.rss, pch=17, col="brown",cex=0.7 )
}
# For a better plot, should remove the best for each size from last call!
# Now with ridge regression:
# Ridge regression is implemented in several R packages, at least:
# MASS: lm.ridge
# mda : gen.ridge
#( survival: ridge)
# Design: pentrace
# mgcv: pcls (very general)
# simple.ridge (in this package)
#
library(mda)
#
# one fit per lambda value
prostate.ridge.list <- lapply(seq(0,8,by=0.4), function(lambda)
gen.ridge(train[,1:8], y=train[,9,drop=FALSE], lambda=lambda))
# Problems with this usage.
# simpler usage:
#
prostate.ridge <- gen.ridge(train[,1:8], y=train[,9,drop=FALSE], lambda=1)
#
# Since there are some problems with the mda functions, we use our own:
#
prostate.ridge <- simple.ridge( train[,1:8], train[,9], df=1:8 )
#
# coefficient traces:
#
matplot( prostate.ridge$df, t(prostate.ridge$beta), type="b",
col="blue", pch=17, ylab="coefficients" )
# Calculations for the lasso:
#
if(require(lasso2)) {
prostate.lasso <- l1ce( lpsa ~ ., data=train, trace=TRUE, sweep.out=~1,
bound=seq(0,1,by=0.1) )
prostate.lasso.coef <- sapply(prostate.lasso, function(x) x$coef)
colnames(prostate.lasso.coef) <- seq( 0,1,by=0.1 )
matplot( seq(0,1,by=0.1), t(prostate.lasso.coef[-1,]), type="b",
xlab="shrinkage factor", ylab="coefficients",
xlim=c(0, 1.2), col="blue", pch=17 )
}
#
# lasso with lars:
if (require(lars)) {
#
prostate.lasso.lars <- lars( as.matrix(train[,1:8]), train[,9],
type="lasso", trace=TRUE )
cv.lars( as.matrix(train[,1:8]), train[,9],
type="lasso", trace=TRUE, K=10 )
}
#
# CV (cross-validation) using package boot:
#
library(boot)
prostate.glm <- glm( lpsa ~ ., data=train )
# repeat this some times to make clear that cross-validation is
# a random procedure
#
cv.glm( train, prostate.glm, K=10 )$delta
#
# This is a two-component vector, raw cross-validated estimate and
# adjusted cross-validated estimate.
summary( prostate.glm )
#
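The ridge fit in the examples is indexed by degrees of freedom (`df=1:8`) rather than by the penalty `lambda`. One common way to relate the two is through the singular value decomposition of the standardized predictor matrix, where the effective degrees of freedom are the sum of d^2 / (d^2 + lambda) over the singular values d. The sketch below illustrates that relationship on simulated data. `ridge_svd` is a hypothetical helper written for this example; it is not the package's `simple.ridge` and may differ from it in scaling and other details.

```r
# Hypothetical helper for illustration; not part of any package.
ridge_svd <- function(x, y, lambda) {
  x <- scale(as.matrix(x))   # center and scale the predictors
  y <- y - mean(y)           # center the response (no intercept in the fit)
  s <- svd(x)
  d <- s$d
  # beta = V diag(d / (d^2 + lambda)) U'y
  beta <- s$v %*% ((d / (d^2 + lambda)) * crossprod(s$u, y))
  list(beta = drop(beta), df = sum(d^2 / (d^2 + lambda)))
}

# Simulated data, assumed only for this sketch.
set.seed(42)
x <- matrix(rnorm(50 * 4), nrow = 50)
y <- drop(x %*% c(1, 0, -1, 0.5)) + rnorm(50)

fits <- lapply(c(0, 1, 10, 100), function(lambda) ridge_svd(x, y, lambda))
# Effective df equals the number of predictors at lambda = 0
# and decreases as lambda grows.
sapply(fits, function(f) f$df)
```

With `lambda = 0` the coefficients coincide with least squares on the standardized predictors, which gives a simple check of the helper.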
'data.frame': 97 obs. of 10 variables:
$ lcavol : num -0.58 -0.994 -0.511 -1.204 0.751 ...
$ lweight: num 2.77 3.32 2.69 3.28 3.43 ...
$ age : int 50 58 74 58 62 50 64 58 47 63 ...
$ lbph : num -1.39 -1.39 -1.39 -1.39 -1.39 ...
$ svi : int 0 0 0 0 0 0 0 0 0 0 ...
$ lcp : num -1.39 -1.39 -1.39 -1.39 -1.39 ...
$ gleason: int 6 6 7 6 6 6 6 6 6 6 ...
$ pgg45 : int 0 0 20 0 0 0 0 0 0 0 ...
$ lpsa : num -0.431 -0.163 -0.163 -0.163 0.372 ...
$ train : logi TRUE TRUE TRUE TRUE TRUE TRUE ...
lcavol lweight age lbph svi lcp
lcavol 1.0000000 0.2805214 0.2249999 0.027349703 0.53884500 0.675310484
lweight 0.2805214 1.0000000 0.3479691 0.442264395 0.15538491 0.164537146
age 0.2249999 0.3479691 1.0000000 0.350185896 0.11765804 0.127667752
lbph 0.0273497 0.4422644 0.3501859 1.000000000 -0.08584324 -0.006999431
svi 0.5388450 0.1553849 0.1176580 -0.085843238 1.00000000 0.673111185
lcp 0.6753105 0.1645371 0.1276678 -0.006999431 0.67311118 1.000000000
gleason 0.4324171 0.0568821 0.2688916 0.077820447 0.32041222 0.514830063
pgg45 0.4336522 0.1073538 0.2761124 0.078460018 0.45764762 0.631528246
gleason pgg45
lcavol 0.43241706 0.43365225
lweight 0.05688210 0.10735379
age 0.26889160 0.27611245
lbph 0.07782045 0.07846002
svi 0.32041222 0.45764762
lcp 0.51483006 0.63152825
gleason 1.00000000 0.75190451
pgg45 0.75190451 1.00000000
Loading required package: leaps
Loading required package: class
Loaded mda 0.4-9
Loading required package: lasso2
R Package to solve regression problems while imposing
an L1 constraint on the parameters. Based on S-plus Release 2.1
Copyright (C) 1998, 1999
Justin Lokhorst <jlokhors@stats.adelaide.edu.au>
Berwin A. Turlach <bturlach@stats.adelaide.edu.au>
Bill Venables <wvenable@stats.adelaide.edu.au>
Copyright (C) 2002
Martin Maechler <maechler@stat.math.ethz.ch>
++++++++++++++++++++++++++++++
Solving problem number 1 with bound 0.000000
++++++++++++++++++++++++++++++
******************************
--> Adding variable: 1
******************************
Iteration number: 0
Value of primal object function : 48.140723
Value of dual object function : 48.140723
L1 norm of current beta : 0.000000 <= 0.000000
Maximal absolute value in t(X)%*%r : 5.844390e+01 attained 1 time(s)
Number of parameters allowed to vary : 1
******************************
Iteration number: 1
Value of primal object function : 48.140723
Value of dual object function : 48.140723
L1 norm of current beta : 0.000000 <= 0.000000
Maximal absolute value in t(X)%*%r : 5.844390e+01 attained 1 time(s)
Number of parameters allowed to vary : 1
++++++++++++++++++++++++++++++
Solving problem number 2 with bound 0.226049
++++++++++++++++++++++++++++++
******************************
Iteration number: 0
Value of primal object function : 48.140723
Value of dual object function : 34.929531
L1 norm of current beta : 0.000000 <= 0.226049
Maximal absolute value in t(X)%*%r : 5.844390e+01 attained 1 time(s)
Number of parameters allowed to vary : 1
--> Stepping onto the border of the L1 ball.
******************************
Iteration number: 1
Value of primal object function : 36.615772
Value of dual object function : 36.615772
L1 norm of current beta : 0.226049 <= 0.226049
Maximal absolute value in t(X)%*%r : 4.352465e+01 attained 1 time(s)
Number of parameters allowed to vary : 1
++++++++++++++++++++++++++++++
Solving problem number 3 with bound 0.452098
++++++++++++++++++++++++++++++
******************************
Iteration number: 0
Value of primal object function : 36.615772
Value of dual object function : 26.777062
L1 norm of current beta : 0.226049 <= 0.452098
Maximal absolute value in t(X)%*%r : 4.352465e+01 attained 1 time(s)
Number of parameters allowed to vary : 1
--> Stepping onto the border of the L1 ball.
******************************
Iteration number: 1
Value of primal object function : 28.463303
Value of dual object function : 27.959061
L1 norm of current beta : 0.452098 <= 0.452098
Maximal absolute value in t(X)%*%r : 2.972075e+01 attained 1 time(s)
Number of parameters allowed to vary : 1
--> Adding variable: 2
******************************
Iteration number: 2
Value of primal object function : 28.456569
Value of dual object function : 28.456569
L1 norm of current beta : 0.452098 <= 0.452098
Maximal absolute value in t(X)%*%r : 2.916308e+01 attained 2 time(s)
Number of parameters allowed to vary : 2
++++++++++++++++++++++++++++++
Solving problem number 4 with bound 0.678147
++++++++++++++++++++++++++++++
******************************
Iteration number: 0
Value of primal object function : 28.456569
Value of dual object function : 21.864281
L1 norm of current beta : 0.452098 <= 0.678147
Maximal absolute value in t(X)%*%r : 2.916308e+01 attained 2 time(s)
Number of parameters allowed to vary : 2
--> Stepping onto the border of the L1 ball.
******************************
Iteration number: 1
Value of primal object function : 22.960533
Value of dual object function : 21.746334
L1 norm of current beta : 0.678147 <= 0.678147
Maximal absolute value in t(X)%*%r : 2.125431e+01 attained 1 time(s)
Number of parameters allowed to vary : 2
--> Adding variable: 5
******************************
Iteration number: 2
Value of primal object function : 22.928361
Value of dual object function : 22.928361
L1 norm of current beta : 0.678147 <= 0.678147
Maximal absolute value in t(X)%*%r : 2.008790e+01 attained 3 time(s)
Number of parameters allowed to vary : 3
++++++++++++++++++++++++++++++
Solving problem number 5 with bound 0.904197
++++++++++++++++++++++++++++++
******************************
Iteration number: 0
Value of primal object function : 22.928361
Value of dual object function : 18.387508
L1 norm of current beta : 0.678147 <= 0.904197
Maximal absolute value in t(X)%*%r : 2.008790e+01 attained 3 time(s)
Number of parameters allowed to vary : 3
--> Stepping onto the border of the L1 ball.
******************************
Iteration number: 1
Value of primal object function : 19.329119
Value of dual object function : 17.817274
L1 norm of current beta : 0.904197 <= 0.904197
Maximal absolute value in t(X)%*%r : 1.342891e+01 attained 1 time(s)
Number of parameters allowed to vary : 3
--> Adding variable: 4
******************************
Iteration number: 2
Value of primal object function : 19.308942
Value of dual object function : 18.766620
L1 norm of current beta : 0.904197 <= 0.904197
Maximal absolute value in t(X)%*%r : 1.300277e+01 attained 1 time(s)
Number of parameters allowed to vary : 4
--> Adding variable: 8
******************************
Iteration number: 3
Value of primal object function : 19.305422
Value of dual object function : 19.305422
L1 norm of current beta : 0.904197 <= 0.904197
Maximal absolute value in t(X)%*%r : 1.253535e+01 attained 5 time(s)
Number of parameters allowed to vary : 5
++++++++++++++++++++++++++++++
Solving problem number 6 with bound 1.130246
++++++++++++++++++++++++++++++
******************************
Iteration number: 0
Value of primal object function : 19.305422
Value of dual object function : 16.471817
L1 norm of current beta : 0.904197 <= 1.130246
Maximal absolute value in t(X)%*%r : 1.253535e+01 attained 5 time(s)
Number of parameters allowed to vary : 5
--> Stepping onto the border of the L1 ball.
******************************
Iteration number: 1
Value of primal object function : 17.085548
Value of dual object function : 17.085548
L1 norm of current beta : 1.130246 <= 1.130246
Maximal absolute value in t(X)%*%r : 7.105281e+00 attained 5 time(s)
Number of parameters allowed to vary : 5
++++++++++++++++++++++++++++++
Solving problem number 7 with bound 1.356295
++++++++++++++++++++++++++++++
******************************
Iteration number: 0
Value of primal object function : 17.085548
Value of dual object function : 15.479405
L1 norm of current beta : 1.130246 <= 1.356295
Maximal absolute value in t(X)%*%r : 7.105281e+00 attained 5 time(s)
Number of parameters allowed to vary : 5
--> Stepping onto the border of the L1 ball.
******************************
Iteration number: 1
Value of primal object function : 16.093136
Value of dual object function : 10.770054
L1 norm of current beta : 1.356295 <= 1.356295
Maximal absolute value in t(X)%*%r : 5.599937e+00 attained 1 time(s)
Number of parameters allowed to vary : 5
--> Adding variable: 3
******************************
Iteration number: 2
Value of primal object function : 16.028982
Value of dual object function : 15.977489
L1 norm of current beta : 1.356295 <= 1.356295
Maximal absolute value in t(X)%*%r : 3.034591e+00 attained 1 time(s)
Number of parameters allowed to vary : 6
--> Adding variable: 6
******************************
Iteration number: 3
Value of primal object function : 16.028970
Value of dual object function : 16.028970
L1 norm of current beta : 1.356295 <= 1.356295
Maximal absolute value in t(X)%*%r : 3.009143e+00 attained 7 time(s)
Number of parameters allowed to vary : 7
++++++++++++++++++++++++++++++
Solving problem number 8 with bound 1.582344
++++++++++++++++++++++++++++++
******************************
Iteration number: 0
Value of primal object function : 16.028970
Value of dual object function : 15.348756
L1 norm of current beta : 1.356295 <= 1.582344
Maximal absolute value in t(X)%*%r : 3.009143e+00 attained 7 time(s)
Number of parameters allowed to vary : 7
--> Stepping onto the border of the L1 ball.
******************************
Iteration number: 1
Value of primal object function : 15.437035
Value of dual object function : 15.437035
L1 norm of current beta : 1.582344 <= 1.582344
Maximal absolute value in t(X)%*%r : 2.228089e+00 attained 7 time(s)
Number of parameters allowed to vary : 7
++++++++++++++++++++++++++++++
Solving problem number 9 with bound 1.808393
++++++++++++++++++++++++++++++
******************************
Iteration number: 0
Value of primal object function : 15.437035
Value of dual object function : 14.933377
L1 norm of current beta : 1.582344 <= 1.808393
Maximal absolute value in t(X)%*%r : 2.228089e+00 attained 7 time(s)
Number of parameters allowed to vary : 7
--> Stepping onto the border of the L1 ball.
******************************
Iteration number: 1
Value of primal object function : 15.021655
Value of dual object function : 15.021655
L1 norm of current beta : 1.808393 <= 1.808393
Maximal absolute value in t(X)%*%r : 1.447035e+00 attained 7 time(s)
Number of parameters allowed to vary : 7
++++++++++++++++++++++++++++++
Solving problem number 10 with bound 2.034442
++++++++++++++++++++++++++++++
******************************
Iteration number: 0
Value of primal object function : 15.021655
Value of dual object function : 14.694554
L1 norm of current beta : 1.808393 <= 2.034442
Maximal absolute value in t(X)%*%r : 1.447035e+00 attained 7 time(s)
Number of parameters allowed to vary : 7
--> Stepping onto the border of the L1 ball.
******************************
Iteration number: 1
Value of primal object function : 14.782832
Value of dual object function : 14.782832
L1 norm of current beta : 2.034442 <= 2.034442
Maximal absolute value in t(X)%*%r : 6.659810e-01 attained 7 time(s)
Number of parameters allowed to vary : 7
++++++++++++++++++++++++++++++
Solving problem number 11 with bound 2.260491
++++++++++++++++++++++++++++++
******************************
Iteration number: 0
Value of primal object function : 14.782832
Value of dual object function : 14.632288
L1 norm of current beta : 2.034442 <= 2.260491
Maximal absolute value in t(X)%*%r : 6.659810e-01 attained 7 time(s)
Number of parameters allowed to vary : 7
******************************
Iteration number: 1
Value of primal object function : 14.718650
Value of dual object function : 13.538782
L1 norm of current beta : 2.227187 <= 2.260491
Maximal absolute value in t(X)%*%r : 5.219522e-01 attained 1 time(s)
Number of parameters allowed to vary : 7
--> Adding variable: 7
******************************
Iteration number: 2
Value of primal object function : 14.713192
Value of dual object function : 14.713192
L1 norm of current beta : 2.260491 <= 2.260491
Maximal absolute value in t(X)%*%r : 7.771561e-15 attained 8 time(s)
Number of parameters allowed to vary : 8
Loading required package: lars
Loaded lars 1.2
LASSO sequence
Computing X'X .....
LARS Step 1 : Variable 1 added
LARS Step 2 : Variable 2 added
LARS Step 3 : Variable 5 added
LARS Step 4 : Variable 4 added
LARS Step 5 : Variable 8 added
LARS Step 6 : Variable 3 added
LARS Step 7 : Variable 6 added
LARS Step 8 : Variable 7 added
Computing residuals, RSS etc .....
LASSO sequence
Computing X'X .....
LARS Step 1 : Variable 1 added
LARS Step 2 : Variable 2 added
LARS Step 3 : Variable 5 added
LARS Step 4 : Variable 8 added
LARS Step 5 : Variable 4 added
LARS Step 6 : Variable 3 added
LARS Step 7 : Variable 6 added
LARS Step 8 : Variable 7 added
Computing residuals, RSS etc .....
CV Fold 1
LASSO sequence
Computing X'X .....
LARS Step 1 : Variable 1 added
LARS Step 2 : Variable 2 added
LARS Step 3 : Variable 5 added
LARS Step 4 : Variable 8 added
LARS Step 5 : Variable 4 added
LARS Step 6 : Variable 6 added
LARS Step 7 : Variable 3 added
LARS Step 8 : Variable 7 added
Computing residuals, RSS etc .....
CV Fold 2
LASSO sequence
Computing X'X .....
LARS Step 1 : Variable 1 added
LARS Step 2 : Variable 2 added
LARS Step 3 : Variable 5 added
LARS Step 4 : Variable 4 added
LARS Step 5 : Variable 8 added
LARS Step 6 : Variable 3 added
LARS Step 7 : Variable 6 added
LARS Step 8 : Variable 7 added
Computing residuals, RSS etc .....
CV Fold 3
LASSO sequence
Computing X'X .....
LARS Step 1 : Variable 1 added
LARS Step 2 : Variable 2 added
LARS Step 3 : Variable 5 added
LARS Step 4 : Variable 4 added
LARS Step 5 : Variable 8 added
LARS Step 6 : Variable 3 added
LARS Step 7 : Variable 6 added
LARS Step 8 : Variable 7 added
Computing residuals, RSS etc .....
CV Fold 4
LASSO sequence
Computing X'X .....
LARS Step 1 : Variable 1 added
LARS Step 2 : Variable 2 added
LARS Step 3 : Variable 5 added
LARS Step 4 : Variable 4 added
LARS Step 5 : Variable 8 added
LARS Step 6 : Variable 3 added
LARS Step 7 : Variable 6 added
LARS Step 8 : Variable 7 added
Computing residuals, RSS etc .....
CV Fold 5
LASSO sequence
Computing X'X .....
LARS Step 1 : Variable 1 added
LARS Step 2 : Variable 2 added
LARS Step 3 : Variable 5 added
LARS Step 4 : Variable 4 added
LARS Step 5 : Variable 8 added
LARS Step 6 : Variable 7 added
LARS Step 7 : Variable 3 added
LARS Step 8 : Variable 6 added
Computing residuals, RSS etc .....
CV Fold 6
LASSO sequence
Computing X'X .....
LARS Step 1 : Variable 1 added
LARS Step 2 : Variable 2 added
LARS Step 3 : Variable 5 added
LARS Step 4 : Variable 8 added
LARS Step 5 : Variable 4 added
LARS Step 6 : Variable 3 added
LARS Step 7 : Variable 6 added
LARS Step 8 : Variable 7 added
Computing residuals, RSS etc .....
CV Fold 7
LASSO sequence
Computing X'X .....
LARS Step 1 : Variable 1 added
LARS Step 2 : Variable 2 added
LARS Step 3 : Variable 5 added
LARS Step 4 : Variable 8 added
LARS Step 5 : Variable 4 added
LARS Step 6 : Variable 6 added
LARS Step 7 : Variable 3 added
LARS Step 8 : Variable 7 added
Computing residuals, RSS etc .....
CV Fold 8
LASSO sequence
Computing X'X .....
LARS Step 1 : Variable 1 added
LARS Step 2 : Variable 2 added
LARS Step 3 : Variable 5 added
LARS Step 4 : Variable 8 added
LARS Step 5 : Variable 4 added
LARS Step 6 : Variable 3 added
LARS Step 7 : Variable 6 added
LARS Step 8 : Variable 7 added
Computing residuals, RSS etc .....
CV Fold 9
LASSO sequence
Computing X'X .....
LARS Step 1 : Variable 1 added
LARS Step 2 : Variable 2 added
LARS Step 3 : Variable 5 added
LARS Step 4 : Variable 4 added
LARS Step 5 : Variable 8 added
LARS Step 6 : Variable 3 added
LARS Step 7 : Variable 6 added
LARS Step 8 : Variable 7 added
Computing residuals, RSS etc .....
CV Fold 10
[1] 0.6029568 0.5935269
Call:
glm(formula = lpsa ~ ., data = train)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.64870 -0.34147 -0.05424 0.44941 1.48675
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.429170 1.553588 0.276 0.78334
lcavol 0.576543 0.107438 5.366 1.47e-06 ***
lweight 0.614020 0.223216 2.751 0.00792 **
age -0.019001 0.013612 -1.396 0.16806
lbph 0.144848 0.070457 2.056 0.04431 *
svi 0.737209 0.298555 2.469 0.01651 *
lcp -0.206324 0.110516 -1.867 0.06697 .
gleason -0.029503 0.201136 -0.147 0.88389
pgg45 0.009465 0.005447 1.738 0.08755 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for gaussian family taken to be 0.5073515)
Null deviance: 96.281 on 66 degrees of freedom
Residual deviance: 29.426 on 58 degrees of freedom
AIC: 155.01
Number of Fisher Scoring iterations: 2
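The two numbers printed by `cv.glm` above change from run to run because the folds are drawn at random. A minimal sketch of K-fold cross-validation done by hand may make this concrete: for squared-error cost, the average held-out squared error computed below is roughly the kind of quantity the raw estimate reports. `kfold_mse` and the data frame `toy` are hypothetical and exist only for this illustration; the adjusted estimate is not reproduced here.

```r
# Hypothetical helper for illustration; not part of package boot.
kfold_mse <- function(data, formula, K = 10) {
  n <- nrow(data)
  fold <- sample(rep(seq_len(K), length.out = n))  # random fold labels
  response <- all.vars(formula)[1]
  sq_err <- numeric(n)
  for (k in seq_len(K)) {
    held_out <- fold == k
    fit <- lm(formula, data = data[!held_out, , drop = FALSE])
    pred <- predict(fit, newdata = data[held_out, , drop = FALSE])
    sq_err[held_out] <- (data[[response]][held_out] - pred)^2
  }
  mean(sq_err)  # average squared prediction error over all held-out rows
}

# Simulated data, assumed only for this sketch.
set.seed(1)
toy <- data.frame(x1 = rnorm(67), x2 = rnorm(67))
toy$y <- 1 + 0.5 * toy$x1 + rnorm(67, sd = 0.7)

# Repeated calls give different values because the folds differ.
cv_runs <- replicate(3, kfold_mse(toy, y ~ ., K = 10))
cv_runs
```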