prostate: Prostate cancer data from Stamey et al. (1989)

Description

This dataset is used as an example in Hastie, Tibshirani & Friedman's The Elements of Statistical Learning. It was included in the ElemStatLearn package, which (at the time of writing) is orphaned and no longer available on CRAN.

Usage

prostate

Format

An object of class data.frame with 97 rows and 10 columns.

Details

There are 8 predictors (columns 1:8), one outcome (column 9, lpsa), and a marker (column 10) identifying the training and test observations used in the textbook examples.
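The column layout above can be used to split the data by index. The sketch below assumes that column 10 is logical, with TRUE marking training rows (as in the ElemStatLearn version); split_prostate() is a hypothetical helper, not a dCVnet function.

```r
# Hypothetical helper: split a prostate-style data frame using the
# column-10 train/test marker.
# Assumption: column 10 is logical (TRUE = training row).
split_prostate <- function(df) {
  is_train <- as.logical(df[[10]])
  stopifnot(!anyNA(is_train))   # fail loudly if the marker is not logical-like
  list(
    x_train = df[is_train, 1:8],   # predictors, training rows
    y_train = df[is_train, 9],     # outcome (lpsa), training rows
    x_test  = df[!is_train, 1:8],  # predictors, test rows
    y_test  = df[!is_train, 9]     # outcome (lpsa), test rows
  )
}

# Usage with the packaged data (requires dCVnet to be installed):
# data("prostate", package = "dCVnet")
# parts <- split_prostate(prostate)
# sapply(parts, NROW)
```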

Observations are from 97 men who underwent prostatectomy. The original paper investigates the post-surgical characteristics that predict pre-surgical prostate-specific antigen (PSA) score (variable: lpsa).
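As an illustration of the prediction task, the sketch below fits ordinary least squares (base R lm()) of the outcome in column 9 on the eight predictors using the training rows only, then scores the test rows. This is one simple baseline, not the modelling approach of the original paper or of dCVnet; fit_and_predict() is hypothetical, and column 10 is again assumed to be a logical train marker.

```r
# Hypothetical helper: least-squares baseline for the outcome (column 9)
# on the predictors (columns 1:8), trained on rows where column 10 is TRUE.
fit_and_predict <- function(df) {
  is_train <- as.logical(df[[10]])
  dat <- df[, 1:9]
  # Build "outcome ~ pred1 + ... + pred8" from the actual column names.
  form <- reformulate(names(dat)[1:8], response = names(dat)[9])
  fit <- lm(form, data = dat[is_train, ])
  pred <- predict(fit, newdata = dat[!is_train, ])
  list(
    fit = fit,
    test_mse = mean((dat[!is_train, 9] - pred)^2)  # mean squared test error
  )
}

# Usage (requires dCVnet):
# data("prostate", package = "dCVnet")
# res <- fit_and_predict(prostate)
# coef(res$fit); res$test_mse
```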

Variables prefixed with 'l' have been log transformed.
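To view the log-scale variables on their untransformed scale, the 'l' prefix can be used to select them. The sketch assumes natural logarithms were used; if any values were offset or floored before logging, exp() returns the adjusted value rather than the raw measurement. unlog_columns() is a hypothetical helper.

```r
# Hypothetical helper: exponentiate every column whose name starts with "l"
# and drop the prefix (e.g. lpsa -> psa). Assumes natural logs.
unlog_columns <- function(df) {
  log_cols <- grepl("^l", names(df))
  df[log_cols] <- lapply(df[log_cols], exp)
  names(df)[log_cols] <- sub("^l", "", names(df)[log_cols])
  df
}

# Usage (requires dCVnet):
# data("prostate", package = "dCVnet")
# summary(unlog_columns(prostate)$psa)
```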

The following descriptions have been adapted from Ryan Tibshirani's lecture notes on exploratory data analysis (EDA):

lpsa:

log PSA score

lcavol:

log cancer volume

lweight:

log prostate weight

age:

age of patient

lbph:

log of the amount of benign prostatic hyperplasia

svi:

seminal vesicle invasion

lcp:

log of capsular penetration

gleason:

Gleason score

pgg45:

percent of Gleason scores 4 or 5

The dataset is provided in its original units. A scaled version, in which each of columns 1:9 is centred to mean 0 and scaled to standard deviation 1, can be obtained with sprostate <- data.frame(scale(prostate[,-10]), train = prostate[,10]).
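The scaling expression above applies scale(), which computes z = (x - mean(x)) / sd(x) column by column. The sketch wraps it for any data frame laid out like prostate and adds a check on the result; scale_prostate() and check_scaled() are hypothetical helpers.

```r
# Hypothetical wrapper around the scaling expression: standardise
# columns 1:9 and carry the column-10 train marker through unchanged.
scale_prostate <- function(df) {
  data.frame(scale(df[, -10]), train = df[, 10])
}

# Hypothetical check: every standardised column has mean ~0 and sd ~1.
check_scaled <- function(sdf, tol = 1e-8) {
  num <- sdf[, -10]
  all(abs(colMeans(num)) < tol) && all(abs(sapply(num, sd) - 1) < tol)
}

# Usage (requires dCVnet):
# data("prostate", package = "dCVnet")
# sprostate <- scale_prostate(prostate)
# check_scaled(sprostate)
```

Note that because prostate[,-10] keeps columns 1:9, the expression standardises the outcome lpsa and the binary svi as well as the continuous predictors, and the means and standard deviations are computed over all 97 rows (training and test combined).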

Observations (rows) are sorted in increasing order of the outcome, lpsa.
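Because the rows are sorted by outcome, assigning cross-validation folds as contiguous blocks of rows would give folds with very different lpsa ranges. A minimal sketch of random, balanced fold assignment follows; random_folds() and its default k = 10 are illustrative assumptions, not dCVnet's own fold machinery.

```r
# Hypothetical helper: assign n rows to k folds at random, with fold sizes
# differing by at most one, so fold membership is unrelated to row order.
random_folds <- function(n, k = 10, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  sample(rep(seq_len(k), length.out = n))
}

# Usage (requires dCVnet):
# data("prostate", package = "dCVnet")
# folds <- random_folds(nrow(prostate), k = 10, seed = 1)
# tapply(prostate$lpsa, folds, range)   # each fold spans a similar range
```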


AndrewLawrence/dCVnet documentation built on Sept. 24, 2024, 5:24 a.m.