Each record represents follow-up data for one breast cancer case. These are consecutive patients seen by Dr. Wolberg since 1984, and include only those cases exhibiting invasive breast cancer and no evidence of distant metastases at the time of diagnosis.

```
data("wpbc")
```
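A minimal sketch of loading and inspecting the data. It assumes that `wpbc` is provided by the `TH.data` package, which this page does not state explicitly; the guard simply skips the block if that package is not installed.

```r
# Assumption: the wpbc data set ships with the TH.data package.
if (requireNamespace("TH.data", quietly = TRUE)) {
  data("wpbc", package = "TH.data")

  # Outcome, follow-up time and the first two image features.
  str(wpbc[, 1:4])

  # Number of non-recurrent (N) and recurrent (R) cases.
  print(table(wpbc$status))
}
```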

A data frame with 198 observations on the following 34 variables.

`status`

a factor with levels `N` (nonrecur) and `R` (recur).

`time`

recurrence time (for `status == "R"`) or disease-free time (for `status == "N"`).

`mean_radius`

radius (mean of distances from center to points on the perimeter) (mean).

`mean_texture`

texture (standard deviation of gray-scale values) (mean).

`mean_perimeter`

perimeter (mean).

`mean_area`

area (mean).

`mean_smoothness`

smoothness (local variation in radius lengths) (mean).

`mean_compactness`

compactness (mean).

`mean_concavity`

concavity (severity of concave portions of the contour) (mean).

`mean_concavepoints`

concave points (number of concave portions of the contour) (mean).

`mean_symmetry`

symmetry (mean).

`mean_fractaldim`

fractal dimension (mean).

`SE_radius`

radius (mean of distances from center to points on the perimeter) (SE).

`SE_texture`

texture (standard deviation of gray-scale values) (SE).

`SE_perimeter`

perimeter (SE).

`SE_area`

area (SE).

`SE_smoothness`

smoothness (local variation in radius lengths) (SE).

`SE_compactness`

compactness (SE).

`SE_concavity`

concavity (severity of concave portions of the contour) (SE).

`SE_concavepoints`

concave points (number of concave portions of the contour) (SE).

`SE_symmetry`

symmetry (SE).

`SE_fractaldim`

fractal dimension (SE).

`worst_radius`

radius (mean of distances from center to points on the perimeter) (worst).

`worst_texture`

texture (standard deviation of gray-scale values) (worst).

`worst_perimeter`

perimeter (worst).

`worst_area`

area (worst).

`worst_smoothness`

smoothness (local variation in radius lengths) (worst).

`worst_compactness`

compactness (worst).

`worst_concavity`

concavity (severity of concave portions of the contour) (worst).

`worst_concavepoints`

concave points (number of concave portions of the contour) (worst).

`worst_symmetry`

symmetry (worst).

`worst_fractaldim`

fractal dimension (worst).

`tsize`

diameter of the excised tumor in centimeters.

`pnodes`

number of positive axillary lymph nodes observed at time of surgery.

The first 30 features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. They describe characteristics of the cell nuclei present in the image.
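The 30 image features follow a regular naming scheme: one of three statistics (`mean`, `SE`, `worst`) combined with one of ten nuclear measurements. The sketch below rebuilds those column names so the feature block can be selected by name; `stats`, `measures` and `feature_names` are illustrative names, not part of the data set.

```r
# Three summary statistics and ten nuclear measurements, as listed above.
stats <- c("mean", "SE", "worst")
measures <- c("radius", "texture", "perimeter", "area", "smoothness",
              "compactness", "concavity", "concavepoints", "symmetry",
              "fractaldim")

# outer() pastes every statistic with every measurement (3 x 10 matrix);
# transposing before as.vector() keeps the order mean_*, SE_*, worst_*.
feature_names <- as.vector(t(outer(stats, measures, paste, sep = "_")))
length(feature_names)

# If wpbc has already been loaded, select the image features by name.
if (exists("wpbc")) {
  X <- wpbc[, feature_names]
}
```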

There are two possible learning problems: predicting `status` or predicting the time to recur.

1) Predicting field 2 of the original UCI file (`status` here), outcome: R = recurrent, N = non-recurrent. The dataset should first be filtered to reflect a particular endpoint; e.g., recurrences before 24 months = positive, non-recurrence beyond 24 months = negative. An estimated accuracy of 86.3% was reported using a previous version of this data.

2) Predicting time to recur (field 3 of the original UCI file, `time` here, in recurrent records). Estimated mean error of 13.9 months using Recurrence Surface Approximation.
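The two set-ups above can be sketched as follows. To stay self-contained, the example uses a small made-up data frame (`toy`) with the same `status` and `time` columns as `wpbc`; the 24-month cutoff comes from the text, while the assumption that `time` is recorded in months, and the choice to drop cases that fit neither class, are illustrative.

```r
# Hypothetical toy data mimicking the status/time columns of wpbc.
toy <- data.frame(
  status = factor(c("R", "R", "N", "N", "R"), levels = c("N", "R")),
  time   = c(10, 30, 36, 12, 24)  # assumed to be in months
)

cutoff <- 24

# Problem 1: keep only cases with a known 24-month outcome.
positive <- toy$status == "R" & toy$time < cutoff  # recurred before cutoff
negative <- toy$status == "N" & toy$time > cutoff  # disease-free beyond cutoff
endpoint <- toy[positive | negative, ]
endpoint$recur24 <- factor(ifelse(endpoint$status == "R",
                                  "positive", "negative"))

# Problem 2: time to recur is only observed in recurrent records.
recurrent <- toy[toy$status == "R", ]

print(endpoint)
print(recurrent)
```

Cases that recur after the cutoff, or that are disease-free with less than 24 months of follow-up, are dropped here; whether late recurrences should instead count as negative is a modelling decision the text leaves open.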

The data are originally available from the UCI machine learning repository, see http://www.ics.uci.edu/~mlearn/databases/breast-cancer-wisconsin/.

W. Nick Street, Olvi L. Mangasarian and William H. Wolberg (1995).
An inductive learning approach to prognostic prediction.
In A. Prieditis and S. Russell, editors, *Proceedings of the
Twelfth International Conference on Machine Learning*, pages
522–530, San Francisco, Morgan Kaufmann.

Peter Buehlmann and Torsten Hothorn (2007),
Boosting algorithms: regularization, prediction and model fitting.
*Statistical Science*, **22**(4), 477–505.
