breast_cancer_clean_features: Wisconsin Breast Cancer Database


Wisconsin Breast Cancer Database

Description

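Nine features from the Wisconsin Breast Cancer Database, provided as separate training and test sets with the ID and class columns removed.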

Usage

breast_cancer_clean_features

Format

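A list containing a training set and a test set. Both were derived from a data frame with 699 observations on 11 variables. The ID and class columns have been removed, which leaves the nine feature columns described below. The observations were split into training and test sets with a train fraction of 0.8 (80% training, 20% test).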

  • Cl.thickness: Clump Thickness

  • Cell.size: Uniformity of Cell Size

  • Cell.shape: Uniformity of Cell Shape

  • Marg.adhesion: Marginal Adhesion

  • Epith.c.size: Single Epithelial Cell Size

  • Bare.nuclei: Bare Nuclei

  • Bl.cromatin: Bland Chromatin

  • Normal.nucleoli: Normal Nucleoli

  • Mitoses: Mitoses
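As a rough illustration of what a 0.8 split of 699 rows means in practice, the following Python sketch shuffles the row indices and cuts them at 80%. It is a hypothetical helper: `train_test_split_indices` is not part of ascentTraining, and the sketch assumes a simple random, unstratified split with a fixed seed. The package's actual sampling scheme, and therefore its exact row counts, may differ.

```python
import random


def train_test_split_indices(n_rows, train_fraction=0.8, seed=42):
    """Hypothetical sketch: shuffle row indices and cut them at train_fraction.

    Assumes a plain random (unstratified) split; this is NOT taken from the
    ascentTraining package, which may sample differently.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    indices = list(range(n_rows))
    # Seeded generator so the split is reproducible between runs.
    random.Random(seed).shuffle(indices)
    # Floor the training count: 699 * 0.8 = 559.2 -> 559 training rows.
    n_train = int(n_rows * train_fraction)
    return indices[:n_train], indices[n_train:]


train_idx, test_idx = train_test_split_indices(699)
print(len(train_idx), len(test_idx))  # 559 140
```

Under these assumptions, the 699 rows divide into 559 training rows and 140 test rows, and every row lands in exactly one of the two sets.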

Source

  • Creator: Dr. William H. Wolberg (physician), University of Wisconsin Hospital, Madison, Wisconsin, USA

  • Donor: Olvi Mangasarian (mangasarian@cs.wisc.edu)

  • Received by: David W. Aha (aha@cs.jhu.edu)

These data have been taken from the UCI Repository Of Machine Learning Databases and were converted to R format by Evgenia Dimitriadou.

References

1. Wolberg, W. H., & Mangasarian, O. L. (1990). Multisurface method of pattern separation for medical diagnosis applied to breast cytology. In Proceedings of the National Academy of Sciences, 87, 9193-9196.
- Size of data set: only 369 instances (at that point in time)
- Collected classification results: 1 trial only
- Two pairs of parallel hyperplanes were found to be consistent with 50% of the data
  - Accuracy on the remaining 50% of the dataset: 93.5%
- Three pairs of parallel hyperplanes were found to be consistent with 67% of the data
  - Accuracy on the remaining 33% of the dataset: 95.9%

2. Zhang, J. (1992). Selecting typical instances in instance-based learning. In Proceedings of the Ninth International Machine Learning Conference (pp. 470-479). Aberdeen, Scotland: Morgan Kaufmann.
- Size of data set: only 369 instances (at that point in time)
- Applied 4 instance-based learning algorithms
- Collected classification results averaged over 10 trials
- Best accuracy result:
  - 1-nearest neighbor: 93.7%
  - Trained on 200 instances, tested on the other 169
- Also of interest:
  - Using only typical instances: 92.2% (storing only 23.1 instances)
  - Trained on 200 instances, tested on the other 169

3. Newman, D. J., Hettich, S., Blake, C. L., & Merz, C. J. (1998). UCI Repository of machine learning databases [http://www.ics.uci.edu/~mlearn/MLRepository.html]. Irvine, CA: University of California, Department of Information and Computer Science.


ascentTraining documentation built on April 27, 2022, 9:06 a.m.