From the Kaggle website: This database contains 76 attributes, but all published experiments refer to using a subset of 14 of them. In particular, the Cleveland database is the only one that has been used by ML researchers to this date. The "goal" field refers to the presence of heart disease in the patient. It is integer valued from 0 (no presence) to 4. Experiments with the Cleveland database have concentrated on simply attempting to distinguish presence (values 1,2,3,4) from absence (value 0).
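The presence/absence split described above can be sketched as follows. This is a minimal illustration, not the code used to prepare the data: the function name `to_presence` and the sample values are hypothetical, and Python is used only for the sketch.

```python
# Hypothetical goal values on the original 0-4 scale (not real patient records).
goal_values = [0, 2, 1, 0, 4, 3]


def to_presence(goal):
    """Collapse the 0-4 goal field to 1 (presence) or 0 (absence)."""
    if goal not in range(5):
        raise ValueError(f"goal must be an integer from 0 to 4, got {goal!r}")
    # Values 1, 2, 3 and 4 all count as presence; only 0 is absence.
    return 1 if goal >= 1 else 0


presence = [to_presence(g) for g in goal_values]
print(presence)
```

Collapsing the scale this way turns the task into binary classification, which matches the 1 = yes, 0 = no coding of the final variable listed in the format.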
Data frame with 14 variables
age in years
sex: male or female
chest pain type: typical angina, atypical angina, non-anginal pain, asymptomatic
resting blood pressure (in mm Hg on admission to the hospital)
serum cholesterol in mg/dl
fasting blood sugar below vs. above 120 mg/dl ('lt_120', 'gt_120')
resting electrocardiographic results
maximum heart rate achieved
exercise induced angina (yes or no)
ST depression induced by exercise relative to rest
the slope of the peak exercise ST segment: positive, flat, or negative
number of major vessels (0-3) colored by fluoroscopy
thal: normal, fixed defect, or reversible defect
presence of heart disease: 1 = yes, 0 = no. Left as numeric.
These data are useful for standard classification, or for survival analysis if age is used as the time variable.
This is the classic heart disease data, only prepped for actual use and given more useful names and labels where possible. For reference, the original names are: age, sex, cp, trestbps, chol, fbs, restecg, thalach, exang, oldpeak, slope, ca, thal, target.
The original documentation sometimes labels values 1-4 while the actual data values are 0-3, and similar mismatches occur elsewhere. The assumption has been made that these coincide as one would expect. Thal is documented with values 3, 6, and 7 but was actually coded 0-3, with only two zero values. The zeros were converted to NA.
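The thal recoding described above can be sketched as follows. This is an illustration under stated assumptions, not the preparation code itself: the mapping of codes 1, 2, 3 to the labels in their listed order is assumed, the names `THAL_LABELS` and `recode_thal` are hypothetical, and Python's `None` stands in for NA.

```python
# Assumed mapping: codes 1-3 follow the order of the documented labels.
THAL_LABELS = {1: "normal", 2: "fixed defect", 3: "reversible defect"}


def recode_thal(code):
    """Map a raw 0-3 thal code to its label; 0 becomes None (missing)."""
    if code == 0:
        # Zero matches no documented level, so it is treated as missing.
        return None
    return THAL_LABELS[code]


# Hypothetical raw codes, not real patient records.
raw_thal = [2, 0, 3, 1]
recoded = [recode_thal(c) for c in raw_thal]
print(recoded)
```

Treating the zeros as missing rather than inventing a fourth level keeps the variable aligned with the three documented categories.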
Detrano, R., Janosi, A., Steinbrunn, W., Pfisterer, M., Schmid, J., Sandhu, S., Guppy, K., Lee, S., & Froelicher, V. (1989). International application of a new probability algorithm for the diagnosis of coronary artery disease. American Journal of Cardiology, 64, 304–310.
Aha, D. W., & Kibler, D. Instance-based prediction of heart-disease presence with the Cleveland database.
Gennari, J. H., Langley, P., & Fisher, D. (1989). Models of incremental concept formation. Artificial Intelligence, 40, 11–61.