hearth2: Data on Coronary Artery Disease

hearth2    R Documentation

Data on Coronary Artery Disease

Description

This dataset includes data on 294 patients who underwent angiography at the Hungarian Institute of Cardiology in Budapest between 1983 and 1987.

Format

A data frame with 294 observations, ten covariates, and one two-class outcome variable.

Details

The variables are as follows:

  • age. numeric. Age in years

  • sex. factor. Sex (1 = male; 0 = female)

  • chest_pain. factor. Chest pain type (1 = typical angina; 2 = atypical angina; 3 = non-anginal pain; 4 = asymptomatic)

  • trestbps. numeric. Resting blood pressure (in mm Hg on admission to the hospital)

  • chol. numeric. Serum cholesterol in mg/dl

  • fbs. factor. Fasting blood sugar > 120 mg/dl (1 = true; 0 = false)

  • restecg. factor. Resting electrocardiographic results (1 = ST-T wave abnormality, i.e., T wave inversions and/or ST elevation or depression of > 0.05 mV; 0 = normal)

  • thalach. numeric. Maximum heart rate achieved

  • exang. factor. Exercise-induced angina (1 = yes; 0 = no)

  • oldpeak. numeric. ST depression induced by exercise relative to rest

  • Class. factor. Disease status (1 = no disease; 2 = coronary artery disease)
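The variable list above fixes each column's name and type (numeric or factor). As a minimal sketch using only base R, the hypothetical helper check_hearth2_schema() below (it is not part of rfvimptest) compares a data frame against that documented schema and reports missing columns or type mismatches.

```r
# Hypothetical helper, not part of rfvimptest: documented column types of hearth2
hearth2_schema <- c(
  age = "numeric", sex = "factor", chest_pain = "factor",
  trestbps = "numeric", chol = "numeric", fbs = "factor",
  restecg = "factor", thalach = "numeric", exang = "factor",
  oldpeak = "numeric", Class = "factor"
)

check_hearth2_schema <- function(df, schema = hearth2_schema) {
  # Every documented column must be present
  missing_cols <- setdiff(names(schema), names(df))
  if (length(missing_cols) > 0) {
    return(paste("missing columns:", paste(missing_cols, collapse = ", ")))
  }
  # Compare each column's type with the documented one;
  # is.numeric() accepts both integer and double columns
  ok <- vapply(names(schema), function(v) {
    if (schema[[v]] == "factor") is.factor(df[[v]]) else is.numeric(df[[v]])
  }, logical(1))
  if (all(ok)) TRUE else paste("wrong type:", paste(names(schema)[!ok], collapse = ", "))
}
```

With the package installed, check_hearth2_schema(hearth2) should return TRUE if the stored data match this documentation; otherwise it returns a message naming the offending columns.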

The original OpenML dataset was pre-processed as follows:

1. The variables were renamed according to the descriptions given on OpenML.

2. Missing values, which were coded as "-9", were replaced by NA.

3. The variables slope, ca, and thal were excluded because they had too many missing values.

4. The categorical covariates were transformed into factors.

5. The six restecg values of "2" were replaced by "1".

6. The missing values were imputed: missing values of numeric covariates were replaced by the mean of the corresponding non-missing values, and missing values of categorical covariates by the mode of the corresponding non-missing values.
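Steps 2 to 6 can be sketched in base R as follows. This is a hypothetical illustration, not the rfvimptest authors' code: the function name preprocess_heart() and its arguments are assumptions, the input is assumed to be a data frame whose columns were already renamed (step 1) and are numeric, and ties in the mode are broken by taking the first level.

```r
# Hypothetical sketch of pre-processing steps 2-6 (not the rfvimptest authors'
# code). Assumes `raw` already has renamed (step 1), numeric columns.
preprocess_heart <- function(raw,
                             drop = c("slope", "ca", "thal"),
                             categorical = c("sex", "chest_pain", "fbs",
                                             "restecg", "exang", "Class")) {
  df <- raw

  # Step 2: the code -9 marks a missing value
  df[] <- lapply(df, function(x) { x[x %in% -9] <- NA; x })

  # Step 3: drop the variables with too many missing values
  df <- df[, setdiff(names(df), drop), drop = FALSE]

  # Step 4: convert the categorical covariates to factors
  for (v in intersect(categorical, names(df))) df[[v]] <- factor(df[[v]])

  # Step 5: merge restecg level "2" into "1"
  # (assigning a duplicated level name merges the two levels)
  if (is.factor(df$restecg)) {
    levels(df$restecg)[levels(df$restecg) == "2"] <- "1"
  }

  # Step 6: impute numeric columns by the mean, factors by the mode
  mode_level <- function(f) {
    tab <- table(f)               # counts per level; NA is excluded by default
    names(tab)[which.max(tab)]    # ties go to the first level
  }
  for (v in names(df)) {
    miss <- is.na(df[[v]])
    if (!any(miss)) next
    if (is.factor(df[[v]])) {
      df[[v]][miss] <- mode_level(df[[v]])
    } else {
      df[[v]][miss] <- mean(df[[v]], na.rm = TRUE)
    }
  }
  df
}
```

Imputation is done last, after the restecg recoding, so that the former "2" values count toward the mode of level "1".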

Note that this dataset is also included, in a slightly different form, in the R package ordinalForest (version 2.4-2) under the name hearth. The only difference is that in hearth2 the ordinal outcome variable Class was transformed into a two-class outcome that distinguishes only between diseased and healthy patients, rather than between different levels of disease severity.
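The collapsing of the ordinal outcome into two classes can be sketched as below. This is a hypothetical helper, not code from rfvimptest or ordinalForest, and it assumes that the first level of the ordinal variable means "no disease" while every higher level denotes some degree of coronary artery disease; check the level coding of hearth before relying on it.

```r
# Hypothetical helper (not from rfvimptest or ordinalForest): collapse an
# ordinal disease-severity variable into the two-class coding of hearth2.
# ASSUMPTION: the first level means "no disease" and every higher level
# denotes some degree of coronary artery disease.
collapse_to_two_class <- function(severity) {
  # Keep existing (possibly ordered) factors as they are, so unused levels
  # are not dropped; convert anything else to a factor
  if (!is.factor(severity)) severity <- factor(severity)
  diseased <- as.integer(severity) > 1   # level index above the first one
  # 1 = no disease; 2 = coronary artery disease (NA stays NA)
  factor(ifelse(diseased, 2, 1), levels = c(1, 2))
}
```

For example, collapse_to_two_class(factor(c(1, 3, 5), levels = 1:5)) yields the classes 1, 2, 2.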

Source

OpenML: data.name: heart-h, data.id: 1565, link: https://www.openml.org/d/1565/

References

  • Detrano, R., Janosi, A., Steinbrunn, W., Pfisterer, M., Schmid, J.-J., Sandhu, S., Guppy, K. H., Lee, S., Froelicher, V. (1989) International application of a new probability algorithm for the diagnosis of coronary artery disease. The American Journal of Cardiology, 64, 304–310.

  • Vanschoren, J., van Rijn, J. N., Bischl, B., Torgo, L. (2013) OpenML: networked science in machine learning. SIGKDD Explorations, 15(2), 49–60.

Examples

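library(rfvimptest)  # attach the package that provides hearth2
data(hearth2)

# Class frequencies (1 = no disease; 2 = coronary artery disease)
table(hearth2$Class)
# Expected: 294 rows and 11 columns (ten covariates plus Class)
dim(hearth2)

# First six rows
head(hearth2)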


rfvimptest documentation built on June 8, 2025, 10:41 a.m.