SAheart: South African Heart Disease Data


Description

From the web source: "A retrospective sample of males in a heart-disease high-risk region of the Western Cape, South Africa. There are roughly two controls per case of CHD. Many of the CHD positive men have undergone blood pressure reduction treatment and other programs to reduce their risk factors after their CHD event. In some cases the measurements were made after these treatments."

The data are packaged here from the source (below). With one significant change (making chd a factor), they are also a repackaging of the data of the same name from the 2015 'ElemStatLearn' package of Kjetil B. Halvorsen, which was archived in 2020.

Format

A data frame with 462 rows and 10 variables

sbp

Systolic blood pressure in millimetres of mercury (mm Hg).

tobacco

Cumulative tobacco use in kilograms. Appears to be lifetime cumulative; not annual.

ldl

Low density lipoprotein cholesterol.

adiposity

Not recorded in the source; presumably another measurement of obesity similar to BMI. Possibly a "corrected" version of the obesity measure.

famhist

Factor indicating presence or absence of a family history of ischaemic heart disease.

typea

Type-A coronary prone personality behaviour as measured by a self-administered Bortner Short Rating Scale. Possible total scores can range from 12 to 84. Rossouw et al. (1983) "arbitrarily" classify those with scores of 55 or more "as exhibiting type A behaviour."

obesity

A measure of obesity; interpreting it as body mass index (BMI) is consistent with Rossouw et al. (1983), who scored a BMI >= 30 as "obese".

alcohol

Current alcohol consumption. Units of measurement (quantity/time) are unclear (e.g., litres per annum or ounces per month); alcohol is not mentioned in Rossouw et al. (1983).

age

Age in years at time of study. (Source web page: ‘Age at onset’.)

chd

The response, a factor identifying whether the subject had been diagnosed as having coronary heart disease or not.

The row order of the values follows their order of appearance in the source webpage.
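
The two cut-offs quoted from Rossouw et al. (1983) in the descriptions of typea and obesity can be applied as in the following minimal Python sketch. The function names and the example values are hypothetical and chosen only for illustration; they are not part of the data set or of any package.

```python
# Cut-offs quoted in the variable descriptions (Rossouw et al., 1983).
TYPE_A_CUTOFF = 55    # Bortner score of 55 or more: "type A behaviour"
OBESE_CUTOFF = 30.0   # BMI of 30 or more: "obese"


def is_type_a(typea_score):
    """Return True if a Bortner score meets the type A cut-off."""
    if not 12 <= typea_score <= 84:
        raise ValueError("Bortner total scores range from 12 to 84")
    return typea_score >= TYPE_A_CUTOFF


def is_obese(bmi):
    """Return True if a BMI value meets the obesity cut-off."""
    return bmi >= OBESE_CUTOFF


# Made-up values for one subject, for illustration only.
example = {"typea": 60, "obesity": 25.3}
print(is_type_a(example["typea"]), is_obese(example["obesity"]))
```

Both comparisons are inclusive, matching the "55 or more" and ">= 30" wording above.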

Details

In the late 1970s, an unusually high incidence of ischaemic heart disease had been observed amongst white Afrikaans-speaking segments of South African society (Wyndham, 1982). Using an intensive postal campaign in 1979, Rossouw et al. (1983) recruited about 82% of the known target population of inhabitants of three Afrikaner communities in the southwestern Cape Province (3,357 white males and 3,831 white females).

For each subject, the binary response "chd" (coded in the original file as 1 if they had coronary heart disease and 0 otherwise, but now as "Yes" or "No") was determined in the survey together with a variety of known risk factors for heart disease.
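
The recoding of the response described above can be sketched in a few lines of Python. This is only an illustration of the 0/1 to "No"/"Yes" mapping, not the code used to build the package; the names `CHD_LABELS` and `recode_chd` and the sample values are hypothetical.

```python
# Map the original 0/1 coding of chd to the "No"/"Yes" labels used here.
CHD_LABELS = {0: "No", 1: "Yes"}


def recode_chd(values):
    """Recode a sequence of 0/1 responses to "No"/"Yes" labels."""
    return [CHD_LABELS[v] for v in values]


# Made-up responses, for illustration only.
original = [1, 1, 0, 1, 0]
recoded = recode_chd(original)
print(recoded)
```

A value other than 0 or 1 raises a `KeyError`, which makes an unexpected coding in the input visible rather than silently passing it through.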

The goal was to explore the prevalence and intensity of CHD risk factors in these high-incidence communities, with particular attention to those major risk factors (e.g., hypercholesterolaemia, hypertension, and smoking) which might be considered reversible (Rossouw et al., 1983).

Hastie and Tibshirani (1987) selected a subset of 465 subjects from the 3,357 white males (in these communities, male mortality rates were about two and a half times that of the females; see Rossouw et al., 1983). The 465 subjects consisted of all 162 cases having had coronary heart disease as well as 303 controls sampled from the remaining set of survey subjects.

The same (or similar) data seem to be used again for illustration in Hastie, Tibshirani, and Friedman (2009), and it is that version which is ported here from the book's accompanying website (see source). Curiously, this data set (viz. that recorded here) contains values on only 462 subjects, of which only 160 are cases and 302 are controls.

In the current data set, rows 1-261 have row numbers matching the source "row.name"; thereafter the row number is one less than the source "row.name". It would appear that the subject with "row.name" 262 is absent from the source (below) and, speculatively, perhaps also those whose "row.name" could have been 464 and 465.
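
The correspondence between row numbers and source "row.name" values described above can be written as a small Python sketch. The function name `source_row_name` and the constants are hypothetical; the sketch only restates the rule given in the text and does not read the source file.

```python
MISSING_ROW_NAME = 262   # "row.name" that appears to be absent from the source
N_ROWS = 462             # number of rows in the current data set


def source_row_name(row_number):
    """Return the source "row.name" for a row number in the current data."""
    if not 1 <= row_number <= N_ROWS:
        raise ValueError("row_number must be between 1 and 462")
    # Rows before the gap match; rows after it are shifted by one.
    if row_number < MISSING_ROW_NAME:
        return row_number
    return row_number + 1


for row in (1, 261, 262, 462):
    print(row, source_row_name(row))
```

Under this rule the largest source "row.name" is 463, which is consistent with the speculation that subjects 464 and 465 are also missing.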

See the references, particularly Rossouw et al. (1983), for more details.

Author(s)

R.W. Oldford

Source

Trevor Hastie's "Elements of Statistical Learning" page at Stanford.

References

Trevor Hastie and Robert Tibshirani (1987) "Non-parametric logistic and proportional odds regression", JRSS-C (Applied Statistics), 36(3), 260–276.

Trevor Hastie, Robert Tibshirani, and Jerome Friedman (2009) "The Elements of Statistical Learning", 2nd Edition, Springer New York <doi:10.1007/978-0-387-84858-7>

J.E. Rossouw, J.P.D. Plessis, A.J.S. Benadé, P.C.J. Jordaan, J.P. Kotzé, P.L. Jooste, and J.J. Ferreira (1983) "Coronary risk factor screening in three rural communities: The CORIS baseline study", South African Medical Journal, 64, 430-436.

C. Wyndham (1982) "Trends with time of cardiovascular mortality rates in the populations of the RSA for the period 1968-1977", South African Medical Journal, 61, 987-993.


rwoldford/loon.data documentation built on May 17, 2021, 7:45 a.m.