PimaIndiansDiabetes: Pima Indians Diabetes Database

PimaIndiansDiabetes    R Documentation

Pima Indians Diabetes Database

Description

A data frame with 768 observations on 9 variables.

Usage

data("PimaIndiansDiabetes", package = "mlbench")
data("PimaIndiansDiabetes2", package = "mlbench")

Format

pregnant    Number of times pregnant
glucose     Plasma glucose concentration (glucose tolerance test)
pressure    Diastolic blood pressure (mm Hg)
triceps     Triceps skin fold thickness (mm)
insulin     2-Hour serum insulin (mu U/ml)
mass        Body mass index
pedigree    Diabetes pedigree function
age         Age (years)
diabetes    Class variable (test for diabetes)
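As a quick sanity check after loading, the documented column names can be compared against the data frame. This is only a sketch: `documented_cols` and `missing_pima_columns()` are hypothetical names introduced here for illustration and are not part of mlbench. The check against the real data runs only if mlbench is installed.

```r
# Column names as listed in the Format section.
documented_cols <- c("pregnant", "glucose", "pressure", "triceps",
                     "insulin", "mass", "pedigree", "age", "diabetes")

# Hypothetical helper (not part of mlbench): returns the documented
# columns that are absent from `df`, or character(0) if none are missing.
missing_pima_columns <- function(df) {
  setdiff(documented_cols, names(df))
}

# Only touch the real data if the package is available.
if (requireNamespace("mlbench", quietly = TRUE)) {
  data("PimaIndiansDiabetes", package = "mlbench")
  print(missing_pima_columns(PimaIndiansDiabetes))
  print(dim(PimaIndiansDiabetes))
}
```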

Details

The data set PimaIndiansDiabetes2 contains a corrected version of the original data set. While the UCI repository index claims that there are no missing values, closer inspection of the data shows several physical impossibilities, e.g., a blood pressure or body mass index of 0. In PimaIndiansDiabetes2, all zero values of glucose, pressure, triceps, insulin and mass have been set to NA; see also Wahba et al. (1995) and Ripley (1996).
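The zero-to-NA recoding described above can be sketched on a small toy data frame. The values below are made up for illustration and are not rows from the real data; `toy`, `toy2` and `zero_as_na` are hypothetical names, and this is not necessarily the code that was used to build PimaIndiansDiabetes2.

```r
# Toy data with made-up values; column names follow the Format section.
toy <- data.frame(
  pregnant = c(6, 0, 8),      # a zero here is plausible and is left alone
  glucose  = c(148, 0, 183),
  pressure = c(72, 66, 0),
  triceps  = c(35, 29, 0),
  insulin  = c(0, 0, 94),
  mass     = c(33.6, 26.6, 0)
)

# The five variables for which a zero is physically impossible.
zero_as_na <- c("glucose", "pressure", "triceps", "insulin", "mass")

# Replace zeros with NA in those columns only.
toy2 <- toy
toy2[zero_as_na] <- lapply(toy2[zero_as_na],
                           function(x) replace(x, x == 0, NA))

# Number of missing values per column after recoding.
print(colSums(is.na(toy2)))
```

Restricting the recoding to the five named columns matters: a zero in pregnant is a legitimate value and should not become NA.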

Source

  • Original owners: National Institute of Diabetes and Digestive and Kidney Diseases

  • Donor of database: Vincent Sigillito (vgs@aplcen.apl.jhu.edu)

These data have been taken from the UCI Repository of Machine Learning Databases (Blake and Merz, 1998) and were converted to R format by Friedrich Leisch in the late 1990s.

The data set no longer seems to be available from the UC Irvine Machine Learning Repository (now at https://archive.ics.uci.edu/).

References

Blake and Merz (1998)

Ripley (1996)

Wahba, Gu, Wang, et al. (1995)

Examples

data("PimaIndiansDiabetes", package = "mlbench")
summary(PimaIndiansDiabetes)

data("PimaIndiansDiabetes2", package = "mlbench")
summary(PimaIndiansDiabetes2)
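Because PimaIndiansDiabetes2 contains NA values, it can help to see how many observations are affected before modelling. The following is a minimal sketch; `na_summary()` is a hypothetical helper defined here, not an mlbench function, and the call on the real data runs only if mlbench is installed.

```r
# Hypothetical helper: counts NAs per column and rows without any NA.
na_summary <- function(df) {
  list(
    per_column    = colSums(is.na(df)),
    complete_rows = sum(complete.cases(df))
  )
}

# Only touch the real data if the package is available.
if (requireNamespace("mlbench", quietly = TRUE)) {
  data("PimaIndiansDiabetes2", package = "mlbench")
  print(na_summary(PimaIndiansDiabetes2))
}
```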

mlbench documentation built on March 26, 2026, 5:09 p.m.