diabetes: Pima Indians Diabetes Data Set



Data from the National Institute of Diabetes and Digestive and Kidney Diseases.


X is a data frame with one row for each of the 768 female patients and 8 attributes:

no.pregnant     number of pregnancies
glucose         plasma glucose concentration in an oral glucose tolerance test
blood.press     diastolic blood pressure (mm Hg)
triceps.thick   triceps skin fold thickness (mm)
insulin         2-hour serum insulin (mu U/ml)
BMI             body mass index (weight in kg / (height in m)^2)
pedigree        diabetes pedigree function
age             age in years

y contains the class labels, Yes or No, indicating whether the patient is diabetic according to WHO criteria.

The training set diabetes.tr contains a randomly selected set of 600 subjects, and the test set diabetes.te contains the remaining 168 subjects. diabetes contains all 768 subjects.
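
The 600/168 partition described above can be illustrated with a minimal base R sketch. It is not the procedure used to build diabetes.tr and diabetes.te: the seed and the variable names (n_total, n_train, train_idx, test_idx) are assumptions made for illustration, and the resulting indices will not match the shipped split.

```r
# Illustrative only: the seed and sampling procedure actually used to create
# diabetes.tr and diabetes.te are not documented on this page.
set.seed(1)        # arbitrary seed (assumption)

n_total <- 768     # number of subjects in diabetes
n_train <- 600     # number of subjects in diabetes.tr

# Draw 600 distinct row indices at random for the training set.
train_idx <- sort(sample(n_total, n_train))

# The rows that were not drawn form the test set.
test_idx <- setdiff(seq_len(n_total), train_idx)

length(train_idx)
length(test_idx)
```

Indices of this kind could then be used to select the matching rows of X and elements of y, assuming both are stored in the same subject order.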


Several constraints were placed on the selection of these instances from a larger database. In particular, all patients here are females at least 21 years old of Pima Indian heritage.


Chih-Chung Chang and Chih-Jen Lin, LIBSVM: a library for support vector machines, 2001. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.


Smith, J.W., Everhart, J.E., Dickson, W.C., Knowler, W.C., & Johannes, R.S. (1988). Using the ADAP learning algorithm to forecast the onset of diabetes mellitus. In Proceedings of the Symposium on Computer Applications and Medical Care (pp. 261–265). IEEE Computer Society Press.



SVMMaj documentation built on May 2, 2019, 9:58 a.m.