R/data.R

#' Bodyfat
#'
#' The response (`y`) corresponds to
#' estimates of percentage of body fat from application of
#' Siri's 1956 equation to measurements of underwater weighing, as well as
#' age, weight, height, and a variety of
#' body circumference measurements.
#'
#' @format A list with two items representing 252 observations from
#'   14 variables
#' \describe{
#'   \item{age}{age (years)}
#'   \item{weight}{weight (lbs)}
#'   \item{height}{height (inches)}
#'   \item{neck}{neck circumference (cm)}
#'   \item{chest}{chest circumference (cm)}
#'   \item{abdomen}{abdomen circumference (cm)}
#'   \item{hip}{hip circumference (cm)}
#'   \item{thigh}{thigh circumference (cm)}
#'   \item{knee}{knee circumference (cm)}
#'   \item{ankle}{ankle circumference (cm)}
#'   \item{biceps}{biceps circumference (cm)}
#'   \item{forearm}{forearm circumference (cm)}
#'   \item{wrist}{wrist circumference (cm)}
#' }
#' @source http://lib.stat.cmu.edu/datasets/bodyfat
#' @source https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/regression.html
"bodyfat"

#' Abalone
#'
#' This data set contains observations of abalones, the common
#' name for any of a group of sea snails. The goal is to predict the
#' age of an individual abalone given physical measurements such as
#' sex, weight, and height.
#'
#' Only a stratified sample of 211 rows of the original data set are used here.
#'
#' @format A list with two items representing 211 observations from
#'   9 variables
#' \describe{
#'   \item{sex}{sex of abalone, 1 for female}
#'   \item{infant}{indicates that the person is an infant}
#'   \item{length}{longest shell measurement in mm}
#'   \item{diameter}{perpendicular to length in mm}
#'   \item{height}{height in mm including meat in shell}
#'   \item{weight_whole}{weight of entire abalone}
#'   \item{weight_shucked}{weight of meat}
#'   \item{weight_viscera}{weight of viscera}
#'   \item{weight_shell}{weight of shell}
#'   \item{rings}{rings. +1.5 gives the age in years}
#' }
#' @source Pace, R. Kelley and Ronald Barry, Sparse Spatial Autoregressions,
#'   Statistics and Probability Letters, 33 (1997) 291-297.
"abalone"

#' Heart disease
#'
#' Diagnostic attributes of patients classified as having heart disease or not.
#'
#' @section Preprocessing:
#' The original dataset contained 13 variables. The nominal of these were
#' dummycoded, removing the first category. No precise information regarding
#' variables `chest_pain`, `thal` and `ecg` could be found, which explains
#' their obscure definitions here.
#'
#' @format 270 observations from 17 variables represented as a list consisting
#' of a binary factor response vector `y`,
#' with levels 'absence' and 'presence' indicating the absence or presence of
#' heart disease and `x`: a sparse feature matrix of class 'dgCMatrix' with the
#' following variables:
#' \describe{
#'   \item{age}{age}
#'   \item{bp}{diastolic blood pressure}
#'   \item{chol}{serum cholesterol in mg/dl}
#'   \item{hr}{maximum heart rate achieved}
#'   \item{old_peak}{ST depression induced by exercise relative to rest}
#'   \item{vessels}{the number of major blood vessels (0 to 3) that were
#'                  colored by fluoroscopy}
#'   \item{sex}{sex of the participant: 0 for male, 1 for female}
#'   \item{angina}{a dummy variable indicating whether the person suffered
#'                 angina-pectoris during exercise}
#'   \item{glucose_high}{indicates a fasting blood sugar over 120 mg/dl}
#'   \item{cp_typical}{typical angina}
#'   \item{cp_atypical}{atypical angina}
#'   \item{cp_nonanginal}{non-anginal pain}
#'   \item{ecg_abnormal}{indicates a ST-T wave abnormality
#'                       (T wave inversions and/or ST elevation or depression of
#'                       > 0.05 mV)}
#'   \item{ecg_estes}{probable or definite left ventricular hypertrophy by
#'                    Estes' criteria}
#'   \item{slope_flat}{a flat ST curve during peak exercise}
#'   \item{slope_downsloping}{a downwards-sloping ST curve during peak exercise}
#'   \item{thal_reversible}{reversible defect}
#'   \item{thal_fixed}{fixed defect}
#' }
#' @source Dua, D. and Karra Taniskidou, E. (2017). UCI Machine Learning Repository
#'   <http://archive.ics.uci.edu/ml>. Irvine, CA: University of California,
#'   School of Information and Computer Science.
#' @source <https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html#heart>
"heart"

#' Wine cultivars
#'
#' A data set of results from chemical analysis of wines grown in Italy
#' from three different cultivars.
#'
#' @format 178 observations from 13 variables represented as a list consisting
#' of a categorical response vector `y`
#' with three levels: *A*, *B*, and *C* representing different
#' cultivars of wine as well as `x`: a sparse feature matrix of class
#' 'dgCMatrix' with the following variables:
#' \describe{
#'   \item{alcohol}{alcoholic content}
#'   \item{malic}{malic acid}
#'   \item{ash}{ash}
#'   \item{alcalinity}{alcalinity of ash}
#'   \item{magnesium}{magnemium}
#'   \item{phenols}{total phenols}
#'   \item{flavanoids}{flavanoids}
#'   \item{nonflavanoids}{nonflavanoid phenols}
#'   \item{proanthocyanins}{proanthocyanins}
#'   \item{color}{color intensity}
#'   \item{hue}{hue}
#'   \item{dilution}{OD280/OD315 of diluted wines}
#'   \item{proline}{proline}
#' }
#'
#' @source Dua, D. and Karra Taniskidou, E. (2017). UCI Machine Learning Repository
#'   <http://archive.ics.uci.edu/ml>. Irvine, CA: University of California,
#'   School of Information and Computer Science.
#' @source <https://raw.githubusercontent.com/hadley/rminds/master/1-data/wine.csv>
#' @source <https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass.html#wine>
"wine"

#' Student performance
#'
#' A data set of the attributes of 382 students in secondary education
#' collected from two schools. The goal is to predict the
#' grade in math and Portugese at the end of the third period. See the
#' cited sources for additional information.
#'
#' @section Preprocessing:
#' All of the grade-specific predictors were dropped from the data set.
#' (Note that it is not clear from the source why some of these predictors are
#' specific to each grade, such as which parent is the student's guardian.)
#' The categorical variables were dummy-coded. Only the final grades (G3)
#' were kept as dependent variables, whilst the
#' first and second period grades were dropped.
#'
#' @format 382 observations from 13 variables represented as a list consisting
#' of a binary factor response matrix `y` with two responses: `portugese` and
#' `math` for the final scores in period three for the respective subjects.
#' The list also contains `x`: a sparse feature matrix of class
#' 'dgCMatrix' with the following variables:
#' \describe{
#'   \item{school_ms}{student's primary school, 1 for Mousinho da Silveira and 0
#'         for Gabriel Pereira}
#'   \item{sex}{sex of student, 1 for male}
#'   \item{age}{age of student}
#'   \item{urban}{urban (1) or rural (0) home address}
#'   \item{large_family}{whether the family size is larger than 3}
#'   \item{cohabitation}{whether parents live together}
#'   \item{Medu}{mother's level of education (ordered)}
#'   \item{Fedu}{fathers's level of education (ordered)}
#'   \item{Mjob_health}{whether the mother was employed in health care}
#'   \item{Mjob_other}{whether the mother was employed as something other than
#'         the specified job roles}
#'   \item{Mjob_services}{whether the mother was employed in the service sector}
#'   \item{Mjob_teacher}{whether the mother was employed as a teacher}
#'   \item{Fjob_health}{whether the father was employed in health care}
#'   \item{Fjob_other}{whether the father was employed as something other than
#'         the specified job roles}
#'   \item{Fjob_services}{whether the father was employed in the service sector}
#'   \item{Fjob_teacher}{whether the father was employed as a teacher}
#'   \item{reason_home}{school chosen for being close to home}
#'   \item{reason_other}{school chosen for another reason}
#'   \item{reason_rep}{school chosen for its reputation}
#'   \item{nursery}{whether the student attended nursery school}
#'   \item{internet}{Pwhether the student has internet access at home}
#' }
#'
#' @source P. Cortez and A. Silva. Using Data Mining to Predict Secondary School
#'   Student Performance. In A. Brito and J. Teixeira Eds., Proceedings of 5th
#'   FUture BUsiness TEChnology Conference (FUBUTEC 2008) pp. 5-12, Porto,
#'   Portugal, April, 2008, EUROSIS, ISBN 978-9077381-39-7.
#'   <http://www3.dsi.uminho.pt/pcortez/student.pdf>
#' @source Dua, D. and Karra Taniskidou, E. (2017). UCI Machine Learning Repository
#'  <http://archive.ics.uci.edu/ml>. Irvine, CA: University of California,
#'  School of Information and Computer Science.
"student"

Try the owl package in your browser

Any scripts or data that you put into this service are public.

owl documentation built on Feb. 11, 2020, 5:09 p.m.