List of data sets:
birthwt
demo.dat
demo.dat2
demo.dat3
demo.dat4
diabetes
diabetes2
heart
prostate
wines
Details:
birthwt is a list containing group IDs, the model matrix, and
outcome variable, in a format to be used with the group regression
functions. The data consists of 189 observations of 16 predictor variables,
and the response variable, birthweight. All numeric variables are
standardized. The data were collected at Baystate Medical Center,
Springfield, Mass., during 1986. This was created from the data set of the
same name in the MASS package.
demo.dat, demo.dat2,
demo.dat3, and demo.dat4 are data sets generated by me for testing the
models in the package. The first and second data sets contain 30
observations of 15 predictor variables and one outcome variable "y", with
high degrees of collinearity. These are intended to be used to evaluate
regularized regression estimators. The third and fourth data sets contain 60
observations of 11 predictor variables and 1 outcome variable "y".
Collinearity is fairly weak in these. The first and third data sets have the
outcome variable generated from a sparse set of predictors, such that some
of the effects are truly exactly zero. The second and fourth data set
outcome variables are generated from nonsparse vectors of true
coefficients, such that some effects are small but none are truly exactly
zero. For any of these four data sets the true coefficients used to generate
the data can be viewed with attr(demo.dat, "true.betas") to assess how
accurate a regression model is for these data.
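As a minimal sketch of that comparison, the code below builds a stand-in data frame so that it runs without the package installed. The object sim.dat, its dimensions, and its coefficient values are hypothetical; only the attr(..., "true.betas") call mirrors the packaged data sets. It also assumes the stored vector has one entry per predictor and no intercept, which may not match how the packaged attribute is laid out.

```r
# Stand-in for demo.dat, simulated here so the sketch is self-contained.
# The names, sizes, and coefficient values below are illustrative assumptions.
set.seed(1)
n <- 60
p <- 11
X <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("x", 1:p)))
true_betas <- c(2, -1.5, 0, 0, 1, rep(0, p - 5))  # sparse: some effects exactly zero
sim.dat <- data.frame(y = as.numeric(X %*% true_betas + rnorm(n)), X)
attr(sim.dat, "true.betas") <- true_betas

# Retrieve the generating coefficients, as one would for the packaged data sets
betas <- attr(sim.dat, "true.betas")

# Fit ordinary least squares and compare the estimates with the truth
fit <- lm(y ~ ., data = sim.dat)
est <- coef(fit)[-1]                    # drop the intercept
rmse <- sqrt(mean((est - betas)^2))     # root mean squared estimation error

print(round(cbind(true = betas, estimate = est), 2))
print(rmse)
```

Replacing lm with a regularized estimator and recomputing rmse gives a simple way to compare methods against the known coefficients.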
diabetes is a data frame containing standardized variables from the
diabetes data set in Efron (2004). It contains 11 covariates, one of which is a
factor (gender).
diabetes2 is a data frame containing
standardized variables from the diabetes data set in Schorling et al. (1997).
The data consist of 403 observations of 16 variables from subjects
interviewed in a study to understand the prevalence of obesity, diabetes,
and other cardiovascular risk factors in central Virginia for African
Americans. Some missing values were imputed by me using the mice
package, and three longitudinal variables were omitted due to an excessive
number of missing values. This was done so that the data set is easily used
for didactic purposes. The original data set can be downloaded from
http://biostat.mc.vanderbilt.edu/wiki/Main/DataSets.
heart is a data frame containing standardized variables from a
sample of males in a heart disease study conducted in Western Cape, South
Africa. There are 462 observations of 10 variables. These data are taken
from a larger dataset from Rossouw et al. (1983). The intended outcome
variable "chd" is binary, making this useful for logistic regression
examples.
prostate is a data frame containing
standardized variables from a study (Stamey et al., 1989) of the
relationship between the level of prostate-specific antigen and a number of
clinical measures in men who were about to receive a radical prostatectomy.
There are 97 observations of 9 variables.
wines is a
data frame containing standardized variables from Aeberhard et al. (1992).
There are 178 observations of 14 variables, one of which is a factor variable
with three levels. The data set in its original form can be downloaded from
https://archive.ics.uci.edu/ml/datasets/wine.
An object of class data.frame
with 178 rows and 14 columns.
data("wines")

