R/birthwt.R

#' Data Sets in Bayezilla Package
#' 
#' \strong{List of data sets:} \cr \cr birthwt \cr demo.dat \cr demo.dat2 \cr
#' demo.dat3 \cr demo.dat4 \cr diabetes \cr diabetes2 \cr heart \cr prostate
#' \cr wines \cr \cr \cr \strong{Details:}\cr \cr \itemize{ \item
#' \strong{birthwt} is a list containing group IDs, the model matrix, and
#' outcome variable, in a format to be used with the group regression
#' functions. The data consists of 189 observations of 16 predictor variables,
#' and the response variable of birthweight. All numeric variables are
#' standardized. The data were collected at Baystate Medical Center,
#' Springfield, Mass., during 1986. This was created from the data set of the
#' same name in the MASS package. \cr \cr \item \strong{demo.dat, demo.dat2,
#' demo.dat3, and demo.dat4} are data sets generated by me for testing the
#' models in the package. The first and second data sets contain 30
#' observations of 15 predictor variables and one outcome variable "y", with
#' high degrees of collinearity. They are intended for evaluating
#' regularized regression estimators. The third and fourth data sets contain 60
#' observations of 11 predictor variables and 1 outcome variable "y".
#' Collinearity is fairly weak in these. The first and third data sets have the
#' outcome variable generated from a sparse set of predictors, such that some
#' of the effects are truly exactly zero. The outcome variables of the second
#' and fourth data sets are generated from non-sparse vectors of true
#' coefficients, such that some effects are small but none are truly exactly
#' zero. For any of these four data sets, the true coefficients used to
#' generate the data can be viewed with \code{attr(demo.dat, "true.betas")} to
#' assess how accurately a regression model recovers them. \cr \cr \item
#' \strong{diabetes} is a data frame containing standardized variables from the
#' diabetes data set in Efron et al. (2004). It contains 11 covariates, one of
#' which is a factor (gender). \cr \cr \item \strong{diabetes2} is a data frame
#' containing standardized variables from the diabetes data set in Schorling et
#' al. (1997).
#' The data consists of 403 observations of 16 variables from subjects
#' interviewed in a study to understand the prevalence of obesity, diabetes,
#' and other cardiovascular risk factors in central Virginia for African
#' Americans. I imputed some missing values using the mice package, and three
#' longitudinal variables were omitted due to an excessive number of missing
#' values. This was done so that the data set is easy to use
#' for didactic purposes. The original data set can be downloaded from
#' http://biostat.mc.vanderbilt.edu/wiki/Main/DataSets. \cr \cr \item
#' \strong{heart} is a data frame containing standardized variables from a
#' sample of males in a heart-disease study conducted in Western Cape, South
#' Africa. There are 462 observations of 10 variables. These data are taken
#' from a larger data set described in Rossouw et al. (1983). The intended outcome
#' variable "chd" is binary, making this useful for logistic regression
#' examples. \cr \cr \item \strong{prostate} is a data frame containing
#' standardized variables from a study (Stamey et al., 1989) of the
#' relationship between the level of prostate-specific antigen and a number of
#' clinical measures in men who were about to receive a radical prostatectomy.
#' There are 97 observations of 9 variables. \cr \cr \item \strong{wines} is a
#' data frame containing standardized variables from Aeberhard et al. (1992).
#' There are 178 observations of 14 variables, one of which is a factor variable
#' with three levels. The data set in its original form can be downloaded from
#' https://archive.ics.uci.edu/ml/datasets/wine. \cr }
#' 
#' 
#' @references Aeberhard, S., Coomans, D., and de Vel, O. (1992) Comparison of
#' Classifiers in High Dimensional Settings, Tech. Rep. no. 92-02, Dept. of
#' Computer Science and Dept. of Mathematics and Statistics, James Cook
#' University of North Queensland. \cr \cr Efron, B., et al. (2004) Least angle
#' regression (with discussion). Ann. Statist. 32:407-499.
#' doi:10.1214/009053604000000067 \cr \cr Schorling, J.B., Roach, J., Siegel,
#' M., Baturka, N., Hunt, D.E., Guterbock, T.M., and Stewart, H.L. (1997) A
#' trial of church-based smoking cessation interventions for rural African
#' Americans. Preventive Medicine 26:92-101. \cr \cr Rossouw, J.E., Du Plessis,
#' J.P., Benade, A.J., Jordaan, P.C., Kotze, J.P., Jooste, P.L., and Ferreira,
#' J.J. (1983) Coronary risk factor screening in three rural communities. The
#' CORIS baseline study. S. Afr. Med. J. 64(12):430-436. \cr \cr Stamey, T.,
#' Kabalin, J., McNeal, J., Johnstone, I., Freiha, F., Redwine, E. and Yang, N.
#' (1989) Prostate specific antigen in the diagnosis and treatment of
#' adenocarcinoma of the prostate II. Radical prostatectomy treated patients,
#' Journal of Urology 16:1076-1083. \cr \cr
#' @examples
#' 
#' data(birthwt)
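#' 
#' # A hedged sketch, assuming the other data sets load the same way as
#' # birthwt. The element names of the birthwt list are not given in this
#' # documentation, so inspect the structure rather than assuming them.
#' str(birthwt)
#' 
#' # View the coefficients used to simulate demo.dat (attribute name taken
#' # from the Details above), e.g. to compare against fitted estimates.
#' data(demo.dat)
#' attr(demo.dat, "true.betas")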
#' 
#' @keywords datasets
"birthwt"
abnormally-distributed/Bayezilla documentation built on Oct. 31, 2019, 1:57 a.m.