Nothing
#' Acetylene Data
#'
#' The data consist of measures of \code{yield} of a chemical manufacturing
#' process for acetylene in relation to numeric parameters.
#'
#' Marquardt and Snee (1975) used these data to illustrate ridge regression in
#' a model containing quadratic and interaction terms, particularly the need to
#' center and standardize variables appearing in high-order terms.
#'
#' Typical models for these data include the interaction of \code{temp:ratio},
#' and a squared term in \code{temp}
#'
#' @name Acetylene
#' @docType data
#' @format A data frame with 16 observations on the following 4 variables.
#' \describe{
#' \item{\code{yield}}{conversion percentage yield of acetylene}
#' \item{\code{temp}}{reactor temperature (celsius)}
#' \item{\code{ratio}}{H2 to N-heptone ratio}
#' \item{\code{time}}{contact time (sec)}
#' }
#' @references
#' Marquardt, D.W., and Snee, R.D. (1975), "Ridge Regression in
#' Practice," \emph{The American Statistician}, \bold{29}, 3-20.
#'
#' Marquardt, D.W. (1980), "A Critique of Some Ridge Regression Methods:
#' Comment," \emph{Journal of the American Statistical Association}, Vol. 75,
#' No. 369 (Mar., 1980), pp. 87-91
#'
#' @source SAS documentation example for \code{PROC REG}, \emph{Ridge
#' Regression for Acetylene Data}.
#' @keywords datasets
#' @examples
#'
#' data(Acetylene)
#'
#' # naive model, not using centering
#' amod0 <- lm(yield ~ temp + ratio + time + I(time^2) + temp:time, data=Acetylene)
#'
#' y <- Acetylene[,"yield"]
#' X0 <- model.matrix(amod0)[,-1]
#'
#' lambda <- c(0, 0.0005, 0.001, 0.002, 0.005, 0.01)
#' aridge0 <- ridge(y, X0, lambda=lambda)
#'
#' traceplot(aridge0)
#' traceplot(aridge0, X="df")
#' pairs(aridge0, radius=0.2)
#'
#'
#'
NULL
#' Detroit Homicide Data for 1961-1973
#'
#' @description
#' The data set \code{Detroit} was used extensively in the book by Miller
#' (2002) on subset regression. The data are unusual in that a subset of three
#' predictors can be found which gives a very much better fit to the data than
#' the subsets found from the Efroymson stepwise algorithm, or from forward
#' selection or backward elimination. They are also unusual in that, as time
#' series data, the assumption of independence is patently violated, and the
#' data suffer from problems of high collinearity.
#'
#' As well, ridge regression reveals somewhat paradoxical paths of shrinkage in
#' univariate ridge trace plots, that are more comprehensible in multivariate
#' views.
#'
#' @details
#' The data were originally collected and discussed by Fisher (1976) but the
#' complete dataset first appeared in Gunst and Mason (1980, Appendix A).
#' Miller (2002) discusses this dataset throughout his book, but doesn't state
#' clearly which variables he used as predictors and which is the dependent
#' variable. (\code{Homicide} was the dependent variable, and the predictors
#' were \code{Police} \dots{} \code{WkEarn}.) The data were obtained from
#' StatLib.
#'
#' A similar version of this data set, with different variable names appears in
#' the \code{bestglm} package.
#'
#' @name Detroit
#' @docType data
#' @format A data frame with 13 observations on the following 14 variables.
#' \describe{
#' \item{\code{Police}}{Full-time police per 100,000 population}
#' \item{\code{Unemp}}{Percent unemployed in the population}
#' \item{\code{MfgWrk}}{Number of manufacturing workers in thousands}
#' \item{\code{GunLic}}{Number of handgun licences per 100,000 population}
#' \item{\code{GunReg}}{Number of handgun registrations per 100,000 population}
#' \item{\code{HClear}}{Percent of homicides cleared by arrests}
#' \item{\code{WhMale}}{Number of white males in the population}
#' \item{\code{NmfgWrk}}{Number of non-manufacturing workers in thousands}
#' \item{\code{GovWrk}}{Number of government workers in thousands}
#' \item{\code{HrEarn}}{Average hourly earnings}
#' \item{\code{WkEarn}}{Average weekly earnings}
#' \item{\code{Accident}}{Death rate in accidents per 100,000 population}
#' \item{\code{Assaults}}{Number of assaults per 100,000 population}
#' \item{\code{Homicide}}{Number of homicides per 100,000 of population}
#' }
#'
#' @references
#' Fisher, J.C. (1976). Homicide in Detroit: The Role of Firearms.
#' \emph{Criminology}, \bold{14}, 387--400.
#'
#' Gunst, R.F. and Mason, R.L. (1980). \emph{Regression analysis and its
#' application: A data-oriented approach}. Marcel Dekker.
#'
#' Miller, A. J. (2002). \emph{Subset Selection in Regression}. 2nd Ed. Chapman
#' & Hall/CRC. Boca Raton.
#'
#' @source \url{http://lib.stat.cmu.edu/datasets/detroit}
#' @keywords datasets
#' @examples
#'
#' data(Detroit)
#'
#' # Work with a subset of predictors, from Miller (2002, Table 3.14),
#' # the "best" 6 variable model
#' # Variables: Police, Unemp, GunLic, HClear, WhMale, WkEarn
#' # Scale these for comparison with other methods
#'
#' Det <- as.data.frame(scale(Detroit[,c(1,2,4,6,7,11)]))
#' Det <- cbind(Det, Homicide=Detroit[,"Homicide"])
#'
#' # use the formula interface; specify ridge constants in terms
#' # of equivalent degrees of freedom
#' dridge <- ridge(Homicide ~ ., data=Det, df=seq(6,4,-.5))
#'
#' # univariate trace plots are seemingly paradoxical in that
#' # some coefficients "shrink" *away* from 0
#' traceplot(dridge, X="df")
#' vif(dridge)
#' pairs(dridge, radius=0.5)
#'
#' \donttest{
#' plot3d(dridge, radius=0.5, labels=dridge$df)
#'
#' # transform to PCA/SVD space
#' dpridge <- pca(dridge)
#'
#' # not so paradoxical in PCA space
#' traceplot(dpridge, X="df")
#' biplot(dpridge, radius=0.5, labels=dpridge$df)
#'
#' # show PCA vectors in variable space
#' biplot(dridge, radius=0.5, labels=dridge$df)
#' }
#'
#'
NULL
#' Hospital manpower data
#'
#' @description
#' The hospital manpower data, taken from Myers (1990), table 3.8, are a
#' well-known example of highly collinear data to which ridge regression and
#' various shrinkage and selection methods are often applied.
#'
#' The data consist of measures taken at 17 U.S. Naval Hospitals and the goal
#' is to predict the required monthly man hours for staffing purposes.
#'
#' @details
#' Myers (1990) indicates his source was "Procedures and Analysis for Staffing
#' Standards Development: Data/Regression Analysis Handbook", Navy Manpower and
#' Material Analysis Center, San Diego, 1979.
#'
#' @name Manpower
#' @docType data
#' @format A data frame with 17 observations on the following 6 variables.
#' \describe{
#' \item{\code{Hours}}{monthly man hours (response variable)}
#' \item{\code{Load}}{average daily patient load}
#' \item{\code{Xray}}{monthly X-ray exposures}
#' \item{\code{BedDays}}{monthly occupied bed days}
#' \item{\code{AreaPop}}{eligible population in the area in thousands}
#' \item{\code{Stay}}{average length of patient's stay in days}
#' }
#' @seealso \code{\link[bestglm]{manpower}} for the same data, and other
#' analyses
#'
#' @references
#' Donald R. Jensen and Donald E. Ramirez (2012). Variations on
#' Ridge Traces in Regression, \emph{Communications in Statistics - Simulation
#' and Computation}, 41 (2), 265-278.
#'
#' @source
#' Raymond H. Myers (1990). \emph{Classical and Modern Regression with
#' Applications}, 2nd ed., PWS-Kent, pp. 130-133.
#'
#' @keywords datasets
#' @examples
#'
#' data(Manpower)
#' mmod <- lm(Hours ~ ., data=Manpower)
#' vif(mmod)
#' # ridge regression models, specified in terms of equivalent df
#' mridge <- ridge(Hours ~ ., data=Manpower, df=seq(5, 3.75, -.25))
#' vif(mridge)
#'
#' # univariate ridge trace plots
#' traceplot(mridge)
#' traceplot(mridge, X="df")
#'
#' # bivariate ridge trace plots
#' plot(mridge, radius=0.25, labels=mridge$df)
#' pairs(mridge, radius=0.25)
#'
#' \donttest{
#' # 3D views
#' # ellipsoids for Load, Xray & BedDays are nearly 2D
#' plot3d(mridge, radius=0.2, labels=mridge$df)
#' # variables in model selected by AIC & BIC
#' plot3d(mridge, variables=c(2,3,5), radius=0.2, labels=mridge$df)
#'
#' # plots in PCA/SVD space
#' mpridge <- pca(mridge)
#' traceplot(mpridge, X="df")
#' biplot(mpridge, radius=0.25)
#' }
#'
#'
NULL
#' Prostate Cancer Data
#'
#' @description
#' Data to examine the correlation between the level of prostate-specific
#' antigen and a number of clinical measures in men who were about to receive a
#' radical prostatectomy.
#'
#' @details
#' This data set came originally from the (now defunct) ElemStatLearn package.
#'
#' The last column indicates which 67 observations were used as the "training
#' set" and which 30 as the test set, as described on page 48 in the book.
#'
#' @name prostate
#' @docType data
#' @format A data frame with 97 observations on the following 10 variables.
#' \describe{
#' \item{lcavol}{log cancer volume}
#' \item{lweight}{log prostate weight}
#' \item{age}{in years}
#' \item{lbph}{log of the amount of benign prostatic hyperplasia}
#' \item{svi}{seminal vesicle invasion}
#' \item{lcp}{log of capsular penetration}
#' \item{gleason}{a numeric vector}
#' \item{pgg45}{percent of Gleason score 4 or 5}
#' \item{lpsa}{response}
#' \item{train}{a logical vector}
#' }
#' @note There was an error in this dataset in earlier versions of the package,
#' as indicated in a footnote on page 3 of the second edition of the book. As
#' of version 2012.04-0 this was corrected.
#'
#' @source
#' Stamey, T., Kabalin, J., McNeal, J., Johnstone, I., Freiha, F.,
#' Redwine, E. and Yang, N (1989) Prostate specific antigen in the diagnosis
#' and treatment of adenocarcinoma of the prostate II. Radical prostatectomy
#' treated patients, \emph{Journal of Urology}, \bold{16}: 1076--1083.
#' @keywords datasets
#' @examples
#'
#' data(prostate)
#' str( prostate )
#' cor( prostate[,1:8] )
#' prostate <- prostate[, -10]
#'
#' prostate.mod <- lm(lpsa ~ ., data=prostate)
#' vif(prostate.mod)
#'
#' py <- prostate[, "lpsa"]
#' pX <- data.matrix(prostate[, 1:8])
#' pridge <- ridge(py, pX, df=8:1)
#' pridge
#'
#' plot(pridge)
#' pairs(pridge)
#' traceplot(pridge)
#' traceplot(pridge, X="df")
#'
#'
#'
NULL
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.