R/data.R

#' Automobile Data Set
#'
#' This data set consists of three types of entities: (a) the specification of
#' an auto in terms of various characteristics, (b) its assigned insurance risk
#' rating, and (c) its normalized losses in use as compared to other cars. The
#' risk rating in (b) corresponds to the degree to which the auto is more risky
#' than its price indicates. Cars are initially assigned a risk factor symbol
#' associated with their price. Then, if a car is more (or less) risky, this
#' symbol is adjusted by moving it up (or down) the scale. Actuaries call this
#' process "symboling". A value of +3 indicates that the auto is risky, and -3
#' that it is probably pretty safe.
#'
#' @format A data frame with 205 rows and 26 variables. The first 15 variables are
#'   continuous, while the last 11 variables are categorical. There are 45 rows
#'   with missing values.
#' \describe{
#'   \item{normalized_losses}{continuous from 65 to 256.}
#'   \item{wheel_base}{continuous from 86.6 to 120.9.}
#'   \item{length}{continuous from 141.1 to 208.1.}
#'   \item{width}{continuous from 60.3 to 72.3.}
#'   \item{height}{continuous from 47.8 to 59.8.}
#'   \item{curb_weight}{continuous from 1488 to 4066.}
#'   \item{engine_size}{continuous from 61 to 326.}
#'   \item{bore}{continuous from 2.54 to 3.94.}
#'   \item{stroke}{continuous from 2.07 to 4.17.}
#'   \item{compression_ratio}{continuous from 7 to 23.}
#'   \item{horsepower}{continuous from 48 to 288.}
#'   \item{peak_rpm}{continuous from 4150 to 6600.}
#'   \item{city_mpg}{continuous from 13 to 49.}
#'   \item{highway_mpg}{continuous from 16 to 54.}
#'   \item{price}{continuous from 5118 to 45400.}
#'   \item{symboling}{-3, -2, -1, 0, 1, 2, 3.}
#'   \item{make}{alfa-romero, audi, bmw, chevrolet, dodge, honda, isuzu, jaguar,
#'     mazda, mercedes-benz, mercury, mitsubishi, nissan, peugot, plymouth, porsche,
#'     renault, saab, subaru, toyota, volkswagen, volvo.}
#'   \item{fuel_type}{diesel, gas.}
#'   \item{aspiration}{std, turbo.}
#'   \item{num_doors}{four, two.}
#'   \item{body_style}{hardtop, wagon, sedan, hatchback, convertible.}
#'   \item{drive_wheels}{4wd, fwd, rwd.}
#'   \item{engine_location}{front, rear.}
#'   \item{engine_type}{dohc, dohcv, l, ohc, ohcf, ohcv, rotor.}
#'   \item{num_cylinders}{eight, five, four, six, three, twelve, two.}
#'   \item{fuel_system}{1bbl, 2bbl, 4bbl, idi, mfi, mpfi, spdi, spfi.}
#' }
#' @source Kibler, D., Aha, D. W., & Albert, M. (1989). Instance-based prediction of real-valued attributes. Computational Intelligence, Vol 5, 51--57.
#' \url{https://archive.ics.uci.edu/ml/datasets/automobile}
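#' @examples
#' # A minimal sketch, assuming the MixtureMissing package is installed and
#' # exposes this data set under the name "auto", as documented above.
#' data("auto", package = "MixtureMissing")
#'
#' # Compare against the documented shape: 205 rows and 26 variables.
#' dim(auto)
#'
#' # Count rows with at least one missing value (documented as 45).
#' sum(!complete.cases(auto))
#'
#' # Distribution of the insurance risk rating described above.
#' table(auto$symboling, useNA = "ifany")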
"auto"

#' Bankruptcy Data Set
#'
#' The data set contains the ratio of retained earnings (RE) to total assets and
#' the ratio of earnings before interest and taxes (EBIT) to total assets for 66
#' American firms. Half of the selected firms had filed for bankruptcy.
#'
#' @format A data frame with 66 rows and 3 variables:
#' \describe{
#'   \item{Y}{Status of the firm: 0 for bankruptcy and 1 for financially sound.}
#'   \item{RE}{Ratio of retained earnings to total assets.}
#'   \item{EBIT}{Ratio of earnings before interest and taxes to total assets.}
#' }
#' @source Altman, E. I. (1968). Financial ratios, discriminant analysis and the prediction of corporate bankruptcy. \emph{J Finance} 23(4): 589--609.
#' \url{https://www.jstor.org/stable/2978933}
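#' @examples
#' # A minimal sketch, assuming the MixtureMissing package is installed and
#' # exposes this data set under the name "bankruptcy", as documented above.
#' data("bankruptcy", package = "MixtureMissing")
#'
#' # Firms per status: 0 for bankruptcy, 1 for financially sound.
#' table(bankruptcy$Y)
#'
#' # Mean of each ratio within each status group.
#' aggregate(cbind(RE, EBIT) ~ Y, data = bankruptcy, FUN = mean)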
"bankruptcy"

#' US Cost of Living Indices in 2019 Data Set
#'
#' The data set contains the 2019 cost of living indices of 50 states in five different categories: grocery,
#' housing, transportation, utilities, and miscellaneous (Washington DC is not included). The indices
#' are calculated by first determining the average cost of living in the United States to
#' be used as a baseline set at 100. States are then measured against this baseline. For example,
#' a state with a cost of living index of 200 is twice as expensive as the national average.
#'
#' @format A data frame with 50 rows and 7 variables. There are no missing values.
#' \describe{
#'   \item{Abbr}{State abbreviation.}
#'   \item{State}{State name.}
#'   \item{Grocery}{Grocery index.}
#'   \item{Housing}{Housing index.}
#'   \item{Utilities}{Utilities index.}
#'   \item{Transportation}{Transportation index.}
#'   \item{Misc}{Miscellaneous index.}
#' }
#'
#' @source
#' \url{https://worldpopulationreview.com}
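#' @examples
#' # A minimal sketch, assuming the MixtureMissing package is installed and
#' # exposes this data set under the name "UScost", as documented above.
#' data("UScost", package = "MixtureMissing")
#'
#' # The indices use a national baseline of 100, so dividing by 100 gives each
#' # state's cost relative to the national average (an index of 200 maps to 2).
#' # housing_ratio is a hypothetical name used only for this illustration.
#' housing_ratio <- UScost$Housing / 100
#' names(housing_ratio) <- UScost$Abbr
#'
#' # States with the highest housing cost relative to the national average.
#' head(sort(housing_ratio, decreasing = TRUE))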
"UScost"


MixtureMissing documentation built on Oct. 16, 2024, 1:09 a.m.