R/faraway-package.R
In faraway: Functions and Datasets for Books by Julian Faraway

#' Annual mean temperatures in Ann Arbor, Michigan
#'
#' The data comes from the U.S. Historical Climatology Network.
#'
#'
#' @name aatemp
#' @docType data
#' @format A data frame with 115 observations on the following 2 variables.
#' \describe{ \item{year}{year from 1854 to 2000} \item{temp}{annual mean
#' temperatures in degrees F in Ann Arbor} }
#' @source United States Historical Climatology Network:
#' \url{https://www.ncei.noaa.gov/products/land-based-station/us-historical-climatology-network}
#' @keywords datasets
NULL





#' Wear on materials according to type, run and position
#'
#' The \code{abrasion} data frame has 16 rows and 4 columns.  Four materials
#' were fed into a wear testing machine and the amount of wear recorded. Four
#' samples could be processed at the same time and the position of these
#' samples may be important.  A Latin square design was used.
#'
#'
#' @name abrasion
#' @docType data
#' @format This data frame contains the following columns: \describe{
#'
#' \item{run}{ The run number 1-4 }
#'
#' \item{position}{ The position number 1-4 }
#'
#' \item{material}{ The material A-D }
#'
#' \item{wear}{ The wear, measured as loss of weight in 0.1mm over the
#' testing period }}
#' @source The Design and Analysis of Industrial Experiments by O. Davies,
#' 1954, published by Wiley
#' @keywords datasets
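#' @examples
#'
#' # A minimal sketch, assuming the columns documented above. factor() is
#' # applied defensively in case run and position are stored as numbers, so
#' # that the additive Latin square model treats them as blocking factors.
#' data(abrasion)
#' fit <- lm(wear ~ factor(run) + factor(position) + material, data = abrasion)
#' anova(fit)
#'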
NULL





#' Aflatoxin dosage and liver cancer in lab animals
#'
#' Aflatoxin B1 was fed to lab animals at varying doses and the number responding
#' with liver cancer was recorded.
#'
#'
#' @name aflatoxin
#' @docType data
#' @format A data frame with 6 observations on the following 3 variables.
#' \describe{ \item{dose}{dose in ppb} \item{total}{number of test animals}
#' \item{tumor}{number with liver cancer} }
#' @source Gaylor DW (1987) "Linear nonparametric upper limits for low dose
#' extrapolation" ASA Proceedings of the Biopharmaceutical Section.
#' @keywords datasets
#' @examples
#'
#' data(aflatoxin)
#'
NULL





#' Military coups and politics in sub-Saharan Africa
#'
#' The data are a subset of a larger study on factors affecting regime
#' stability in sub-Saharan Africa.
#'
#'
#' @name africa
#' @docType data
#' @format A data frame with 47 observations on the following 9 variables.
#' \describe{ \item{miltcoup}{number of successful military coups from
#' independence to 1989} \item{oligarchy}{number of years the country was ruled by
#' military oligarchy from independence to 1989} \item{pollib}{Political
#' liberalization - 0 = no civil rights for political expression, 1 = limited
#' civil rights for expression but right to form political parties, 2 = full
#' civil rights} \item{parties}{Number of legal political parties in 1993}
#' \item{pctvote}{Percent voting in last election} \item{popn}{Population in
#' millions in 1989} \item{size}{Area in 1000 square km} \item{numelec}{Total
#' number of legislative and presidential elections} \item{numregim}{Number of
#' regime types} }
#' @references "Bayesian Methods: A Social and Behavioral Sciences Approach" by
#' Jeff Gill 2002.
#' @source Bratton, Michael, and Nicholas Van De Walle.  1997.  ``Political
#' Regimes and Regime Transitions in Africa, 1910-1994.'' \emph{Study Number
#' I06996.} Ann Arbor: Inter-University Consortium for Political and Social
#' Research.
#' @keywords datasets
NULL





#' Airline passengers
#'
#' Monthly totals of airline passengers from 1949 to 1960.
#'
#' A well-known time series example dataset.
#'
#' @name airpass
#' @docType data
#' @format A data frame with 144 observations on the following 2 variables.
#' \describe{ \item{pass}{number of passengers in thousands}
#' \item{year}{the date as a decimal} }
#' @references Box, G.E.P., Jenkins, G.M. and Reinsel, G.C. (1994) Time Series
#' Analysis, Forecasting and Control, 3rd edn.  Englewood Cliffs, N.J.:
#' Prentice-Hall.
#' @source Brown, R.G.(1962) Smoothing, Forecasting and Prediction of Discrete
#' Time Series.  Englewood Cliffs, N.J.: Prentice-Hall.
#' @keywords datasets
#' @examples
#'
#' data(airpass)
#' str(airpass)
#'
NULL





#' Effects of seed inoculum, irrigation and shade on alfalfa yield
#'
#' The \code{alfalfa} data frame has 25 rows and 4 columns. Data comes from an
#' experiment to test the effects of seed inoculum, irrigation and shade on
#' alfalfa yield. A Latin square design was used.
#'
#'
#' @name alfalfa
#' @docType data
#' @format This data frame contains the following columns: \describe{
#' \item{shade}{ Distance of location from tree line divided into 5
#' shade areas } \item{irrigation}{ Irrigation effect divided into 5
#' levels } \item{inoculum}{ Four types of seed inoculum, A-D with E as
#' control. } \item{yield}{ Dry matter yield of alfalfa }}
#' @source Petersen, R.G. 1994. Agricultural Field Experiments, Design and
#' Analysis. Marcel Dekker, Inc., New York.  Pages 70-74.
#' @keywords datasets
NULL





#' Matched pair study of the link between AML and X-rays
#'
#' A matched case control study carried out to investigate the connection
#' between X-ray usage and acute myeloid leukemia in childhood. The pairs are
#' matched by age, race and county of residence.
#'
#'
#' @name amlxray
#' @docType data
#' @format A data frame with 238 observations on the following 11 variables.
#' \describe{ \item{ID}{a factor denoting the matched pairs}
#' \item{disease}{0=control, 1=case} \item{Sex}{ \code{F} or
#' \code{M}} \item{downs}{Presence of Downs syndrome: \code{no} or
#' \code{yes}} \item{age}{Age in years} \item{Mray}{Did the
#' mother ever have an Xray: \code{no} or \code{yes}} \item{MupRay}{Did
#' the mother have an Xray of the upper body during pregnancy: \code{no} or
#' \code{yes}} \item{MlowRay}{Did the mother have an Xray of the lower
#' body during pregnancy: \code{no} or \code{yes}} \item{Fray}{Did the
#' father ever have an Xray: \code{no} or \code{yes}} \item{Cray}{Did
#' the child ever have an Xray: \code{no} or \code{yes}}
#' \item{CnRay}{Total number of Xrays of the child \code{1}=none <
#' \code{2}=1 or 2 < \code{3}=3 or 4 < \code{4}= 5 or more} }
#' @source Chap T. Le (1998) "Applied Categorical Data Analysis" Wiley.
#' @keywords datasets
NULL





#' Time in minutes to eye opening after reversal of anaesthetic.
#'
#' A doctor at a major London hospital compared the effects of 4 anaesthetics
#' used in major operations. 80 patients were divided into four groups of 20.
#'
#'
#' @name anaesthetic
#' @docType data
#' @format A data frame with 80 observations on the following 2 variables.
#' \describe{ \item{breath}{time in minutes to start breathing
#' unassisted} \item{tgrp}{Four treatment groups \code{A} \code{B}
#' \code{C} \code{D}} }
#' @source Chatfield C. (1995) Problem Solving: A Statistician's Guide, 2ed
#' Chapman Hall.
#' @keywords datasets
#' @examples
#'
#' data(anaesthetic)
#' str(anaesthetic)
#'
NULL





#' Respiratory disease rates of babies fed in different ways
#'
#' Study on infant respiratory disease, namely the proportions of children
#' developing bronchitis or pneumonia in their first year of life by type of
#' feeding and sex.
#'
#'
#' @name babyfood
#' @docType data
#' @format A data frame with 6 observations on the following 4 variables.
#' \describe{ \item{disease}{number with disease}
#' \item{nondisease}{number without disease} \item{sex}{a
#' factor with levels \code{Boy} \code{Girl}} \item{food}{a factor with
#' levels \code{Bottle} \code{Breast} \code{Suppl}} }
#' @source Payne, C. (1987). The GLIM System Release 3.77 Manual (2 ed.).
#' Oxford: Numerical Algorithms Group.
#' @keywords datasets
#' @examples
#'
#' data(babyfood)
#' str(babyfood)
#'
NULL





#' Beetles exposed to fumigant
#'
#' Grain beetles were exposed to ethylene oxide
#'
#'
#' @name beetle
#' @docType data
#' @format A data frame with 10 observations on the following 3 variables.
#' \describe{ \item{conc}{concentration of ethylene oxide in mg/l}
#' \item{affected}{number affected} \item{exposed}{number
#' exposed} }
#' @references Collett D. "Modelling Binary Data"
#' @source Busvine (1938)
#' @keywords datasets
#' @examples
#'
#' data(beetle)
#' str(beetle)
#'
NULL





#' Insect mortality due to insecticide
#'
#' An experiment measuring death rates for insects, with 30 insects at each of
#' five treatment levels.
#'
#'
#' @name bliss
#' @docType data
#' @format A data frame with 5 observations on the following 3 variables.
#' \describe{ \item{dead}{number dead} \item{alive}{number
#' alive} \item{conc}{concentration of insecticide} }
#' @source Bliss (1935). The calculation of the dosage-mortality curve. Annals
#' of Applied Biology 22, 134-167.
#' @keywords datasets
#' @examples
#'
#' data(bliss)
#' str(bliss)
#'
NULL





#' Breaking strength of materials
#'
#' An experiment was conducted to select the supplier of raw materials for
#' production of a component. The breaking strength of the component was the
#' objective of interest.  Four suppliers were considered. The four operators
#' can only produce one component each per day. A Latin square design was used.
#'
#'
#' @name breaking
#' @docType data
#' @format A data frame with 16 observations on the following 4 variables.
#' \describe{ \item{y}{The breaking strength of the component}
#' \item{operator}{the operator - a factor with levels \code{op1}
#' \code{op2} \code{op3} \code{op4}} \item{day}{the day of production -
#' a factor with levels \code{day1} \code{day2} \code{day3} \code{day4}}
#' \item{supplier}{the supplier of the raw material - a factor with
#' levels \code{A} \code{B} \code{C} \code{D}} }
#' @source Lentner M. and Bishop T. (1986) Experimental Design and Analysis,
#' Valley Book Company
#' @keywords datasets
NULL





#' Broccoli weight variation
#'
#' A number of growers supply broccoli to a food processing plant. The plant
#' instructs the growers to pack the broccoli into standard size boxes. There
#' should be 18 clusters of broccoli per box and each cluster should weigh
#' between 1.33 and 1.5 pounds. Because the growers use different varieties,
#' methods of cultivation etc, there is some variation in the cluster weights.
#' The plant manager selected 3 growers at random and then 4 boxes at random
#' supplied by these growers. 3 clusters were selected from each box.
#'
#'
#' @name broccoli
#' @docType data
#' @format A data frame with 36 observations on the following 4 variables.
#' \describe{ \item{wt}{weight of broccoli} \item{grower}{the
#' grower - a factor with levels \code{1} \code{2} \code{3}}
#' \item{box}{the box - a factor with levels \code{1} \code{2} \code{3}
#' \code{4}} \item{cluster}{the cluster - a factor with levels \code{1}
#' \code{2} \code{3}} }
#' @source Lentner M. and Bishop T. (1986) Experimental Design and Analysis,
#' Valley Book Company
#' @keywords datasets
NULL





#' Butterfat content of milk by breed
#'
#' Average butterfat content (percentages) of milk for random samples of twenty
#' cows (ten two-year-old and ten mature (greater than four years old)) from
#' each of five breeds. The data are from Canadian records of pure-bred dairy
#' cattle.
#'
#'
#' @name butterfat
#' @docType data
#' @format A data frame with 100 observations on the following 3 variables.
#' \describe{ \item{Butterfat}{butter fat content by percentage}
#' \item{Breed}{a factor with levels \code{Ayrshire} \code{Canadian}
#' \code{Guernsey} \code{Holstein-Fresian} \code{Jersey}} \item{Age}{a
#' factor with levels \code{2year} \code{Mature}} }
#' @source Sokal, R. R. and Rohlf, F. J. (1994) Biometry. W. H. Freeman, New York, third edition.
#' @keywords datasets
#' @examples
#'
#' data(butterfat)
#' str(butterfat)
#'
NULL





#' Cathedral nave heights and lengths in England
#'
#' Example Dataset from "Practical Regression and Anova"
#'
#'
#' @name cathedral
#' @docType data
#' @format A dataset with 25 cases \describe{
#' \item{style}{of the cathedral - romanesque or gothic}
#' \item{height}{in feet}
#' \item{width}{in feet}}
#' @references Reference details may be found in "Practical Regression and
#' Anova" by Julian Faraway
#' @source Weisberg, S. (2005). Applied Linear Regression, 3rd edition. New York: Wiley
#' @keywords datasets
NULL





#' Taste of Cheddar cheese
#'
#' In a study of cheddar cheese from the LaTrobe Valley of Victoria, Australia,
#' samples of cheese were analyzed for their chemical composition and were
#' subjected to taste tests. Overall taste scores were obtained by combining
#' the scores from several tasters.
#'
#'
#' @name cheddar
#' @docType data
#' @format A data frame with 30 observations on the following 4 variables.
#' \describe{ \item{taste}{a subjective taste score}
#' \item{Acetic}{concentration of acetic acid (log scale)}
#' \item{H2S}{concentration of hydrogen sulfide (log scale)}
#' \item{Lactic}{concentration of lactic acid} }
#' @source David S. Moore and George P. McCabe (1993) Introduction to the Practice of Statistics,
#' W. H. Freeman and Company, second edition.
#' @keywords datasets
#' @examples
#'
#' data(cheddar)
#' str(cheddar)
#'
NULL





#' Chicago insurance redlining
#'
#' Data from a 1970s study on the relationship between insurance redlining in
#' Chicago and racial composition, fire and theft rates, age of housing and
#' income in 47 zip codes.
#'
#'
#' @name chicago
#' @docType data
#' @format This dataframe contains the following columns \describe{
#'
#' \item{race}{ racial composition in percent minority }
#' \item{fire}{ fires per 100 housing units } \item{theft}{
#' theft per 1000 population } \item{age}{ percent of housing units
#' built before 1939 } \item{involact}{ new FAIR plan policies and
#' renewals per 100 housing units } \item{income}{ median family income
#' in thousands of dollars} \item{side}{ North or South side of
#' Chicago}
#'
#' }
#' @source Adapted from "Data : A Collection of Problems from Many Fields for
#' the Student and Research Worker" by D. Andrews and A. Herzberg published by
#' Springer-Verlag, in 1985
#' @keywords datasets
NULL


#' Chicago zip codes north-south
#'
#' Complements the chicago and chmiss datasets by dividing the zip codes into
#' north and south
#'
#' @name chiczip
#' @docType data
#' @format \describe{
#' \item{chiczip}{takes the values "n" (north) and "s" (south)}
#' }
#' @references Reference details may be found in "Practical Regression and
#' Anova" by Julian Faraway
#' @seealso chicago
#' @keywords datasets
NULL





#' Chicago insurance redlining
#'
#' Data from a 1970s study on the relationship between insurance redlining in
#' Chicago and racial composition, fire and theft rates, age of housing and
#' income in 47 zip codes. Missing values have been randomly added.
#'
#'
#' @name chmiss
#' @docType data
#' @format This dataframe contains the following columns \describe{
#'
#' \item{race}{ racial composition in percent minority }
#' \item{fire}{ fires per 100 housing units } \item{theft}{
#' theft per 1000 population } \item{age}{ percent of housing units
#' built before 1939 } \item{involact}{ new FAIR plan policies and
#' renewals per 100 housing units } \item{income}{ median family income
#' in thousands of dollars} \item{side}{ North or South side of
#' Chicago}
#'
#' }
#' @source Adapted from "Data : A Collection of Problems from Many Fields for
#' the Student and Research Worker" by D. Andrews and A. Herzberg published by
#' Springer-Verlag, in 1985
#' @keywords datasets
NULL





#' Chocolate cake experiment with split plot design
#'
#' An experiment was conducted to determine the effect of recipe and baking
#' temperature on chocolate cake quality. 15 batches of cake mix for each
#' recipe were prepared. Each batch was sufficient for six cakes. Each of the
#' six cakes was baked at a different temperature which was randomly assigned.
#' Several measures of cake quality were recorded of which breaking angle was
#' just one.
#'
#'
#' @name choccake
#' @docType data
#' @format A data frame with 270 observations on the following 4 variables.
#' \describe{ \item{recipe}{Chocolate for recipe 1 was added at 40C,
#' Chocolate for recipe 2 was added at 60C and recipe 3 had extra sugar}
#' \item{batch}{batch number from 1 to 15}
#' \item{temp}{temperature at which cake was baked: \code{175C}
#' \code{185C} \code{195C} \code{205C} \code{215C} \code{225C}}
#' \item{breakang}{the breaking angle of the cake} }
#' @source Cochran W. and Cox G. (1992) Experimental Designs, 2nd Edition Wiley
#' @keywords datasets
NULL





#' Chicago insurance redlining
#'
#' Data from a 1970s study on the relationship between insurance redlining in
#' Chicago and racial composition, fire and theft rates, age of housing and
#' income in 47 zip codes
#'
#'
#' @name chredlin
#' @docType data
#' @format This dataframe contains the following columns \describe{
#'
#' \item{race}{ racial composition in percent minority }
#' \item{fire}{ fires per 100 housing units } \item{theft}{
#' theft per 1000 population } \item{age}{ percent of housing units
#' built before 1939 } \item{involact}{ new FAIR plan policies and
#' renewals per 100 housing units } \item{income}{ median family income
#' in thousands of dollars} \item{side}{ North or South side of
#' Chicago} }
#' @source Adapted from "Data : A Collection of Problems from Many Fields for
#' the Student and Research Worker" by D. Andrews and A. Herzberg published by
#' Springer-Verlag, in 1985
#' @keywords datasets
NULL





#' Blood clotting times
#'
#' Clotting times of blood for plasma diluted to nine different percentage
#' concentrations with prothrombin-free plasma.
#'
#'
#' @name clot
#' @docType data
#' @format This data frame contains the following columns: \describe{
#' \item{time}{time in seconds to clot}
#' \item{conc}{concentration in percent} \item{lot}{lot number
#' - either one or two} }
#' @references McCullagh & Nelder (1989) Generalized Linear Models (2ed)
#' @source Hurn et al (1945)
#' @keywords datasets
NULL





#' Social class mobility from 1971 to 1981 in the UK
#'
#' Social class mobility from 1971 to 1981 for 42425 men from the United
#' Kingdom census. Subjects were aged 45-64.
#'
#'
#' @name cmob
#' @docType data
#' @format A data frame with 36 observations on the following 3 variables.
#' \describe{ \item{y}{Frequency of observation} \item{class71}{social class in
#' 1971 - a factor with levels \code{I}, professionals, \code{II}
#' semi-professionals, \code{IIIN} skilled non-manual, \code{IIIM} skilled
#' manual, \code{IV} semi-skilled, \code{V} unskilled} \item{class81}{social
#' class in 1981 - a factor with levels \code{I} \code{II} \code{IIIN}
#' \code{IIIM} \code{IV} \code{V} with same classification} }
#' @source D. Blane and S. Harding and M. Rosato (1999) "Does social mobility
#' affect the size of the socioeconomic mortality differential?: Evidence from
#' the Office for National Statistics Longitudinal Study" JRSS-A, 162 59-70.
#' @keywords datasets
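#' @examples
#'
#' # A minimal sketch, assuming the columns documented above: rebuild the
#' # 6 x 6 mobility table of 1971 class (rows) against 1981 class (columns)
#' # from the frequency column y.
#' data(cmob)
#' xtabs(y ~ class71 + class81, data = cmob)
#'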
NULL





#' Malformations of the central nervous system
#'
#' Frequencies of various malformations of the central nervous system recorded
#' on live births in South Wales, UK. Study was designed to determine the
#' effect of water hardness on the incidence of such malformations.
#'
#'
#' @name cns
#' @docType data
#' @format A data frame with 16 observations on the following 7 variables.
#' \describe{ \item{Area}{a factor with levels \code{Cardiff} \code{GlamorganC}
#' \code{GlamorganE} \code{GlamorganW} \code{MonmouthOther} \code{MonmouthV}
#' \code{Newport} \code{Swansea} being areas of South Wales} \item{NoCNS}{count
#' of births with no CNS problem} \item{An}{count of Anencephalus births}
#' \item{Sp}{count of Spina Bifida births} \item{Other}{count of other CNS
#' births} \item{Water}{water hardness} \item{Work}{a factor with levels
#' \code{Manual} \code{NonManual} being the type of work done by the parents} }
#' @references P. McCullagh and J. Nelder (1989), Generalized Linear Models,
#' Chapman and Hall, 2nd Ed.
#' @source C. Lowe and C. Roberts and S. Lloyd, (1971) Malformations of the
#' central nervous system and softness of local water supplies, British Medical
#' Journal, 15,357-361.
#' @keywords datasets
NULL





#' Blood coagulation times by diet
#'
#' Dataset comes from a study of blood coagulation times. 24 animals were
#' randomly assigned to four different diets and the samples were taken in a
#' random order.
#'
#'
#' @name coagulation
#' @docType data
#' @format This dataframe contains the following columns \describe{
#'
#' \item{coag}{ coagulation time in seconds }
#'
#' \item{diet}{ diet type - A,B,C or D } }
#' @source "Statistics for Experimenters" by G. E. P. Box, W. G. Hunter and J. S.
#' Hunter, Wiley, 1978
#' @keywords datasets
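#' @examples
#'
#' # A minimal sketch, assuming the columns documented above: a one-way
#' # analysis of variance comparing coagulation time across the four diets.
#' data(coagulation)
#' fit <- lm(coag ~ diet, data = coagulation)
#' anova(fit)
#'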
NULL





#' Strength of a thermoplastic composite depending on two factors
#'
#' The \code{composite} data frame has 9 rows and 3 columns. Data comes from an
#' experiment to test the strength of a thermoplastic composite depending on
#' the power of a laser and speed of a tape.
#'
#'
#' @name composite
#' @docType data
#' @format This data frame contains the following columns: \describe{
#' \item{strength}{ interply bond strength of the composite }
#' \item{laser}{ laser power at 40, 50 or 60 W } \item{tape}{
#' tape speed, slow=6.42 m/s, medium=13 m/s and fast=27 m/s }}
#' @source Mazumdar, S and Hoa S (1995) "Application of a Taguchi Method for
#' Process enhancement of an online consolidation technique" Composites 26,
#' 669-673
#' @keywords datasets
NULL





#' Corn yields from nitrogen application
#'
#' The relationship between corn yield (bushels per acre) and nitrogen (pounds
#' per acre) fertilizer application was studied in Wisconsin.
#'
#'
#' @name cornnit
#' @docType data
#' @format A data frame with 44 observations on the following 2 variables.
#' \describe{ \item{yield}{corn yield in bushels per acre}
#' \item{nitrogen}{pounds per acre} }
#' @source Unknown
#' @keywords datasets
NULL





#' Corrosion loss in Cu-Ni alloys
#'
#' Data consist of thirteen specimens of 90/10 Cu-Ni alloys with varying iron
#' content in percent. The specimens were submerged in sea water for 60 days
#' and the weight loss due to corrosion was recorded in units of milligrams per
#' square decimeter per day.
#'
#' @name corrosion
#' @docType data
#' @format This dataframe contains the following columns \describe{
#'
#' \item{Fe}{ Iron content in percent } \item{loss}{ Weight
#' loss in mg per square decimeter per day } }
#' @source "Applied Regression Analysis" by N. Draper and H. Smith, Wiley, 1998
#' @keywords datasets
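#' @examples
#'
#' # A minimal sketch, assuming the columns documented above: a simple linear
#' # regression of corrosion loss on iron content. Whether a straight line is
#' # adequate for these data is not claimed here.
#' data(corrosion)
#' fit <- lm(loss ~ Fe, data = corrosion)
#' coef(fit)
#'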
NULL





#' Projected and actual sales of 20 consumer products
#'
#' Projected and actual sales of 20 consumer products. Data have been disguised
#' from original form.
#'
#'
#' @name cpd
#' @docType data
#' @format A data frame with 20 observations on the following 2 variables.
#' \describe{ \item{projected}{projected sales in dollars} \item{actual}{actual
#' sales in dollars} }
#' @source G. Whitmore (1986) "Inverse Gaussian Ratio Estimation" Applied
#' Statistics, 35, 8-15.
#' @keywords datasets
NULL





#' Crawling babies by month
#'
#' A study investigated whether babies take longer to learn to crawl in cold
#' months when they are often bundled in clothes that restrict their movement,
#' than in warmer months. The study sought an association between babies' first
#' crawling age and the average temperature during the month they first try to
#' crawl (about 6 months after birth). Parents brought their babies into the
#' University of Denver Infant Study Center between 1988 and 1991 for the study.
#' The parents reported the birth month and age at which their child was first
#' able to creep or crawl a distance of four feet in one minute.  Data were
#' collected on 208 boys and 206 girls (40 pairs of which were twins).
#'
#'
#' @name crawl
#' @docType data
#' @format A data frame with 12 observations on the following 4 variables.
#' \describe{ \item{crawling}{average crawling age in weeks}
#' \item{SD}{standard deviation of crawling age}
#' \item{n}{sample size} \item{temperature}{average
#' temperature(F) six months after birth} }
#' @source Benson, Janette. (1993). Infant Behavior and Development
#' @keywords datasets
#' @examples
#'
#' data(crawl)
#' str(crawl)
#'
NULL





#' Effects of surface and vision on balance.
#'
#' An experiment was conducted to study the effects of surface and vision on
#' balance. The balance of subjects was observed for two different surfaces
#' and for restricted and unrestricted vision. Balance was assessed
#' qualitatively on an ordinal four-point scale based on observation by the
#' experimenter. Forty subjects were studied, twenty males and twenty females
#' ranging in age from 18 to 38, with heights given in cm and weights in kg.
#' The subjects were tested while standing on foam or a normal surface and with
#' their eyes closed or open or with a dome placed over their head.  Each
#' subject was tested twice in each of the surface and eye combinations for a
#' total of 12 measures per subject.
#'
#'
#' @name ctsib
#' @docType data
#' @format A data frame with 480 observations on the following 8 variables.
#' \describe{ \item{Subject}{an indicator} \item{Sex}{a factor
#' with levels \code{female} \code{male}} \item{Age}{in years}
#' \item{Height}{in cm} \item{Weight}{in kg}
#' \item{Surface}{a factor with levels \code{foam} \code{norm}}
#' \item{Vision}{a factor with levels \code{closed} \code{dome}
#' \code{open}} \item{CTSIB}{a four point scale measuring balance} }
#' @references OzDasl
#' @source Steele, R. (1998). Effect of surface and vision on balance. Ph.D.
#' thesis, Department of Physiotherapy, University of Queensland.
#' @keywords datasets
#' @examples
#'
#' data(ctsib)
#' str(ctsib)
#'
NULL





#' Death penalty in Florida 1977
#'
#' Data on 326 defendants in homicide indictments in 20 Florida counties during
#' 1976-77.
#'
#'
#' @name death
#' @docType data
#' @format A data frame with 8 observations on the following 4 variables.
#' \describe{ \item{y}{a numeric vector} \item{penalty}{Did the
#' subject receive the death penalty?  \code{no} or \code{yes}}
#' \item{victim}{Was the victim \code{b}lack or \code{w}hite?}
#' \item{defend}{Was the defendant \code{b}lack or \code{w}hite?} }
#' @references Agresti A. (1990) Categorical Data Analysis, Wiley.
#' @source Radelet M. (1981) Racial characteristics and the imposition of the
#' death penalty. Amer. Sociol. Rev. \bold{46} 918-927.
#' @keywords datasets
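#' @examples
#'
#' # A minimal sketch, assuming the columns documented above and that y holds
#' # the cell counts (an assumption; y is documented only as a numeric
#' # vector): rebuild the three-way contingency table.
#' data(death)
#' ftable(xtabs(y ~ victim + defend + penalty, data = death))
#'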
NULL





#' Psychology of debt
#'
#' The data arise from a large postal survey on the psychology of debt.
#'
#' All yes/no questions are coded 0=no, 1=yes. Locus of control is a
#' personality measure introduced by Rotter, which claims to differentiate
#' people according to how much they feel things that happen to them are the
#' result of processes within themselves (internal locus of control) or outside
#' events (external locus of control).
#'
#' @name debt
#' @docType data
#' @format A data frame with 464 observations on the following 13 variables.
#' \describe{ \item{incomegp}{income group (1=lowest, 5=highest)}
#' \item{house}{security of housing tenure (1=rent, 2=mortgage, 3=owned
#' outright)} \item{children}{number of children in household}
#' \item{singpar}{is the respondent a single parent?} \item{agegp}{age group
#' (1=youngest)} \item{bankacc}{does the respondent have a bank account?}
#' \item{bsocacc}{does the respondent have a building society account?}
#' \item{manage}{self-rating of money management skill (high values=high
#' skill)} \item{ccarduse}{how often did s/he use credit cards (1=never...
#' 3=regularly)} \item{cigbuy}{does s/he buy cigarettes?} \item{xmasbuy}{does
#' s/he buy Christmas presents for children?} \item{locintrn}{score on a locus
#' of control scale (high values=internal)} \item{prodebt}{score on a scale of
#' attitudes to debt (high values=favourable to debt)} }
#' @source Lea, Webley & Walker, 1995, Journal of Economic Psychology, 16,
#' 181-201. Data obtained from \url{http://people.exeter.ac.uk/SEGLea/}.
#' @keywords datasets
NULL





#' Denim wastage by supplier
#'
#' Five suppliers cut denim material for a jeans manufacturer. An algorithm is
#' used to estimate how much material will be wasted given the dimensions of
#' the material supplied. Typically, a supplier wastes more material than the
#' target based on the algorithm although occasionally they waste less. The
#' percentage of waste relative to target was collected weekly for the 5
#' suppliers. In all, 95 observations were recorded.
#'
#'
#' @name denim
#' @docType data
#' @format A data frame with 95 observations on the following 2 variables.
#' \describe{ \item{waste}{percentage wastage}
#' \item{supplier}{a factor with levels \code{1} \code{2} \code{3}
#' \code{4} \code{5}} }
#' @source Unknown
#' @keywords datasets
#' @examples
#'
#' data(denim)
#' str(denim)
#'
NULL





#' Diabetes and obesity, cardiovascular risk factors
#'
#' 403 African Americans were interviewed in a study to understand the
#' prevalence of obesity, diabetes, and other cardiovascular risk factors in
#' central Virginia.
#'
#' Glycosylated hemoglobin greater than 7.0 is usually taken as a positive
#' diagnosis of diabetes
#'
#' @name diabetes
#' @docType data
#' @format A data frame with 403 observations on the following 19 variables.
#' \describe{ \item{id}{Subject ID} \item{chol}{Total
#' Cholesterol} \item{stab.glu}{Stabilized Glucose}
#' \item{hdl}{High Density Lipoprotein}
#' \item{ratio}{Cholesterol/HDL Ratio}
#' \item{glyhb}{Glycosylated Hemoglobin} \item{location}{County
#' - a factor with levels \code{Buckingham} \code{Louisa}}
#' \item{age}{age in years} \item{gender}{a factor with levels
#' \code{male} \code{female}} \item{height}{height in inches}
#' \item{weight}{weight in pounds} \item{frame}{a factor with
#' levels \code{small} \code{medium} \code{large}} \item{bp.1s}{First
#' Systolic Blood Pressure} \item{bp.1d}{First Diastolic Blood
#' Pressure} \item{bp.2s}{Second Systolic Blood Pressure}
#' \item{bp.2d}{Second Diastolic Blood Pressure}
#' \item{waist}{waist in inches} \item{hip}{hip in inches}
#' \item{time.ppn}{Postprandial Time (in minutes) when Labs were Drawn}
#' }
#' @references Schorling JB, Roach J, Siegel M, Baturka N, Hunt DE, Guterbock
#' TM, Stewart HL: A trial of church-based smoking cessation interventions for
#' rural African Americans. Preventive Medicine 26:92-101; 1997
#' @source Willems JP, Saunders JT, DE Hunt, JB Schorling: Prevalence of
#' coronary heart disease risk factors among rural blacks: A community-based
#' study. Southern Medical Journal 90:814-820; 1997
#' @keywords datasets
NULL





#' Radiation dose effects on chromosomal abnormality
#'
#' An experiment was conducted to determine the effect of gamma radiation on
#' the numbers of chromosomal abnormalities observed
#'
#'
#' @name dicentric
#' @docType data
#' @format A data frame with 27 observations on the following 4 variables.
#' \describe{ \item{cells}{Number of cells in hundreds}
#' \item{ca}{Number of chromosomal abnormalities}
#' \item{doseamt}{amount of dose in Grays} \item{doserate}{rate
#' of dose in Grays/hour} }
#' @references Frome E. and DuFrain R. (1986) Maximum Likelihood Estimation for
#' Cytogenetic Dose-Response Curves. Biometrics. 42, 73-84.
#' @source Purott R. and Reeder E. (1976) The effect of changes in dose rate on
#' the yield of chromosome aberrations in human lymphocytes exposed to gamma
#' radiation. Mutation Research. 35, 437-444.
#' @keywords datasets
NULL





#' Divorce in the USA 1920-1996
#'
#' Divorce rates in the USA from 1920-1996
#'
#'
#' @name divusa
#' @docType data
#' @format A data frame with 77 observations on the following 7 variables.
#' \describe{ \item{year}{the year from 1920-1996} \item{divorce}{divorce per
#' 1000 women aged 15 or more} \item{unemployed}{unemployment rate}
#' \item{femlab}{percent female participation in labor force aged 16+}
#' \item{marriage}{marriages per 1000 unmarried women aged 16+}
#' \item{birth}{births per 1000 women aged 15-44} \item{military}{military
#' personnel per 1000 population} }
#' @source Unknown
#' @keywords datasets
NULL





#' Choice of drug treatment for psychiatry patients
#'
#' A sample of psychiatry patients were cross-classified by their diagnosis and
#' whether a drug treatment was prescribed.
#'
#'
#' @name drugpsy
#' @docType data
#' @format A data frame with 10 observations on the following 3 variables.
#' \describe{ \item{y}{the number of patients}
#' \item{diagnosis}{a factor with levels \code{Affective.Disorder}
#' \code{Neurosis} \code{Personality.Disorder} \code{Schizophrenia}
#' \code{Special.Symptoms}} \item{drug}{a factor with levels \code{no}
#' \code{yes}} }
#' @references Agresti A. (1990) "Categorical Data Analysis" Wiley
#' @source Helmes E. and Fekken G. (1986) Effects of psychotropic drugs and
#' psychiatric illness on vocational aptitude and interest assessment. J. Clin.
#' Psychol. \bold{42} 569-576
#' @keywords datasets
NULL





#' Doctor visits in Australia
#'
#' The data come from the Australian Health Survey of 1977-78 and consist of
#' 5190 single adults where young and old have been oversampled.
#'
#'
#' @name dvisits
#' @docType data
#' @format A data frame with 5190 observations on the following 19 variables.
#' \describe{ \item{sex}{1 if female, 0 if male} \item{age}{Age
#' in years divided by 100 (measured as mid-point of 10 age groups from 15-19
#' years to 65-69 with 70 or more coded as 72)}
#' \item{agesq}{age squared} \item{income}{Annual income in
#' Australian dollars divided by 1000 (measured as mid-point of coded ranges
#' Nil, less than 200, 200-1000, 1001-, 2001-, 3001-, 4001-, 5001-, 6001-,
#' 7001-, 8001-10000, 10001-12000, 12001-14000, with 14001- treated as 15000)}
#' \item{levyplus}{1 if covered by private health insurance fund for
#' private patient in public hospital (with doctor of choice), 0 otherwise}
#' \item{freepoor}{1 if covered by government because low income,
#' recent immigrant, unemployed, 0 otherwise} \item{freerepa}{1 if
#' covered free by government because of old-age or disability pension, or
#' because invalid veteran or family of deceased veteran, 0 otherwise}
#' \item{illness}{Number of illnesses in past 2 weeks with 5 or more
#' coded as 5} \item{actdays}{Number of days of reduced activity in
#' past two weeks due to illness or injury} \item{hscore}{General
#' health questionnaire score using Goldberg's method.  High score indicates
#' bad health} \item{chcond1}{1 if chronic condition(s) but not limited
#' in activity, 0 otherwise} \item{chcond2}{1 if chronic condition(s)
#' and limited in activity, 0 otherwise} \item{doctorco}{Number of
#' consultations with a doctor or specialist in the past 2 weeks}
#' \item{nondocco}{Number of consultations with non-doctor health
#' professionals (chemist, optician, physiotherapist, social worker, district
#' community nurse, chiropodist or chiropractor) in the past 2 weeks}
#' \item{hospadmi}{Number of admissions to a hospital, psychiatric
#' hospital, nursing or convalescent home in the past 12 months (up to 5 or
#' more admissions which is coded as 5)} \item{hospdays}{Number of
#' nights in a hospital, etc. during most recent admission: taken, where
#' appropriate, as the mid-point of the intervals 1, 2, 3, 4, 5, 6, 7, 8-14,
#' 15-30, 31-60, 61-79 with 80 or more admissions coded as 80. If no admission
#' in past 12 months then equals zero} \item{medicine}{Total number of
#' prescribed and nonprescribed medications used in past 2 days}
#' \item{prescrib}{Total number of prescribed medications used in past
#' 2 days} \item{nonpresc}{Total number of nonprescribed medications
#' used in past 2 days} }
#' @source Cameron A, Trivedi P, Milne F and Piggott J (1988) A Microeconometric
#' model of the demand for health care and health insurance in Australia,
#' Review of Economic Studies 55, 85-106
#' @keywords datasets
NULL





#' Ecological regression example
#'
#' Relationship between 1998 per capita income in dollars from all sources and
#' the proportion of legal state residents born in the United States in 1990
#' for each of the 50 states plus the District of Columbia.
#'
#' @name eco
#' @docType data
#' @format This dataframe contains the following columns \describe{
#' \item{usborn}{ Percentage of population born in the United States}
#' \item{income}{ Per capita annual income in dollars}
#' \item{home}{ Percentage born in state} \item{pop}{
#' Population of state } }
#' @source US Bureau of the Census
#' @keywords datasets
NULL





#' Treatment and block effects on egg production
#'
#' The \code{eggprod} data frame has 12 rows and 3 columns.  Six pullets were
#' placed into each of 12 pens. Four blocks were formed from groups of 3 pens
#' based on location. Three treatments were applied. The number of eggs
#' produced was recorded.
#'
#'
#' @name eggprod
#' @docType data
#' @format This data frame contains the following columns: \describe{
#' \item{treat}{ Three treatments: O, E or F } \item{block}{
#' Four blocks labeled 1-4 } \item{eggs}{ Number of eggs produced }}
#' @source Mead, R., R.N. Curnow, and A.M. Hasted. 1993. Statistical Methods in
#' Agriculture and Experimental Biology. Chapman and Hall, London, p. 64.
#' @keywords datasets
NULL





#' Nested data on lab testing of eggs
#'
#' Consistency between laboratory tests is important and yet the results may
#' depend on who did the test and where the test was performed. In an
#' experiment to test levels of consistency, a large jar of dried egg powder
#' was divided up into a number of samples. Because the powder was homogenized,
#' the fat content of the samples is the same, but this fact is withheld from
#' the laboratories. Four samples were sent to each of six laboratories.  Two
#' of the samples were labeled as G and two as H, although in fact they were
#' identical.  The laboratories were instructed to give two samples to two
#' different technicians. The technicians were then instructed to divide their
#' samples into two parts and measure the fat content of each.  So each
#' laboratory reported eight measures, each technician four measures, that is,
#' two replicated measures on each of two samples.
#'
#'
#' @name eggs
#' @docType data
#' @format A data frame with 48 observations on the following 4 variables.
#' \describe{ \item{Fat}{a numeric vector} \item{Lab}{a factor
#' with levels \code{I} \code{II} \code{III} \code{IV} \code{V} \code{VI}}
#' \item{Technician}{a factor with levels \code{one} \code{two}}
#' \item{Sample}{a factor with levels \code{G} \code{H}} }
#' @source Bliss, C. I. (1967). Statistics in Biology. New York: McGraw Hill.
#' @keywords datasets
#' @examples
#'
#' data(eggs)
#' str(eggs)
#'
NULL





#' Epileptic seizures in clinical trial of drug
#'
#' Data from a clinical trial of 59 epileptics.  For a baseline, patients were
#' observed for 8 weeks and the number of seizures recorded. The patients were
#' then randomized to treatment by the drug Progabide (31 patients) or to the
#' placebo group (28 patients).  They were observed for four 2-week periods and
#' the number of seizures recorded.
#'
#'
#' @name epilepsy
#' @docType data
#' @format A data frame with 295 observations on the following 6 variables.
#' \describe{ \item{seizures}{number of seizures}
#' \item{id}{identifying number} \item{treat}{1=treated, 0=not}
#' \item{expind}{0=baseline period, 1=treatment period}
#' \item{timeadj}{weeks of period} \item{age}{in years} }
#' @references Breslow, N. E. and D. G. Clayton (1993). Approximate inference
#' in generalized linear mixed models. Journal of the American Statistical
#' Association 88, 9-25. Diggle, P. J., P. Heagerty, K. Y. Liang, and S. L.
#' Zeger (2002). Analysis of Longitudinal Data (2 ed.). Oxford: Oxford
#' University Press.
#' @source Thall, P. F. and S. C. Vail (1990). Some covariance models for
#' longitudinal count data with overdispersion. Biometrics 46, 657-671.
#' @keywords datasets
#' @examples
#'
#' data(epilepsy)
#' str(epilepsy)
#'
NULL





#' Complaints about emergency room doctors
#'
#' Data was recorded on 44 doctors working in an emergency service at a
#' hospital to study the factors affecting the number of complaints received.
#'
#'
#' @name esdcomp
#' @docType data
#' @format A data frame with 44 observations on the following 6 variables.
#' \describe{ \item{visits}{the number of patient visits}
#' \item{complaints}{the number of complaints}
#' \item{residency}{is the doctor in residency training \code{N} or
#' \code{Y}} \item{gender}{gender of doctor \code{F} or \code{M}}
#' \item{revenue}{dollars per hour earned by the doctor}
#' \item{hours}{total number of hours worked} }
#' @source Chap T. Le (1998) "Applied Categorical Data Analysis" Wiley
#' @keywords datasets
NULL





#' Simulated non-parametric regression data
#'
#' True function is f(x)=sin^3(2pi x^3).
#'
#'
#' @name exa
#' @docType data
#' @format A data frame with 256 observations on the following 3 variables.
#' \describe{ \item{x}{input} \item{y}{response}
#' \item{m}{true value} }
#' @source Haerdle, W. (1991). Smoothing Techniques with Implementation in S.
#' New York: Springer.
#' @keywords datasets
#' @examples
#'
#' data(exa)
#' str(exa)
#'
NULL





#' Simulated non-parametric regression data
#'
#' True function is f(x)=0
#'
#'
#' @name exb
#' @docType data
#' @format A data frame with 256 observations on the following 3 variables.
#' \describe{ \item{x}{input} \item{y}{response}
#' \item{m}{true value} }
#' @source Haerdle, W. (1991). Smoothing Techniques with Implementation in S.
#' New York: Springer.
#' @keywords datasets
#' @examples
#'
#' data(exb)
#' str(exb)
#'
NULL





#' Grading of eye pairs for distance vision
#'
#' A sample of women are rated for the performance of distance vision in each
#' eye.
#'
#'
#' @name eyegrade
#' @docType data
#' @format A data frame with 16 observations on the following 3 variables.
#' \describe{ \item{y}{the observed count} \item{right}{rated vision in the
#' right eye - a factor with levels \code{best} \code{second} \code{third}
#' \code{worst}} \item{left}{rated vision in the left eye - a factor with
#' levels \code{best} \code{second} \code{third} \code{worst}} }
#' @source A. Stuart (1955) A test for homogeneity of the marginal
#' distributions in a two-way classification, Biometrika, 42, 412-416.
#' @keywords datasets
NULL





#' Percentage of Body Fat and Body Measurements
#'
#' Age, weight, height, and 10 body circumference measurements are recorded for
#' 252 men. Each man's percentage of body fat was accurately estimated by an
#' underwater weighing technique.
#'
#'
#' @name fat
#' @docType data
#' @format A data frame with 252 observations on the following 18 variables.
#' \describe{ \item{brozek}{ Percent body fat using Brozek's equation,
#' 457/Density - 414.2} \item{siri}{ Percent body fat using Siri's
#' equation, 495/Density - 450} \item{density}{Density (gm/cm^3)}
#' \item{age}{ Age (yrs)} \item{weight}{ Weight (lbs)}
#' \item{height}{ Height (inches)} \item{adipos}{ Adiposity
#' index = Weight/Height^2 (kg/m^2)} \item{free}{ Fat Free Weight =
#' (1 - fraction of body fat) * Weight, using Brozek's formula (lbs)}
#' \item{neck}{ Neck circumference (cm)} \item{chest}{ Chest
#' circumference (cm)} \item{abdom}{ Abdomen circumference (cm) at the
#' umbilicus and level with the iliac crest} \item{hip}{ Hip
#' circumference (cm)} \item{thigh}{ Thigh circumference (cm)}
#' \item{knee}{ Knee circumference (cm)} \item{ankle}{ Ankle
#' circumference (cm)} \item{biceps}{ Extended biceps circumference
#' (cm)} \item{forearm}{Forearm circumference (cm)}
#' \item{wrist}{ Wrist circumference (cm) distal to the styloid
#' processes} }
#' @source Johnson R. Journal of Statistics Education v.4, n.1 (1996)
#' @keywords datasets
NULL





#' Mortality due to smoking according to age group in women
#'
#' In 1972-74, a survey of one in six residents of Whickham, near Newcastle,
#' England was made. Twenty years later, this data was recorded in a follow-up
#' study. Only women who are current smokers or who have never smoked are
#' included.
#'
#'
#' @name femsmoke
#' @docType data
#' @format A data frame with 28 observations on the following 4 variables.
#' \describe{ \item{y}{observed count for given combination} \item{smoker}{a
#' factor with levels \code{yes} \code{no}} \item{dead}{a factor with levels
#' \code{yes} \code{no}} \item{age}{a factor with agegroup levels \code{18-24}
#' \code{25-34} \code{35-44} \code{45-54} \code{55-64} \code{65-74} \code{75+}}
#' }
#' @source D. Appleton, J. French, M. Vanderpump (1996) "Ignoring a Covariate:
#' An Example of Simpson's Paradox" American Statistician, 50, 340-341
#' @keywords datasets
NULL





#' Billionaires' wealth and age
#'
#' Fortune magazine publishes a list of the world's billionaires each year. The
#' 1992 list includes 233 individuals. Their wealth, age, and geographic
#' location (Asia, Europe, Middle East, United States, and Other) are reported.
#'
#'
#' @name fortune
#' @docType data
#' @format A data frame with 232 observations on the following 3 variables.
#' \describe{
#' \item{wealth}{Billions of dollars}
#' \item{age}{age in years} \item{region}{a factor with levels \code{A} (Asia),
#' \code{E} (Europe), \code{M} (Middle East), \code{O} (Other), \code{U} (USA)} }
#' @source Fortune magazine
#' @keywords datasets
#' @examples
#'
#' data(fortune)
#' str(fortune)
#'
NULL





#' 1981 French Presidential Election
#'
#' Elections for the French presidency proceed in two rounds. In 1981, there
#' were 10 candidates in the first round. The top two candidates then went on
#' to the second round, which was won by Francois Mitterrand over Valery
#' Giscard d'Estaing. The losers in the first round can gain political favors
#' by urging their supporters to vote for one of the two finalists. Since
#' voting is private, we cannot know how these votes were transferred, but we
#' might hope to infer from the published vote totals how this might have
#' happened. Data is given for vote totals in every fourth department of France.
#'
#' @name fpe
#' @docType data
#' @format This dataframe contains the following columns (vote totals are in
#' thousands) \describe{ \item{EI}{ Electeurs Inscrits (registered
#' voters)}
#'
#' \item{A}{ Voters for Mitterrand in the first round} \item{B}{
#' Voters for Giscard in the first round} \item{C}{ Voters for Chirac
#' in the first round} \item{D}{ Voters for Communists in the first
#' round} \item{E}{ Voters for Ecology party in the first round}
#' \item{F}{ Voters for party F in the first round} \item{G}{
#' Voters for party G in the first round} \item{H}{ Voters for party H
#' in the first round} \item{I}{ Voters for party I in the first round}
#' \item{J}{ Voters for party J in the first round} \item{K}{
#' Voters for party K in the first round} \item{A2}{ Voters for
#' Mitterrand in the second round} \item{B2}{ Voters for Giscard
#' in the second round} \item{N}{ Difference between the number of
#' voters in the second round and in the first round}
#'
#' }
#' @source "The Teaching of Practical Statistics" by C.W. Anderson and R.M.
#' Loynes, Wiley, 1987
#' @keywords datasets
NULL





#' Longevity of fruitflies depending on sexual activity and thorax length
#'
#' The \code{fruitfly} data frame has 124 rows and 3 columns.  125 fruitflies
#' were divided randomly into 5 groups of 25 each. The response was the
#' longevity of the fruitfly in days. One group was kept solitary, while
#' another was kept individually with a virgin female each day. Another group
#' was given 8 virgin females per day. As an additional control the fourth and
#' fifth groups were kept with one or eight pregnant females per day.  Pregnant
#' fruitflies will not mate. The thorax length of each male was measured as
#' this was known to affect longevity. One observation in the many group has
#' been lost.
#'
#'
#' @name fruitfly
#' @docType data
#' @format This data frame contains the following columns: \describe{
#'
#' \item{thorax}{ Thorax length } \item{longevity}{ Lifetime in
#' days } \item{activity}{ The group: isolated = fly kept solitary, one
#' = fly kept with one pregnant fruitfly, many = fly kept with eight pregnant
#' fruitflies, low = fly kept with one virgin fruitfly, high = fly kept with
#' eight virgin fruitflies.  }}
#' @source "Sexual Activity and the Lifespan of Male Fruitflies" by L.
#' Partridge and M. Farquhar, Nature, 1981, 580-581
#' @keywords datasets
NULL





#' Species diversity on the Galapagos Islands
#'
#' There are 30 Galapagos islands and 7 variables in the dataset. The
#' relationship between the number of plant species and several geographic
#' variables is of interest. The original dataset contained several missing
#' values which have been filled for convenience. See the \code{galamiss}
#' dataset for the original version.
#'
#'
#' @name gala
#' @docType data
#' @format The dataset contains the following variables \describe{
#' \item{Species}{ the number of plant species found on the island}
#' \item{Endemics}{ the number of endemic species} \item{Area}{
#' the area of the island (km^2)} \item{Elevation}{ the highest
#' elevation of the island (m)} \item{Nearest}{ the distance from the
#' nearest island (km)} \item{Scruz}{ the distance from Santa Cruz
#' island (km)} \item{Adjacent}{ the area of the adjacent island
#' (square km)} }
#' @source M. P. Johnson and P. H. Raven (1973) "Species number and endemism:
#' The Galapagos Archipelago revisited" Science, 179, 893-895
#' @keywords datasets
NULL





#' Species diversity on the Galapagos Islands
#'
#' There are 30 Galapagos islands and 7 variables in the dataset. The
#' relationship between the number of plant species and several geographic
#' variables is of interest. This is the original version of the dataset
#' containing missing values.
#'
#'
#' @name galamiss
#' @docType data
#' @format The dataset contains the following variables \describe{
#' \item{Species}{ the number of plant species found on the island}
#' \item{Endemics}{ the number of endemic species} \item{Area}{
#' the area of the island (km^2)} \item{Elevation}{ the highest
#' elevation of the island (m)} \item{Nearest}{ the distance from the
#' nearest island (km)} \item{Scruz}{ the distance from Santa Cruz
#' island (km)} \item{Adjacent}{ the area of the adjacent island
#' (square km)} }
#' @source M. P. Johnson and P. H. Raven (1973) "Species number and endemism:
#' The Galapagos Archipelago revisited" Science, 179, 893-895
#' @keywords datasets
NULL





#' X-ray decay from a gamma ray burst
#'
#' The X-ray decay light curve of Gamma ray burst 050525a obtained with the
#' X-Ray Telescope (XRT) on board the Swift satellite. The dataset has 63
#' brightness measurements in the 0.4-4.5 keV spectral band at times ranging
#' from 2 minutes to 5 days after the burst.
#'
#'
#' @name gammaray
#' @docType data
#' @format A data frame with 63 observations on the following 3 variables.
#' \describe{ \item{time}{in seconds since burst}
#' \item{flux}{X-ray flux in units of 10^-11 erg/cm2/s, 2-10 keV}
#' \item{error}{measurement error of the flux based on detector
#' signal-to-noise values} }
#' @source A. J. Blustin and 64 coauthors, Astrophys. J. 637, 901-913 2006.
#' Available at \url{http://arxiv.org/abs/astro-ph/0507515}.
#' @keywords datasets
#' @examples
#'
#' data(gammaray)
#' str(gammaray)
#'
NULL





#' Undercounted votes in Georgia in 2000 presidential election
#'
#' The data comes from the US presidential election in the state of Georgia.
#' The undercount is the difference between the number of ballots cast and
#' votes recorded. Voters may have chosen not to vote for president, voted for
#' more than one candidate (disqualified) or the equipment may have failed to
#' register their choice.
#'
#'
#' @name gavote
#' @docType data
#' @format A data frame with 159 observations on the following 10 variables.
#' Each case represents a county in Georgia.  \describe{
#' \item{equip}{The voting equipment used: \code{LEVER}, \code{OS-CC}
#' (optical, central count), \code{OS-PC} (optical, precinct count),
#' \code{PAPER}, \code{PUNCH}} \item{econ}{economic status of county:
#' \code{middle} \code{poor} \code{rich}} \item{perAA}{percent of
#' African Americans in county} \item{rural}{indicator of whether
#' county is \code{rural} or \code{urban}} \item{atlanta}{indicator of
#' whether county is in \code{Atlanta} or not: \code{notAtlanta}}
#' \item{gore}{number of votes for Gore} \item{bush}{number of
#' votes for Bush} \item{other}{number of votes for other candidates}
#' \item{votes}{number of votes} \item{ballots}{number of
#' ballots} }
#' @source Meyer M. (2002) Uncounted Votes: Does Voting Equipment Matter?
#' Chance, 15(4), 33-38
#' @keywords datasets
NULL





#' Northern Hemisphere temperatures and climate proxies in the last millennium
#'
#' Average Northern Hemisphere temperature from 1856-2000 and eight climate
#' proxies from 1000-2000AD. Data can be used to predict temperatures prior to
#' 1856.
#'
#' See the source and references below for the original data. Only some proxies
#' have been included here. Some missing values have been imputed. The proxy
#' data have been smoothed. This version of the data is intended only for
#' demonstration purposes. If you are specifically interested in the subject
#' matter, use the original data.
#'
#' @name globwarm
#' @docType data
#' @format A data frame with 1001 observations on the following 10 variables.
#' \describe{ \item{nhtemp}{Northern hemisphere average temperature (C)
#' provided by the UK Met Office (known as HadCRUT2)} \item{wusa}{Tree
#' ring proxy information from the Western USA.} \item{jasper}{Tree
#' ring proxy information from Canada.} \item{westgreen}{Ice core proxy
#' information from west Greenland} \item{chesapeake}{Sea shell proxy
#' information from Chesapeake Bay} \item{tornetrask}{Tree ring proxy
#' information from Sweden} \item{urals}{Tree ring proxy information
#' from the Urals} \item{mongolia}{Tree ring proxy information from
#' Mongolia} \item{tasman}{Tree ring proxy information from Tasmania}
#' \item{year}{Year 1000-2000AD} }
#' @references www.ncdc.noaa.gov/paleo/pubs/jones2004/jones2004.html
#' @source P.D. Jones and M.E. Mann (2004) "Climate Over Past Millennia"
#' Reviews of Geophysics, Vol. 42, No. 2, RG2002, doi:10.1029/2003RG000143
#' @keywords datasets
#' @examples
#'
#' data(globwarm)
#' str(globwarm)
#'
NULL





#' Hair and eye color
#'
#' Data collected from 592 students in an introductory statistics class
#'
#'
#' @name haireye
#' @docType data
#' @format A data frame with 16 observations on the following 3 variables.
#' \describe{ \item{y}{count of the number of students with given hair/eye
#' combination} \item{eye}{a factor with levels \code{green} \code{hazel}
#' \code{blue} \code{brown}} \item{hair}{a factor with levels \code{BLACK}
#' \code{BROWN} \code{RED} \code{BLOND}} }
#' @source Snee R. (1974) Graphical display of two-way contingency tables.
#' American Statistician, 28, 9-12
#' @keywords datasets
NULL





#' Love, work and happiness
#'
#' Data were collected from 39 students in a University of Chicago MBA class
#'
#'
#' @name happy
#' @docType data
#' @format A data frame with 39 observations on the following 5 variables.
#' \describe{ \item{happy}{Happiness on a 10 point scale where 10 is most
#' happy} \item{money}{family income in thousands of dollars} \item{sex}{1 =
#' satisfactory sexual activity, 0 = not} \item{love}{1 = lonely, 2 = secure
#' relationships, 3 = deep feeling of belonging and caring} \item{work}{5 point
#' scale where 1 = no job, 3 = OK job, 5 = great job} }
#' @source George and McCulloch (1993) "Variable Selection via Gibbs Sampling"
#' JASA, 88, 881-889
#' @keywords datasets
NULL





#' Treatment of insulin dependent diabetic children
#'
#' 16 insulin-dependent diabetic children were enrolled in a study involving a
#' new treatment. 8 children received the new treatment (N) while the other 8
#' received the standard treatment (S).  The age and sex of each child were
#' recorded along with the measured value of glycosylated hemoglobin both before
#' and after treatment.
#'
#'
#' @name hemoglobin
#' @docType data
#' @format A data frame with 16 observations on the following 5 variables.
#' \describe{ \item{age}{age in years} \item{sex}{a factor with
#' levels \code{F} \code{M}} \item{treatment}{a factor with levels
#' \code{N} \code{S}} \item{pre}{measured value of hemoglobin before
#' treatment} \item{post}{measured value of hemoglobin after treatment}
#' }
#' @source Unknown
#' @keywords datasets
#' @examples
#'
#' data(hemoglobin)
#' str(hemoglobin)
#'
NULL





#' Ankylosing Spondylitis
#'
#' Data from Royal Mineral Hospital in Bath. AS is a chronic form of arthritis.
#' A study was conducted to determine whether daily stretching of the hip
#' tissues would improve mobility.  39 "typical" AS patients were randomly
#' allocated to the control (standard treatment) group or the treatment group
#' in a 1:2 ratio.
#' Responses were flexion and rotation angles at the hip measured in degrees.
#' Larger numbers indicate more flexibility.
#'
#'
#' @name hips
#' @docType data
#' @format A data frame with 78 observations on the following 7 variables.
#' \describe{ \item{fbef}{flexion angle before}
#' \item{faft}{flexion angle after} \item{rbef}{rotation angle
#' before} \item{raft}{rotation angle after}
#' \item{grp}{treatment group - a factor with levels \code{control}
#' \code{treat}} \item{side}{side of the body - a factor with levels
#' \code{right} \code{left}} \item{person}{id for the individual} }
#' @source Chatfield C. (1995) Problem Solving: A Statistician's Guide, 2nd
#' ed., Chapman & Hall.
#' @keywords datasets
#' @examples
#'
#' data(hips)
#' str(hips)
#'
NULL





#' Hormone concentrations in gay and straight men
#'
#' Urinary androsterone (androgen) and etiocholanolone (estrogen) values were
#' recorded from 26 healthy males.
#'
#'
#' @name hormone
#' @docType data
#' @format A data frame with 26 observations on the following 3 variables.
#' \describe{ \item{androgen}{concentration}
#' \item{estrogen}{concentration} \item{orientation}{sexual
#' orientation with levels \code{g} \code{s}} }
#' @references Hand, D. (1981). Discrimination and Classification. Chichester,
#' UK: Wiley.
#' @source Margolese, M. (1970). Homosexuality: A new endocrine correlate.
#' Hormones and Behavior 1, 151-155.
#' @keywords datasets
#' @examples
#'
#' data(hormone)
#' str(hormone)
#'
NULL





#' Housing prices in US cities 1986-1994
#'
#' Data on housing prices in 36 US metropolitan statistical areas (MSAs) over 9
#' years from 1986-1994 were collected.
#'
#'
#' @name hprice
#' @docType data
#' @format A data frame with 324 observations on the following 8 variables.
#' \describe{ \item{narsp}{natural log of average sale price in thousands
#' of dollars} \item{ypc}{average per capita income}
#' \item{perypc}{percentage growth in per capita income}
#' \item{regtest}{Regulatory environment index (high values = more
#' regulations)} \item{rcdum}{Rent control - a factor with levels
#' \code{0}=no \code{1}=yes} \item{ajwtr}{Adjacent to a coastline - a
#' factor with levels \code{0}=no \code{1}=yes} \item{msa}{indicator
#' for the MSA} \item{time}{Year 1=1986 to 9=1994} }
#' @source Longitudinal and Panel Data: Analysis and Applications in the Social
#' Sciences, by Edward W. Frees, Cambridge University Press, August 2004.
#' @keywords datasets
NULL





#' Career choice of high school students
#'
#' Data was collected as a subset of the "High School and Beyond" study
#' conducted by the National Education Longitudinal Studies (NELS) program of
#' the National Center for Education Statistics (NCES).
#'
#' One purpose of the study was to determine which factors are related to the
#' choice of the type of program, academic, vocational or general, that the
#' students pursue in high school.
#'
#' @name hsb
#' @docType data
#' @format A data frame with 200 observations on the following 11 variables.
#' \describe{ \item{id}{ID of student} \item{gender}{a factor
#' with levels \code{female} \code{male}} \item{race}{a factor with
#' levels \code{african-amer} \code{asian} \code{hispanic} \code{white}}
#' \item{ses}{socioeconomic class - a factor with levels \code{high}
#' \code{low} \code{middle}} \item{schtyp}{school type - a factor with
#' levels \code{private} \code{public}} \item{prog}{choice of high
#' school program - a factor with levels \code{academic} \code{general}
#' \code{vocation}} \item{read}{reading score}
#' \item{write}{writing score} \item{math}{math score}
#' \item{science}{science score} \item{socst}{social science
#' score} }
#' @source National Education Longitudinal Studies (NELS) program of the
#' National Center for Education Statistics (NCES).
#' @keywords datasets
NULL





#' Infant mortality according to income and region
#'
#' The \code{infmort} data frame has 105 rows and 4 columns.  The infant
#' mortality in regions of the world may be related to per capita income and
#' whether oil is exported. The dataset is not recent.
#'
#'
#' @name infmort
#' @docType data
#' @format This data frame contains the following columns: \describe{
#'
#' \item{region}{ Region of the world, Africa, Europe, Asia or the
#' Americas } \item{income}{ Per capita annual income in dollars }
#'
#' \item{mortality}{ Infant mortality in deaths per 1000 births }
#' \item{oil}{ Does the country export oil or not?  } }
#' @source Unknown
#' @keywords datasets
NULL





#' Effects of insulation on gas consumption
#'
#' Data on natural gas usage in a house.  The weekly gas consumption (in 1000
#' cubic feet) and the average outside temperature (in degrees Celsius) were
#' recorded for 26 weeks before and 30 weeks after cavity-wall insulation had
#' been installed.  The house thermostat was set at 20C throughout.
#'
#'
#' @name insulgas
#' @docType data
#' @format A data frame with 44 observations on the following 3 variables.
#' \describe{ \item{Insulate}{a factor with levels \code{After}
#' \code{Before}} \item{Temp}{Outside temperature}
#' \item{Gas}{Weekly consumption in 1000 cubic feet} }
#' @source The \code{whiteside} dataset in the MASS package
#' @keywords datasets
#' @examples
#'
#' data(insulgas)
#' str(insulgas)
#'
NULL





#' Irrigation methods in an agricultural field trial
#'
#' In an agricultural field trial, the objective was to determine the effects
#' of two crop varieties and four different irrigation methods. Eight fields
#' were available, but only one type of irrigation may be applied to each
#' field. The fields may be divided into two parts with a different variety
#' planted in each half.  The whole plot factor is the method of irrigation,
#' which should be randomly assigned to the fields. Within each field, the
#' variety is randomly assigned.
#'
#'
#' @name irrigation
#' @docType data
#' @format A data frame with 16 observations on the following 4 variables.
#' \describe{ \item{field}{a factor with levels \code{f1} \code{f2}
#' \code{f3} \code{f4} \code{f5} \code{f6} \code{f7} \code{f8}}
#' \item{irrigation}{a factor with levels \code{i1} \code{i2} \code{i3}
#' \code{i4}} \item{variety}{a factor with levels \code{v1} \code{v2}}
#' \item{yield}{a numeric vector} }
#' @source Found online but source not recorded.
#' @keywords datasets
#' @examples
#'
#' data(irrigation)
#' str(irrigation)
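#'
#' ## Illustrative sketch only (not part of the original documentation):
#' ## a classical split-plot analysis, assuming fields are the whole plots,
#' ## so that irrigation is tested against the between-field error stratum.
#' fit <- aov(yield ~ irrigation * variety + Error(field), data = irrigation)
#' summary(fit)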
#'
NULL





#' Junior School Project
#'
#' Data from the Junior School Project, collected from primary (elementary in
#' U.S. terms) schools in inner London.
#'
#'
#' @name jsp
#' @docType data
#' @format A data frame with 3236 observations on the following 9 variables.
#' \describe{ \item{school}{school code, 1-50 (50 schools)}
#' \item{class}{a factor with levels \code{1} \code{2} \code{3}
#' \code{4}} \item{gender}{a factor with levels \code{boy} \code{girl}}
#' \item{social}{class of the father I=1; II=2; III nonmanual=3; III
#' manual=4; IV=5; V=6; Long-term unemployed=7; Not currently employed=8;
#' Father absent=9} \item{raven}{test score} \item{id}{student
#' id coded 1-1402} \item{english}{score on English}
#' \item{math}{score on Maths} \item{year}{year of school} }
#' @references Goldstein, H. (1995). Multilevel Statistical Models (2 ed.).
#' London: Arnold.
#' @source Mortimore, P., P. Sammons, L. Stoll, D. Lewis, and R. Ecob (1988).
#' School Matters. Wells, UK: Open Books.
#' @keywords datasets
#' @examples
#'
#' data(jsp)
#' str(jsp)
#'
NULL





#' Kangaroo skull measurements
#'
#' Skull measurements, together with sex and species, for specimens of kangaroo.
#'
#'
#' @name kanga
#' @docType data
#' @format A data frame with 148 observations on the following 20 variables.
#' \describe{ \item{species}{a factor with levels \code{fuliginosus}
#' \code{giganteus} \code{melanops}} \item{sex}{a factor with levels
#' \code{Female} \code{Male}} \item{basilar.length}{a numeric vector}
#' \item{occipitonasal.length}{a numeric vector}
#' \item{palate.length}{a numeric vector} \item{palate.width}{a
#' numeric vector} \item{nasal.length}{a numeric vector}
#' \item{nasal.width}{a numeric vector}
#' \item{squamosal.depth}{a numeric vector}
#' \item{lacrymal.width}{a numeric vector}
#' \item{zygomatic.width}{a numeric vector}
#' \item{orbital.width}{a numeric vector}
#' \item{.rostral.width}{a numeric vector}
#' \item{occipital.depth}{a numeric vector}
#' \item{crest.width}{a numeric vector}
#' \item{foramina.length}{a numeric vector}
#' \item{mandible.length}{a numeric vector}
#' \item{mandible.width}{a numeric vector}
#' \item{mandible.depth}{a numeric vector}
#' \item{ramus.height}{a numeric vector} }
#' @references Andrews, D. F. and Herzberg, A. M. (1985). Data.
#' Springer-Verlag, New York.
#' @source Andrews and Herzberg (1985) Chapter 53.
#' @keywords datasets
#' @examples
#'
#' data(kanga)
#' str(kanga)
#'
NULL





#' Cut-off times of lawnmowers
#'
#' Data on the cut-off times of lawnmowers were collected. Three machines were
#' randomly selected from those produced by each of manufacturers A and B. Each
#' machine was tested twice at low speed and twice at high speed.
#'
#'
#' @name lawn
#' @docType data
#' @format A data frame with 24 observations on the following 4 variables.
#' \describe{ \item{manufact}{Manufacturer - a factor with levels
#' \code{A} \code{B}} \item{machine}{Lawn mower - a factor with levels
#' \code{m1} \code{m2} \code{m3} \code{m4} \code{m5} \code{m6}}
#' \item{speed}{Speed of testing - a factor with levels \code{H}
#' \code{L}} \item{time}{cut-off time} }
#' @source Unknown.
#' @keywords datasets
NULL





#' Leaf blotch on barley
#'
#' The data gives the proportion of leaf area affected by leaf blotch on 10
#' varieties of barley at 9 different sites.
#'
#'
#' @name leafblotch
#' @docType data
#' @format A data frame with 90 observations on the following 3 variables.
#' \describe{ \item{blotch}{proportion of the barley leaf affected by
#' blotch} \item{site}{the physical location - a factor with levels
#' \code{1} \code{2} \code{3} \code{4} \code{5} \code{6} \code{7} \code{8}
#' \code{9}} \item{variety}{variety of barley - a factor with levels
#' \code{1} \code{2} \code{3} \code{4} \code{5} \code{6} \code{7} \code{8}
#' \code{9} \code{10}} }
#' @references P. McCullagh and J. Nelder (1989) "Generalized Linear Models"
#' Chapman and Hall, 2nd ed.
#' @source R. W. M. Wedderburn (1974) "Quasilikelihood functions, generalized
#' linear models and the Gauss-Newton method" Biometrika, 61, 439-447.
#' @keywords datasets
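#' @examples
#'
#' ## Illustrative sketch only (not part of the original documentation):
#' ## because the response is a proportion rather than a count, one option
#' ## is a quasi-binomial GLM with site and variety as additive effects.
#' data(leafblotch)
#' fit <- glm(blotch ~ site + variety, family = quasibinomial,
#'            data = leafblotch)
#' summary(fit)$dispersion
#'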
NULL





#' Data on the burning time of samples of tobacco leaves
#'
#' Data on the burning time of samples of tobacco leaves
#'
#'
#' @name leafburn
#' @docType data
#' @format A data frame with 30 observations on the following 4 variables.
#' \describe{ \item{nitrogen}{nitrogen content by percentage weight}
#' \item{chlorine}{chlorine content by percentage weight}
#' \item{potassium}{potassium content by percentage weight}
#' \item{burntime}{burn time in seconds} }
#' @source Steel, R. G. D. and Torrie, J. H. (1980), Principles and Procedures
#' of Statistics, Second Edition, New York: McGraw-Hill
#' @keywords datasets
NULL





#' Sleep in Mammals: Ecological and Constitutional Correlates
#'
#' The \code{mammalsleep} data frame has 62 rows and 10 columns. The data come
#' from a study of the ecological and constitutional correlates of sleep in
#' mammals.
#'
#'
#' @name mammalsleep
#' @docType data
#' @format This data frame contains the following columns: \describe{
#' \item{body}{ body weight in kg } \item{brain}{ brain weight
#' in g } \item{nondream}{ slow wave ("nondreaming") sleep (hrs/day) }
#' \item{dream}{ paradoxical ("dreaming") sleep (hrs/day) }
#' \item{sleep}{ total sleep (hrs/day) (sum of slow wave and
#' paradoxical sleep) } \item{lifespan}{ maximum life span (years) }
#' \item{gestation}{ gestation time (days) } \item{predation}{
#' predation index (1-5) 1 = minimum (least likely to be preyed upon) to 5 =
#' maximum (most likely to be preyed upon) } \item{exposure}{ sleep
#' exposure index (1-5) 1 = least exposed (e.g. animal sleeps in a
#' well-protected den) 5 = most exposed } \item{danger}{ overall danger
#' index (1-5) (based on the above two indices and other information) 1 = least
#' danger (from other animals) 5 = most danger (from other animals) }}
#' @source "Sleep in Mammals: Ecological and Constitutional Correlates" by
#' Allison, T.  and Cicchetti, D. (1976), Science, November 12, vol. 194, pp.
#' 732-734.
#' @keywords datasets
NULL





#' Mayer's 1750 data on the Manilius crater on the moon
#'
#' In 1750, Tobias Mayer collected data on various landmarks on the moon in
#' order to determine its orbit. The data involving the position of the
#' Manilius crater resulted in a least-squares-like problem. The example is
#' discussed in Stephen Stigler's History of Statistics.
#'
#' See Stigler for a detailed description.
#'
#' @name manilius
#' @docType data
#' @format A data frame with 27 observations on the following 4 variables.
#' \describe{ \item{arc}{an angle known as h in Stigler's notation}
#' \item{sinang}{the sin(g-k) where g and k are two angles in Stigler}
#' \item{cosang}{the cos(g-k) where g and k are two angles in Stigler}
#' \item{group}{one of three groups determined by Mayer} }
#' @references Mayer, T. (1750) Abhandlung uber die Umwaltzung des Monds um
#' seine Axe und die scheinbare Bewegung der Mondsflecken published in the
#' Kosmographische Nachrichten und Sammlungen auf das Jahr 1748. 52-183
#' @source Stigler, S. (1986) History of Statistics. Belknap Press, Harvard.
#' @keywords datasets
#' @examples
#'
#' data(manilius)
#'
NULL





#' Meat spectrometry to determine fat content
#'
#' A Tecator Infratec Food and Feed Analyzer working in the wavelength range
#' 850 - 1050 nm by the Near Infrared Transmission (NIT) principle was used to
#' collect data on samples of finely chopped pure meat. 215 samples were
#' measured. For each sample, the fat content was measured along with a 100
#' channel spectrum of absorbances. Since determining the fat content via
#' analytical chemistry is time consuming we would like to build a model to
#' predict the fat content of new samples using the 100 absorbances which can
#' be measured more easily.
#'
#' @name meatspec
#' @docType data
#' @format Dataset contains the following variables \describe{
#' \item{V1-V100}{ absorbances across a range of 100 wavelengths }
#' \item{fat}{ fat content} }
#' @source H. H. Thodberg (1993) "Ace of Bayes: Application of Neural Networks
#' With Pruning", report no. 1132E, Maglegaardvej 2, DK-4000 Roskilde, Danmark
#' @keywords datasets
NULL





#' Melanoma by type and location
#'
#' Data comes from a study of Malignant Melanoma involving 400 subjects.
#'
#'
#' @name melanoma
#' @docType data
#' @format A data frame with 12 observations on the following 3 variables.
#' \describe{ \item{count}{number of cases} \item{tumor}{type
#' of tumor - a factor with levels \code{freckle} \code{indeterminate}
#' \code{nodular} \code{superficial}} \item{site}{location of tumor on
#' the body - a factor with levels \code{extremity} \code{head} \code{trunk}} }
#' @source Dobson A. (2002) An introduction to generalized linear models,
#' Chapman Hall.
#' @keywords datasets
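#' @examples
#'
#' ## Illustrative sketch only (not part of the original documentation):
#' ## cross-tabulate the counts by tumor type and site, then apply a
#' ## chi-squared test of independence to the resulting two-way table.
#' data(melanoma)
#' tab <- xtabs(count ~ tumor + site, data = melanoma)
#' tab
#' chisq.test(tab)
#'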
NULL





#' Third party motor insurance claims in Sweden in 1977
#'
#' In Sweden all motor insurance companies apply identical risk arguments to
#' classify customers, and thus their portfolios and their claims statistics
#' can be combined. The data were compiled by a Swedish Committee on the
#' Analysis of Risk Premium in Motor Insurance. The Committee was asked to look
#' into the problem of analyzing the real influence on claims of the risk
#' arguments and to compare this structure with the actual tariff.
#'
#'
#' @name motorins
#' @docType data
#' @format A data frame with 1797 observations on the following 8 variables.
#' \describe{ \item{Kilometres}{an ordered factor representing kilometres per
#' year with levels 1: < 1000, 2: 1000-15000, 3: 15000-20000, 4: 20000-25000,
#' 5: > 25000} \item{Zone}{a factor representing geographical area with levels
#' 1: Stockholm, Goteborg, Malmo with surroundings 2: Other large cities with
#' surroundings 3: Smaller cities with surroundings in southern Sweden 4: Rural
#' areas in southern Sweden 5: Smaller cities with surroundings in northern
#' Sweden 6: Rural areas in northern Sweden 7: Gotland} \item{Bonus}{No claims
#' bonus. Equal to the number of years, plus one, since last claim}
#' \item{Make}{A factor representing eight different common car models. All
#' other models are combined in class 9} \item{Insured}{Number of insured in
#' policy-years} \item{Claims}{Number of claims} \item{Payment}{Total value of
#' payments in Skr} \item{perd}{payment per claim} }
#' @references Hallin, M., and Ingenbleek, J.-F. (1983). The Swedish automobile
#' portfolio in 1977. A statistical study. Scandinavian Actuarial Journal,
#' 49-64.
#' @source \url{http://www.statsci.org/data/general/motorins.html}
#' @keywords datasets
NULL





#' Questionnaire study of neighborly help
#'
#' Subjects were asked questions in a study of neighborly help. Questions below
#' are a subset of the full study.
#'
#' Exeter is a city in the county of Devon which is in Britain.  The four
#' districts can be briefly described as follows. District 1 was a
#' long-established residential area near the city centre, with housing dating
#' from the late nineteenth century. Originally working class, it now has a
#' considerable middle class population with some student and other temporary
#' accommodation. District 2 was a working-class housing estate dating from the
#' 1930s, with mainly rented accommodation but some owner occupation. District
#' 3 was the oldest part of a more recently developed, mainly middle-class,
#' almost exclusively owner-occupied estate, dating from the 1960s. District 4
#' was the most recently developed part of a more sought-after middle-class
#' residential area, with smaller but almost entirely owner-occupied properties
#' dating from the 1970s and 1980s.
#'
#' @name neighbor
#' @docType data
#' @format A data frame with 181 observations on the following 8 variables.
#' \describe{ \item{longlive}{About how long have you lived where you do now?
#' Ans is a factor with levels \code{<6mos} \code{6-12mos} \code{1-3yrs}
#' \code{3-10yrs} \code{10yrs}} \item{wherebfr}{Where were you living before
#' you moved to your present house? Ans is a factor with levels \code{same}
#' \code{Exeter} \code{Devon} \code{Britain} \code{Abroad}} \item{hownbly}{How
#' neighborly do you think the area where you now live is? Ans is a factor with
#' levels \code{Vunfriendly} \code{NVfriendly} \code{Average} \code{FFriendly}
#' \code{VFriendly}} \item{knowname}{Roughly how many people in your street, or
#' in the streets just near you, do you know the names of? Ans is a factor with
#' levels \code{none} \code{1-5} \code{6-20} \code{20+}} \item{callname}{How
#' many of those people (not counting children) would you call by their first
#' names? Ans is a factor with levels \code{none} \code{1-5} \code{6-20}
#' \code{20+}} \item{age}{a factor with levels \code{-18} \code{18-30}
#' \code{31-50} \code{51-65} \code{65+}} \item{district}{a factor with levels
#' \code{1} \code{2} \code{3} \code{4}} \item{sex}{a factor with levels
#' \code{female} \code{male}} }
#' @source P. Webley & S. Lea 1993, Human Relations 46, 65-76.
#' @keywords datasets
NULL





#' National Education Longitudinal Study of 1988
#'
#' A subset of the National Education Longitudinal Study of 1988
#'
#'
#' @name nels88
#' @docType data
#' @format A data frame with 260 observations on the following 5 variables.
#' \describe{ \item{sex}{a factor with levels \code{Female}
#' \code{Male}} \item{race}{a factor with levels \code{White}
#' \code{Asian} \code{Black} \code{Hispanic}} \item{ses}{a numeric
#' vector} \item{paredu}{a factor with levels \code{ba} \code{college}
#' \code{hs} \code{lesshs} \code{ma} \code{phd}} \item{math}{a numeric
#' vector} }
#' @source \url{http://www.icpsr.umich.edu/icpsrweb/ICPSR/series/107}
#' @keywords datasets
#' @examples
#'
#' data(nels88)
#' str(nels88)
#'
NULL





#' Nepali child health study
#'
#' The data are a subset from a public health study on Nepalese children.
#'
#'
#' @name nepali
#' @docType data
#' @format A data frame with 1000 observations on the following 9 variables.
#' \describe{ \item{id}{There is a six digit code for the child's ID: 2
#' digits for the panchayat number; 2 digits for the ward within panchayat; 1
#' digit for the household; 1 digit for child within household.}
#' \item{sex}{1 = male; 2 = female} \item{wt}{Child's weight
#' measured in kilograms} \item{ht}{Child's height measured in
#' centimeters} \item{mage}{Mother's age in years}
#' \item{lit}{Indicator of mother's literacy: 0 = no; 1 = yes}
#' \item{died}{The number of children the mother has had that died.}
#' \item{alive}{The number of children the mother has ever had born
#' alive} \item{age}{age of child} }
#' @source West KP, Jr., LeClerq SC, Shrestha SR, Wu LS, Pradhan EK, Khatry SK,
#' Katz J, Adhikari R, Sommer A.  Effects of vitamin A on growth of vitamin A
#' deficient children: field studies in Nepal. J Nutr 1997;10:1957-1965.
#' @keywords datasets
NULL





#' US 1996 national election study
#'
#' A 10-variable subset of the 1996 American National Election Study. Missing
#' values and "don't know" responses have been deleted. Respondents
#' expressing a voting preference other than Clinton or Dole have been removed.
#'
#'
#' @name nes96
#' @docType data
#' @format A data frame with 944 observations on the following 10 variables.
#' \describe{ \item{popul}{population of respondent's location in 1000s of
#' people} \item{TVnews}{days in the past week spent watching news on TV}
#' \item{selfLR}{Left-Right self-placement of respondent: an ordered factor
#' with levels extremely liberal, \code{extLib} < liberal, \code{Lib} <
#' slightly liberal, \code{sliLib} < moderate, \code{Mod} < slightly
#' conservative, \code{sliCon} < conservative, \code{Con} < extremely
#' conservative, \code{extCon}} \item{ClinLR}{Left-Right placement of Bill
#' Clinton (same scale as selfLR): an ordered factor with levels \code{extLib}
#' < \code{Lib} < \code{sliLib} < \code{Mod} < \code{sliCon} < \code{Con} <
#' \code{extCon}} \item{DoleLR}{Left-Right placement of Bob Dole (same scale as
#' selfLR): an ordered factor with levels \code{extLib} < \code{Lib} <
#' \code{sliLib} < \code{Mod} < \code{sliCon} < \code{Con} < \code{extCon}}
#' \item{PID}{Party identification: an ordered factor with levels strong
#' Democrat, \code{strDem} < weak Democrat, \code{weakDem} < independent
#' Democrat, \code{indDem} < independent independent, \code{indind} <
#' independent Republican, \code{indRep} < weak Republican, \code{weakRep} < strong
#' Republican, \code{strRep}} \item{age}{Respondent's age in years}
#' \item{educ}{Respondent's education: an ordered factor with levels 8 years or
#' less, \code{MS} < high school dropout, \code{HSdrop} < high school diploma
#' or GED, \code{HS} < some College, \code{Coll} < Community or junior College
#' degree, \code{CCdeg} < BA degree, \code{BAdeg} < postgraduate degree,
#' \code{MAdeg}} \item{income}{Respondent's family income: an ordered factor
#' with levels \code{$3Kminus} < \code{$3K-$5K} < \code{$5K-$7K} <
#' \code{$7K-$9K} < \code{$9K-$10K} < \code{$10K-$11K} < \code{$11K-$12K} <
#' \code{$12K-$13K} < \code{$13K-$14K} < \code{$14K-$15K} < \code{$15K-$17K} <
#' \code{$17K-$20K} < \code{$20K-$22K} < \code{$22K-$25K} < \code{$25K-$30K} <
#' \code{$30K-$35K} < \code{$35K-$40K} < \code{$40K-$45K} < \code{$45K-$50K} <
#' \code{$50K-$60K} < \code{$60K-$75K} < \code{$75K-$90K} < \code{$90K-$105K} <
#' \code{$105Kplus}} \item{vote}{Expected vote in 1996 presidential election: a
#' factor with levels \code{Clinton} and \code{Dole}} }
#' @references Found at \url{http://www.stat.washington.edu/}
#' @source Sapiro, Virginia, Steven J. Rosenstone, Donald R. Kinder, Warren E.
#' Miller, and the National Election Studies. AMERICAN NATIONAL ELECTION
#' STUDIES, 1992-1997: COMBINED FILE [Computer file]. 2nd ICPSR version. Ann
#' Arbor, MI: University of Michigan, Center for Political Studies [producer],
#' 1999. Ann Arbor, MI: Inter-university Consortium for Political and Social
#' Research [distributor], 1999.
#' @keywords datasets
NULL





#' New Hampshire Democratic Party Primary 2008
#'
#' Votes and other demographic information from 276 wards in the 2008
#' Democratic Party presidential primary.
#'
#' On the 8th January 2008, primaries to select US presidential candidates were
#' held in New Hampshire. In the Democratic party primary, Hillary Clinton
#' defeated Barack Obama contrary to the expectations of pre-election opinion
#' polls. Essentially two different voting technologies were used in New
#' Hampshire. Some wards used paper ballots, counted by hand while others used
#' optically scanned ballots, counted by machine. Among the paper ballots,
#' Obama had more votes than Clinton while Clinton defeated Obama on just the
#' machine counted ballots. Since the method of voting should make no causal
#' difference to the outcome, suspicions have been raised regarding the
#' integrity of the election.
#'
#' @name newhamp
#' @docType data
#' @format A data frame with 276 observations on the following 12 variables.
#' \describe{ \item{votesys}{The voting system used where H is counted
#' by hand and D is counted by machine.} \item{Obama}{The number of
#' votes for Barack Obama.} \item{Clinton}{The number of votes for
#' Hillary Clinton.} \item{dem}{The total number of votes cast in the
#' Democratic primary (there were other candidates besides Clinton and Obama).}
#' \item{povrate}{The poverty rate as a proportion as determined by the
#' 2000 census.} \item{pci}{Per capita annual income in USD in 1999.}
#' \item{Dean}{The proportion of voters for Howard Dean in the 2004
#' Democratic primary.} \item{Kerry}{The proportion of voters for John
#' Kerry in the 2004 Democratic primary.} \item{white}{The proportion
#' of non-Hispanic whites according to the 2000 census.}
#' \item{absentee}{The proportion voting by absentee ballot.}
#' \item{population}{An estimate of the population from 2002.}
#' \item{pObama}{Proportion voting for Obama} }
#' @source Herron, M., W. Mebane Jr., and J. Wand (2008). Voting Technology and
#' the 2008 New Hampshire Primary. Wm. & Mary Bill Rts. J. 17, 351-374.
#' @keywords datasets
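#' @examples
#'
#' ## Illustrative sketch only (not part of the original documentation):
#' ## total the votes for each candidate within each voting system to see
#' ## the hand-counted versus machine-counted contrast described above.
#' data(newhamp)
#' aggregate(cbind(Obama, Clinton) ~ votesys, data = newhamp, FUN = sum)
#'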
NULL





#' Yields of oat varieties planted in blocks
#'
#' Data from an experiment to compare 8 varieties of oats. The growing area was
#' heterogeneous and so was grouped into 5 blocks. Each variety was sown once
#' within each block and the yield in grams per 16ft row was recorded.
#'
#'
#' @name oatvar
#' @docType data
#' @format The dataset contains the following variables \describe{
#' \item{yield}{ Yield in grams per 16ft row } \item{block}{
#' Blocks I to V } \item{variety}{ Variety 1 to 8} }
#' @source "Statistical Theory in Research" by R. Anderson and T. Bancroft,
#' McGraw Hill, 1952
#' @keywords datasets
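#' @examples
#'
#' ## Illustrative sketch only (not part of the original documentation):
#' ## a randomized complete block analysis. factor() is applied defensively
#' ## in case block or variety is stored as a numeric code.
#' data(oatvar)
#' fit <- lm(yield ~ factor(block) + factor(variety), data = oatvar)
#' anova(fit)
#'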
NULL





#' Odor of chemical by production settings
#'
#' Data from an experiment to determine the effects of column temperature,
#' gas/liquid ratio and packing height in reducing the unpleasant odor of a
#' chemical product that was being sold for household use.
#'
#'
#' @name odor
#' @docType data
#' @format \describe{ \item{odor}{ Odor score} \item{temp}{
#' Temperature coded as -1, 0 and 1} \item{gas}{ Gas/Liquid ratio coded
#' as -1, 0 and 1} \item{pack}{ Packing height coded as -1, 0 and 1} }
#' @source "Statistical Design and Analysis of Experiments" by P. John,
#' Macmillan, 1971
#' @keywords datasets
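#' @examples
#'
#' ## Illustrative sketch only (not part of the original documentation):
#' ## with three coded levels per factor, a second-order model with linear
#' ## and quadratic terms is one natural starting point.
#' data(odor)
#' fit <- lm(odor ~ temp + gas + pack + I(temp^2) + I(gas^2) + I(pack^2),
#'           data = odor)
#' coef(fit)
#'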
NULL





#' Ohio Children Wheeze Status
#'
#' The \code{ohio} data frame has 2148 rows and 4 columns. The dataset is a
#' subset of the six-city study, a longitudinal study of the health effects of
#' air pollution.
#'
#'
#' @name ohio
#' @docType data
#' @format This data frame contains the following columns: \describe{
#' \item{resp}{an indicator of wheeze status (1=yes, 0=no)} \item{id}{a numeric
#' vector for subject id} \item{age}{a numeric vector of age, 0 is 9 years old}
#' \item{smoke}{an indicator of maternal smoking at the first year of the
#' study} }
#' @references Fitzmaurice, G.M. and Laird, N.M. (1993) A likelihood-based
#' method for analyzing longitudinal binary responses, \emph{Biometrika}
#' \bold{80}: 141--151.
#' @keywords datasets
NULL





#' Space Shuttle Challenger O-rings
#'
#' The 1986 crash of the space shuttle Challenger was linked to failure of
#' O-ring seals in the rocket engines. Data was collected on the 23 previous
#' shuttle missions. The launch temperature on the day of the crash was 31F.
#'
#'
#' @name orings
#' @docType data
#' @format A data frame with 23 observations on the following 2 variables.
#' \describe{ \item{temp}{temperature at launch in degrees F}
#' \item{damage}{number of damage incidents out of 6 possible} }
#' @references S. Dalal, E. Fowlkes and B. Hoadley (1989) "Risk Analysis of the
#' Space Shuttle: Pre-Challenger Prediction of Failure." Journal of the
#' American Statistical Association. 84: 945-957.
#' @source Presidential Commission on the Space Shuttle Challenger Accident,
#' Vol. 1, 1986: 129-131.
#' @keywords datasets
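#' @examples
#'
#' ## Illustrative sketch only (not part of the original documentation):
#' ## a binomial GLM for the number of damaged O-rings (out of 6) against
#' ## launch temperature. The prediction at 31F is an extrapolation well
#' ## outside the observed temperatures and should be read with caution.
#' data(orings)
#' fit <- glm(cbind(damage, 6 - damage) ~ temp, family = binomial,
#'            data = orings)
#' predict(fit, newdata = data.frame(temp = 31), type = "response")
#'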
NULL





#' Ozone in LA in 1976
#'
#' A study of the relationship between atmospheric ozone concentration and
#' meteorology in the Los Angeles Basin in 1976.  A number of cases with
#' missing variables have been removed for simplicity.
#'
#'
#' @name ozone
#' @docType data
#' @format A data frame with 330 observations on the following 10 variables.
#' \describe{ \item{O3}{Ozone conc., ppm, at Sandburg AFB.}
#' \item{vh}{a numeric vector} \item{wind}{wind speed}
#' \item{humidity}{a numeric vector} \item{temp}{temperature}
#' \item{ibh}{inversion base height} \item{dpg}{Daggett
#' pressure gradient} \item{ibt}{a numeric vector}
#' \item{vis}{visibility} \item{doy}{day of the year} }
#' @source Breiman, L. and J. H. Friedman (1985). Estimating optimal
#' transformations for multiple regression and correlation. Journal of the
#' American Statistical Association 80, 580-598.
#' @keywords datasets
#' @examples
#'
#' data(ozone)
#' str(ozone)
#'
NULL





#' Marijuana and parent alcohol and drug use
#'
#' 445 college students were classified according to both frequency of
#' marijuana use and parental use of alcohol and psychoactive drugs.
#'
#'
#' @name parstum
#' @docType data
#' @format A data frame with 9 observations on the following 3 variables.
#' \describe{ \item{parent}{Number of parents using drugs or alcohol -
#' a factor with levels \code{Both} \code{Neither} \code{One}}
#' \item{student}{Student usage of marijuana - a factor with levels
#' \code{Never} \code{Occasional} \code{Regular}} \item{count}{the
#' number of cases} }
#' @source Ellis, Godfrey J. and Stone, Lorene H.  (1979) "Marijuana Use in
#' College: An Evaluation of a Modeling Explanation" Youth and Society 10,
#' 323-34
#' @keywords datasets
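#' @examples
#'
#' ## Illustrative sketch only (not part of the original documentation):
#' ## a Poisson log-linear model with main effects only corresponds to
#' ## independence of parental and student use; its residual deviance
#' ## gives a test of that hypothesis.
#' data(parstum)
#' fit <- glm(count ~ parent + student, family = poisson, data = parstum)
#' pchisq(deviance(fit), df.residual(fit), lower.tail = FALSE)
#'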
NULL





#' Carbon dioxide effects on peanut oil extraction
#'
#' The \code{peanut} data frame has 16 rows and 6 columns. The data come from an
#' experiment on the effects of carbon dioxide on peanut oil extraction.
#'
#'
#' @name peanut
#' @docType data
#' @format This data frame contains the following columns: \describe{
#' \item{press}{ CO2 pressure - two levels low=0, high=1 }
#' \item{temp}{ CO2 temperature - two levels low=0, high=1 }
#' \item{moist}{ peanut moisture - two levels low=0, high=1 }
#' \item{flow}{ CO2 flow rate - two levels low=0, high=1 }
#' \item{size}{ peanut particle size - two levels low=0, high=1 }
#' \item{solubility}{ the amount of oil that could dissolve in the CO2
#' }}
#' @source Kilgo, M (1989) "An Application of Fractional Factorial Experimental
#' Designs" Quality Engineering, 1, 45-54
#' @keywords datasets
NULL





#' Penicillin yield by block and treatment
#'
#' The production of penicillin uses a raw material, corn steep liquor, which
#' is quite variable and can only be made in blends sufficient for four runs.
#' There are four processes, A, B, C and D, for the production.
#'
#'
#' @name penicillin
#' @docType data
#' @format A data frame with 20 observations on the following 3 variables.
#' \describe{ \item{treat}{a factor with levels \code{A} \code{B}
#' \code{C} \code{D}} \item{blend}{a factor with levels \code{Blend1}
#' \code{Blend2} \code{Blend3} \code{Blend4} \code{Blend5}}
#' \item{yield}{a numeric vector} }
#' @source Box, G., W. Hunter, and J. Hunter (1978). Statistics for
#' Experimenters. New York: Wiley.
#' @keywords datasets
#' @examples
#'
#' data(penicillin)
#' str(penicillin)
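#'
#' ## Illustrative sketch only (not part of the original documentation):
#' ## treating blends as blocks, compare the four processes with an
#' ## additive block-plus-treatment linear model.
#' fit <- lm(yield ~ blend + treat, data = penicillin)
#' anova(fit)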
#'
NULL





#' Birth weights in Philadelphia
#'
#' Data based on a 5% sample of all births occurring in Philadelphia in 1990.
#'
#'
#' @name phbirths
#' @docType data
#' @format A data frame with 1115 observations on the following 5 variables.
#' \describe{ \item{black}{is the mother Black?}
#' \item{educ}{mother's years of education} \item{smoke}{does
#' the mother smoke during pregnancy?} \item{gestate}{gestational age
#' in weeks} \item{grams}{birth weight in grams} }
#' @source I. T. Elo, G. Rodriguez and H. Lee (2001). Racial and Neighborhood
#' Disparities in Birthweight in Philadelphia. Paper presented at the Annual
#' Meeting of the Population Association of America, Washington, DC 2001.
#' @keywords datasets
#' @examples
#'
#' data(phbirths)
#' str(phbirths)
#'
NULL





#' Diabetes survey on Pima Indians
#'
#' The National Institute of Diabetes and Digestive and Kidney Diseases
#' conducted a study on 768 adult female Pima Indians living near Phoenix.
#'
#'
#' @name pima
#' @docType data
#' @format The dataset contains the following variables \describe{
#' \item{pregnant}{ Number of times pregnant} \item{glucose}{
#' Plasma glucose concentration at 2 hours in an oral glucose tolerance test}
#' \item{diastolic}{ Diastolic blood pressure (mm Hg)}
#' \item{triceps}{ Triceps skin fold thickness (mm)}
#' \item{insulin}{ 2-Hour serum insulin (mu U/ml)} \item{bmi}{
#' Body mass index (weight in kg/(height in metres squared))}
#' \item{diabetes}{ Diabetes pedigree function} \item{age}{ Age
#' (years)} \item{test}{ test whether the patient shows signs of
#' diabetes (coded 0 if negative, 1 if positive)} }
#' @source The data may be obtained from UCI Repository of machine learning
#' databases at \url{http://archive.ics.uci.edu/ml/}
#' @keywords datasets
NULL





#' NIST data on ultrasonic measurements of defects in the Alaska pipeline
#'
#' Researchers at the National Institute of Standards and Technology (NIST)
#' collected data on ultrasonic measurements of the depths of defects in the
#' Alaska pipeline in the field. The depths of the defects were then remeasured
#' in the laboratory. These measurements were performed in six different
#' batches. The laboratory measurements are more accurate than the in-field
#' measurements, but more time consuming and expensive.
#'
#'
#' @name pipeline
#' @docType data
#' @format A data frame with 107 observations on the following 3 variables.
#' \describe{ \item{Field}{measurement of depth of defect on site}
#' \item{Lab}{measurement of depth of defect in the lab} \item{Batch}{the batch
#' of measurements} }
#' @source Office of the Director of the Institute of Materials Research (now
#' the Materials Science and Engineering Laboratory) of NIST
#' @keywords datasets
NULL





#' Pneumoconiosis in coal miners
#'
#' The data for this example contain the number of coal miners classified by
#' radiological examination into one of three categories of
#' pneumonoultramicroscopicsilicovolcanoconiosis (known as pneumoconiosis
#' for short) and by number of years spent working at the coal face divided
#' into eight categories.
#'
#'
#' @name pneumo
#' @docType data
#' @format A data frame with 24 observations on the following 3 variables.
#' \describe{ \item{Freq}{number of miners}
#' \item{status}{pneumoconiosis status - a factor with levels
#' \code{mild} \code{normal} \code{severe}} \item{year}{number of years
#' service (midpoint of interval)} }
#' @source M. Aitkin and D. Anderson and B. Francis and J. Hinde (1989)
#' "Statistical Modelling in GLIM" Oxford University Press.
#' @keywords datasets
NULL





#' Marijuana usage by youth
#'
#' The National Youth Survey collected a sample of 11 to 17 year olds - 117
#' boys and 120 girls - asking questions about marijuana usage.
#'
#'
#' @name potuse
#' @docType data
#' @format A data frame with 486 observations on the following 7 variables.
#' \describe{ \item{sex}{1=Male, 2=Female}
#' \item{year.76}{1=never used, 2=used no more than once a month,
#' 3=used more than once a month in 1976} \item{year.77}{1=never used,
#' 2=used no more than once a month, 3=used more than once a month in 1977}
#' \item{year.78}{1=never used, 2=used no more than once a month,
#' 3=used more than once a month in 1978} \item{year.79}{1=never used,
#' 2=used no more than once a month, 3=used more than once a month in 1979}
#' \item{year.80}{1=never used, 2=used no more than once a month,
#' 3=used more than once a month in 1980} \item{count}{Number of cases
#' in this category} }
#' @references Lang J., McDonald, J and Smith P. (1999) "Association-Marginal
#' Modeling of Multivariate Categorical Responses: A Maximum Likelihood
#' Approach" JASA 94, 1161-
#' @source ICPSR, University of Michigan
#' @keywords datasets
NULL





#' Prostate cancer surgery
#'
#' The \code{prostate} data frame has 97 rows and 9 columns. A study on 97 men
#' with prostate cancer who were due to receive a radical prostatectomy.
#'
#'
#' @name prostate
#' @docType data
#' @format This data frame contains the following columns: \describe{
#'
#' \item{lcavol}{ log(cancer volume) } \item{lweight}{
#' log(prostate weight) }
#'
#' \item{age}{ age } \item{lbph}{ log(benign prostatic
#' hyperplasia amount) } \item{svi}{ seminal vesicle invasion }
#' \item{lcp}{ log(capsular penetration) } \item{gleason}{
#' Gleason score } \item{pgg45}{ percentage Gleason scores 4 or 5 }
#' \item{lpsa}{ log(prostate specific antigen) } }
#' @source Andrews DF and Herzberg AM (1985): Data. New York: Springer-Verlag
#' @keywords datasets
NULL





#' Panel Study of Income Dynamics subset
#'
#' The Panel Study of Income Dynamics (PSID), begun in 1968, is a longitudinal
#' study of a representative sample of U.S. individuals.  The study is
#' conducted at the Survey Research Center, Institute for Social Research,
#' University of Michigan and is still continuing. The data represents a small
#' subset of the total data.
#'
#'
#' @name psid
#' @docType data
#' @format A data frame with 1661 observations on the following 6 variables.
#' \describe{ \item{age}{age in 1968} \item{educ}{years of
#' education} \item{sex}{sex of individual, \code{F} or \code{M}}
#' \item{income}{annual income in dollars} \item{year}{calendar
#' year} \item{person}{ID number for individual} }
#' @source Martha S. Hill, The Panel Study of Income Dynamics: A User's Guide,
#' Sage Publications, 1992, Newbury Park, CA.
#' @keywords datasets
NULL





#' Brightness of paper pulp depending on shift operator
#'
#' The \code{pulp} data frame has 20 rows and 2 columns. Data comes from an
#' experiment to test the paper brightness depending on a shift operator.
#'
#'
#' @name pulp
#' @docType data
#' @format This data frame contains the following columns: \describe{
#' \item{bright}{ Brightness of the pulp as measured by a reflectance
#' meter } \item{operator}{ Shift operator a-d }}
#' @source "Statistical techniques applied to production situations" F. Sheldon
#' (1960) Industrial and Engineering Chemistry, 52, 507-509
#' @keywords datasets
NULL





#' Leg strength and punting
#'
#' Investigators studied physical characteristics and ability in 13 (American)
#' football punters. Each volunteer punted a football ten times. The
#' investigators recorded the average distance for the ten punts, in feet.
#'
#'
#' @name punting
#' @docType data
#' @format A data frame with 13 observations on the following 7 variables.
#' \describe{ \item{Distance}{average distance over 10 punts}
#' \item{Hang}{hang time} \item{RStr}{right leg strength in
#' pounds} \item{LStr}{left leg strength in pounds}
#' \item{RFlex}{right hamstring muscle flexibility in degrees}
#' \item{LFlex}{left hamstring muscle flexibility in degrees}
#' \item{OStr}{overall leg strength in foot pounds} }
#' @source Unknown
#' @keywords datasets
#' @examples
#'
#' data(punting)
#' ## maybe str(punting) ; plot(punting) ...
#'
NULL





#' Production of PVC by operator and resin railcar
#'
#' Data from an experiment to study factors affecting the production of the
#' plastic PVC. Three operators used 8 different devices called resin railcars
#' to produce PVC. For each of the 24 combinations, two samples were produced.
#'
#'
#' @name pvc
#' @docType data
#' @format Dataset contains the following variables \describe{
#' \item{psize}{ Particle size} \item{operator}{ Operator
#' number 1, 2 or 3} \item{resin}{ Resin railcar 1-8} }
#' @source R. Morris and E. Watson (1998) "A comparison of the techniques used
#' to evaluate the measurement process" Quality Engineering, 11, 213-219
#' @keywords datasets
NULL





#' Activity in pyrimidines
#'
#' Structural information on 74 2,4-diamino-5-(substituted benzyl) pyrimidines
#' used as inhibitors of DHFR in E. coli. There are 3 positions where chemical
#' activity occurs and 9 attributes per position, leading to 27 total
#' predictors. One predictor had no variability and was removed from the data
#' set, leaving 26 chemical properties of the 74 compounds and an activity level.
#'
#'
#' @name pyrimidines
#' @docType data
#' @format A data frame with 74 observations on the following 27 variables.
#' \describe{ \item{p1.polar}{measured on a [0,1] scale}
#' \item{p1.size}{measured on a [0,1] scale}
#' \item{p1.flex}{measured on a [0,1] scale}
#' \item{p1.h.doner}{measured on a [0,1] scale}
#' \item{p1.h.acceptor}{measured on a [0,1] scale}
#' \item{p1.pi.doner}{measured on a [0,1] scale}
#' \item{p1.pi.acceptor}{measured on a [0,1] scale}
#' \item{p1.polarisable}{measured on a [0,1] scale}
#' \item{p1.sigma}{measured on a [0,1] scale}
#' \item{p2.polar}{measured on a [0,1] scale}
#' \item{p2.size}{measured on a [0,1] scale}
#' \item{p2.flex}{measured on a [0,1] scale}
#' \item{p2.h.doner}{measured on a [0,1] scale}
#' \item{p2.h.acceptor}{measured on a [0,1] scale}
#' \item{p2.pi.doner}{measured on a [0,1] scale}
#' \item{p2.pi.acceptor}{measured on a [0,1] scale}
#' \item{p2.polarisable}{measured on a [0,1] scale}
#' \item{p2.sigma}{measured on a [0,1] scale}
#' \item{p3.polar}{measured on a [0,1] scale}
#' \item{p3.size}{measured on a [0,1] scale}
#' \item{p3.flex}{measured on a [0,1] scale}
#' \item{p3.h.doner}{measured on a [0,1] scale}
#' \item{p3.h.acceptor}{measured on a [0,1] scale}
#' \item{p3.pi.doner}{measured on a [0,1] scale}
#' \item{p3.polarisable}{measured on a [0,1] scale}
#' \item{p3.sigma}{measured on a [0,1] scale}
#' \item{activity}{log 1/Ki, where Ki is the inhibition constant as
#' experimentally assayed, scaled to [0,1]} }
#' @source Jonathan D. Hirst, Ross D. King, Michael J. E. Sternberg (1994)
#' Quantitative structure-activity relationships by neural networks and
#' inductive logic programming. I. The inhibition of dihydrofolate reductase by
#' pyrimidines \doi{10.1007/BF00125375}
#' @keywords datasets
#' @examples
#'
#' data(pyrimidines)
#' ## maybe str(pyrimidines) ; plot(pyrimidines) ...
#'
NULL





#' Rabbit weight gain by diet and litter
#'
#' A nutritionist studied the effects of six diets on weight gain of domestic
#' rabbits.  From past experience with sizes of litters, it was felt that only
#' 3 uniform rabbits could be selected from each available litter. There were
#' ten litters available forming blocks of size three.
#'
#'
#' @name rabbit
#' @docType data
#' @format The variables in the dataset were \describe{ \item{treat}{
#' Diet a through f}
#'
#' \item{gain}{ Weight gain}
#'
#' \item{block}{ Block (10 litters)} }
#' @source "Experimental Design and Analysis" by M. Lentner and T. Bishop,
#' Valley Book Company, 1986
#' @keywords datasets
NULL





#' Rat growth weights affected by additives
#'
#' The data consist of 5 weekly measurements of body weight for 27 rats. The
#' first 10 rats are on a control treatment while 7 rats have thyroxine added
#' to their drinking water. 10 rats have thiouracil added to their water.
#'
#'
#' @name ratdrink
#' @docType data
#' @format A data frame with 135 observations on the following 4 variables.
#' \describe{ \item{wt}{Weight of the rat} \item{weeks}{Week of
#' the study from 0 to 4} \item{subject}{the rat code number}
#' \item{treat}{treatment applied to the rat drinking water - a factor
#' with levels \code{control} \code{thiouracil} \code{thyroxine}} }
#' @source Unknown
#' @keywords datasets
NULL





#' Effect of toxic agents on rats
#'
#' An experiment was conducted as part of an investigation to combat the
#' effects of certain toxic agents.
#'
#'
#' @name rats
#' @docType data
#' @format A data frame with 48 observations on the following 3 variables.
#' \describe{ \item{time}{survival time in tens of hours}
#' \item{poison}{the poison type - a factor with levels \code{I}
#' \code{II} \code{III}} \item{treat}{the treatment - a factor with
#' levels \code{A} \code{B} \code{C} \code{D}} }
#' @source Box G and Cox D (1964) "An analysis of transformations" J. Roy.
#' Stat. Soc. Series B. \bold{26} 211.
#' @keywords datasets
NULL





#' Shape and plate effects on current noise in resistors
#'
#' The \code{resceram} data frame has 12 rows and 3 columns. Shape and plate
#' effects on current noise in resistors
#'
#'
#' @name resceram
#' @docType data
#' @format This data frame contains the following columns: \describe{
#' \item{noise}{ current noise } \item{shape}{ the geometrical
#' shape of the resistor, A, B, C or D } \item{plate}{ the ceramic
#' plate on which the resistor was mounted. Only three resistors will fit on
#' one plate. }}
#' @source Natrella, M (1963) "Experimental Statistics" National Bureau of
#' Standards Handbook 91, Gaithersburg MD.
#' @keywords datasets
NULL





#' Salmonella reverse mutagenicity assay
#'
#' The data was collected in a salmonella reverse mutagenicity assay where the
#' numbers of revertant colonies of TA98 Salmonella were observed on each of
#' three replicate plates for different doses of quinoline.
#'
#'
#' @name salmonella
#' @docType data
#' @format A data frame with 18 observations on the following 2 variables.
#' \describe{ \item{colonies}{numbers of revertant colonies of TA98 Salmonella}
#' \item{dose}{dose level of quinoline} }
#' @source Breslow N.E. (1984), Extra-Poisson Variation in Log-linear Models,
#' Applied Statistics, pp. 38-44.
#' @keywords datasets
NULL





#' School expenditure and test scores from USA in 1994-95
#'
#' The \code{sat} data frame has 50 rows and 7 columns.  Data were collected to
#' study the relationship between expenditures on public education and test
#' results.
#'
#'
#' @name sat
#' @docType data
#' @format This data frame contains the following columns: \describe{
#' \item{expend}{ Current expenditure per pupil in average daily
#' attendance in public elementary and secondary schools, 1994-95 (in thousands
#' of dollars) } \item{ratio}{ Average pupil/teacher ratio in public
#' elementary and secondary schools, Fall 1994 } \item{salary}{
#' Estimated average annual salary of teachers in public elementary and
#' secondary schools, 1994-95 (in thousands of dollars) }
#' \item{takers}{ Percentage of all eligible students taking the SAT,
#' 1994-95 } \item{verbal}{ Average verbal SAT score, 1994-95 }
#' \item{math}{ Average math SAT score, 1994-95 } \item{total}{
#' Average total score on the SAT, 1994-95 } }
#' @source "Getting What You Pay For: The Debate Over Equity in Public School
#' Expenditures" D. Guber, Journal of Statistics Education, 1999
#' @keywords datasets
NULL





#' Savings rates in 50 countries
#'
#' The \code{savings} data frame has 50 rows and 5 columns.  The data is
#' averaged over the period 1960-1970.
#'
#' Now also appears as \code{LifeCycleSavings} in the \code{datasets} package.
#'
#'
#' @name savings
#' @docType data
#' @format This data frame contains the following columns: \describe{
#'
#' \item{sr}{savings rate - personal saving divided by disposable
#' income}
#'
#' \item{pop15}{percent population under age of 15}
#'
#' \item{pop75}{percent population over age of 75}
#'
#' \item{dpi}{per-capita disposable income in dollars}
#'
#' \item{ddpi}{percent growth rate of dpi} }
#' @source Belsley, D., Kuh, E. and Welsch, R. (1980) "Regression Diagnostics"
#' Wiley.
#' @keywords datasets
#' @seealso LifeCycleSavings
NULL





#' Car seat position depending on driver size
#'
#' Car drivers like to adjust the seat position for their own comfort. Car
#' designers would find it helpful to know where different drivers will
#' position the seat depending on their size and age. Researchers at the
#' HuMoSim laboratory at the University of Michigan collected data on 38
#' drivers.
#'
#'
#' @name seatpos
#' @docType data
#' @format The dataset contains the following variables \describe{
#' \item{Age}{ Age in years} \item{Weight}{ Weight in lbs}
#' \item{HtShoes}{ Height in shoes in cm} \item{Ht}{ Height
#' bare foot in cm} \item{Seated}{ Seated height in cm}
#' \item{Arm}{ lower arm length in cm} \item{Thigh}{ Thigh
#' length in cm} \item{Leg}{ Lower leg length in cm}
#' \item{hipcenter}{ horizontal distance of the midpoint of the hips
#' from a fixed location in the car in mm} }
#' @source "Linear Models with R" by Julian Faraway, CRC Press, 2004
#' @keywords datasets
NULL





#' Germination of seeds depending on moisture and covering
#'
#' A biologist analyzed an experiment to determine the effect of moisture
#' content on seed germination. Eight boxes of 100 seeds each were treated with
#' the same moisture level. 4 boxes were covered and 4 left uncovered. The
#' process was repeated at 6 different moisture levels (nonlinear scale).
#'
#'
#' @name seeds
#' @docType data
#' @format A data frame with 48 observations on the following 3 variables.
#' \describe{ \item{germ}{percentage germinated}
#' \item{moisture}{moisture level} \item{covered}{a factor with
#' levels \code{no} \code{yes}} }
#' @source Chatfield C. (1995) Problem Solving: A Statistician's Guide, 2ed
#' Chapman Hall.
#' @keywords datasets
#' @examples
#'
#' data(seeds)
#' ## maybe str(seeds) ; plot(seeds) ...
#'
NULL





#' Semiconductor split-plot experiment
#'
#' The \code{semicond} data frame has 48 rows and 5 columns.
#'
#' Also found in the \code{SASmixed} package
#'
#' @name semicond
#' @docType data
#' @format This data frame contains the following columns: \describe{
#' \item{resistance}{ a numeric vector } \item{ET}{ a factor with levels
#' \code{1} to \code{4} representing etch time.  } \item{Wafer}{ a factor with
#' levels \code{1} to \code{3} } \item{position}{ a factor with levels \code{1}
#' to \code{4} } \item{Grp}{ an ordered factor with levels \code{1/1} <
#' \code{1/2} < \code{1/3} < \code{2/1} < \code{2/2} < \code{2/3} < \code{3/1}
#' < \code{3/2} < \code{3/3} < \code{4/1} < \code{4/2} < \code{4/3} } }
#' @source Littell, R. C., Milliken, G. A., Stroup, W. W., and Wolfinger, R. D.
#' (1996), \emph{SAS System for Mixed Models}, SAS Institute (Data Set 2.2(b)).
#' @keywords datasets
NULL





#' Post traumatic stress disorder in abused adult females
#'
#' The data for this example come from a study of the effects of childhood
#' sexual abuse on adult females. 45 women being treated at a clinic, who
#' reported childhood sexual abuse, were measured for post traumatic stress
#' disorder and childhood physical abuse both on standardized scales. 31 women
#' also being treated at the same clinic, who did not report childhood sexual
#' abuse, were also measured. The full study was more complex than reported here
#' and so readers interested in the subject matter should refer to the original
#' article.
#'
#'
#' @name sexab
#' @docType data
#' @format The variables in the dataset are \describe{
#'
#' \item{cpa}{ Childhood physical abuse on standard scale}
#'
#' \item{ptsd}{ Post-traumatic stress disorder on standard scale}
#' \item{csa}{ Childhood sexual abuse - abused or not abused} }
#' @source N. Rodriguez and S. Ryan and H. Vande Kemp and D. Foy (1997)
#' "Posttraumatic stress disorder in adult female survivors of childhood sexual
#' abuse: A comparison study", Journal of Consulting and Clinical Psychology,
#' 65, 53-59
#' @keywords datasets
NULL





#' Marital sex ratings
#'
#' Data from a questionnaire from 91 couples in the Tucson, Arizona area.
#' Subjects answered the question "Sex is fun for me and my partner". The
#' possible answers were "never or occasionally", "fairly often", "very often"
#' and "almost always".
#'
#'
#' @name sexfun
#' @docType data
#' @format A data frame with 16 observations on the following 3 variables.
#' \describe{ \item{y}{the count} \item{husband}{a factor with levels
#' \code{never} \code{fairly} \code{very} \code{always}} \item{wife}{a factor
#' with levels \code{never} \code{fairly} \code{very} \code{always}} }
#' @source Hout, M., Duncan, O. and Sobel M. (1987) Association and
#' heterogeneity: Structural models of similarities and differences.
#' Sociological Methodology, 17, 145-184.
#' @keywords datasets
NULL





#' Snail production
#'
#' A study was conducted to optimize snail production for consumption. The
#' percentage water content of the tissues of snails grown under three
#' different levels of relative humidity and two different temperatures was
#' recorded. For each combination, 4 snails were observed.
#'
#'
#' @name snail
#' @docType data
#' @format A data frame with 24 observations on the following 3 variables.
#' \describe{ \item{water}{percentage water content}
#' \item{temp}{temperature in C} \item{humid}{relative
#' humidity} }
#' @source Unknown
#' @keywords datasets
#' @examples
#'
#' data(snail)
#' ## maybe str(snail) ; plot(snail) ...
#'
NULL





#' Solder skips in printed circuit boards
#'
#' AT&T ran an experiment varying five factors relevant to a wave-soldering
#' procedure for mounting components on printed circuit boards.  The response
#' variable, skips, is a count of how many solder skips appeared to a visual
#' inspection.
#'
#'
#' @name solder
#' @docType data
#' @format A data frame with 900 observations on the following 6 variables.
#' \describe{ \item{Opening}{a factor with levels \code{L} \code{M}
#' \code{S}} \item{Solder}{a factor with levels \code{Thick}
#' \code{Thin}} \item{Mask}{a factor with levels \code{A1.5} \code{A3}
#' \code{A6} \code{B3} \code{B6}} \item{PadType}{a factor with levels
#' \code{D4} \code{D6} \code{D7} \code{L4} \code{L6} \code{L7} \code{L8}
#' \code{L9} \code{W4} \code{W9}} \item{Panel}{a numeric vector}
#' \item{skips}{count of how many solder skips appeared to a visual
#' inspection} }
#' @source Comizzoli, R. B., J. M. Landwehr, and J. D. Sinclair (1990). Robust
#' materials and processes: Key to reliability. AT&T Technical Journal 69(6),
#' 113-128.
#' @keywords datasets
#' @examples
#'
#' data(solder)
#' ## maybe str(solder) ; plot(solder) ...
#'
NULL





#' Sonoluminescence
#'
#' The \code{sono} data frame has 16 rows and 8 columns.  Sonoluminescence is
#' the process of turning sound energy into light.  An experiment was conducted
#' to study factors affecting this process.
#'
#'
#' @name sono
#' @docType data
#' @format This data frame contains the following columns: \describe{
#' \item{Intensity}{ Sonoluminescent light intensity }
#' \item{Molarity}{ Amount of Solute. The coding is "low" for 0.10 mol
#' and "high" for 0.33 mol. } \item{Solute}{ Solute type. The coding is
#' "low" for sugar and "high" for glycerol. } \item{pH}{ The coding is
#' "low" for 3 and "high" for 11. } \item{Gas}{ Gas type in water. The
#' coding is "low" for helium and "high" for air. } \item{Water}{ Water
#' depth. The coding is "low" for half and "high" for full. }
#' \item{Horn}{ Horn depth. The coding is "low" for 5 mm and "high" for
#' 10 mm. } \item{Flask}{ Flask clamping. The coding is "low" for
#' unclamped and "high" for clamped. }}
#' @source Eva Wilcox and Ken Inn of the NIST Physics Laboratory conducted this
#' experiment during 1999. It was published in the NIST/SEMATECH e-Handbook of
#' Statistical Methods, \url{http://www.itl.nist.gov/div898/handbook/}
#' @keywords datasets
NULL





#' Germination failures for soybean seeds
#'
#' An experiment was conducted to compare the germination rates of the five
#' varieties of soybean. Five plots were available.
#'
#'
#' @name soybean
#' @docType data
#' @format A data frame with 25 observations on the following 3 variables.
#' \describe{ \item{variety}{the variety - a factor with levels
#' \code{arasan} \code{check} \code{fermate} \code{semesan} \code{spergon}}
#' \item{replicate}{the plot - a factor with levels \code{1} \code{2}
#' \code{3} \code{4} \code{5}} \item{failure}{the number of failures
#' out of 100 planted seeds} }
#' @source Snedecor G. and Cochran W. (1967) Statistical Methods (6th Ed) Iowa
#' State University Press
#' @keywords datasets
NULL





#' Teaching methods in Economics
#'
#' A study to determine the effectiveness of a new teaching method in Economics
#'
#'
#' @name spector
#' @docType data
#' @format A data frame with 32 observations on the following 4 variables.
#' \describe{ \item{grade}{1 = exam grades improved, 0 = not improved}
#' \item{psi}{1 = student exposed to PSI (a new teaching method), 0 = not exposed}
#' \item{tuce}{a measure of ability when entering the class} \item{gpa}{grade
#' point average} }
#' @source Spector, L. and Mazzeo, M. (1980), "Probit Analysis and Economic
#' Education", Journal of Economic Education, 11, 37 - 44.
#' @keywords datasets
NULL





#' Speedometer cable shrinkage
#'
#' Speedometer cables can be noisy because of shrinkage in the plastic casing
#' material. An experiment was conducted to find out what caused shrinkage by
#' screening a large number of factors.  The engineers started with 15
#' different factors.
#'
#'
#' @name speedo
#' @docType data
#' @format The dataset contains the following variables: (variables a-o are 2
#' level factors, coded "+" and "-" where "+" indicates a higher value where
#' appropriate) \describe{ \item{a}{ liner outer diameter}
#' \item{b}{ liner die} \item{c}{ liner material}
#' \item{d}{ liner line speed} \item{e}{ wire braid type}
#' \item{f}{ braiding tension} \item{g}{ wire diameter}
#' \item{h}{ liner tension} \item{i}{ liner temperature}
#' \item{j}{ coating material} \item{k}{ coating die type}
#' \item{l}{ melt temperature} \item{m}{ screen pack}
#' \item{n}{ cooling method} \item{o}{ line speed}
#' \item{y}{ percentage shrinkage per specimen} }
#' @source G. P. Box and S. Bisgaard and C. Fung (1988) "An explanation and
#' critique of Taguchi's contributions to quality engineering", Quality and
#' reliability engineering international, 4, 123-131
#' @keywords datasets
NULL





#' Star temperatures and light intensities
#'
#' Data on the log of the surface temperature and the log of the light
#' intensity of 47 stars in the star cluster CYG OB1, which is in the direction
#' of Cygnus.
#'
#'
#' @name star
#' @docType data
#' @format A data frame with 47 observations on the following 3 variables.
#' \describe{ \item{index}{a numeric vector}
#' \item{temp}{temperature} \item{light}{light intensity} }
#' @source Rousseeuw, P. and A. Leroy (1987). Robust Regression and Outlier
#' Detection. New York: Wiley.
#' @keywords datasets
#' @examples
#'
#' data(star)
#' ## maybe str(star) ; plot(star) ...
#'
NULL





#' Marks in a statistics class
#'
#' Marks from Statistics 500 one year at the University of Michigan
#'
#'
#' @name stat500
#' @docType data
#' @format A data frame with 55 observations on the following 4 variables.
#' \describe{ \item{midterm}{a numeric vector} \item{final}{a
#' numeric vector} \item{hw}{a numeric vector} \item{total}{a
#' numeric vector} }
#' @source Julian Faraway
#' @keywords datasets
#' @examples
#'
#' data(stat500)
#' ## maybe str(stat500) ; plot(stat500) ...
#'
NULL





#' Stepping and effect on heart rate
#'
#' An experiment was conducted to explore the nature of the relationship
#' between a person's heart rate and the frequency at which that person stepped
#' up and down on steps of various heights.
#'
#'
#' @name stepping
#' @docType data
#' @format A data frame with 30 observations on the following 6 variables.
#' \describe{ \item{Order}{running order within the experiment}
#' \item{Block}{Experimenter used} \item{Height}{0 if step at
#' the low (5.75in) height, 1 if at the high (11.5in) height}
#' \item{Frequency}{the rate of stepping. 0 if slow (14 steps/min), 1
#' if medium (21 steps/min), 2 if high (28 steps/min)}
#' \item{RestHR}{the resting heart rate of the subject before a trial,
#' in beats per minute} \item{HR}{ the final heart rate of the subject
#' after a trial, in beats per minute } }
#' @source Unknown
#' @keywords datasets
#' @examples
#'
#' data(stepping)
#' ## maybe str(stepping) ; plot(stepping) ...
#'
NULL





#' Strong interaction experiment data
#'
#' Example Dataset from "Practical Regression and Anova"
#'
#' @name strongx
#' @docType data
#' @format A data frame with 10 cases. \describe{
#' \item{momentum}{}
#' \item{energy}{inverse total energy}
#' \item{crossx}{Scattering cross-section/sec}
#' \item{sd}{standard deviation}}
#' @references Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley.
#' @source Weisberg, H., Beier, H., Brody, H., Patton, R., Raychaudhari, K., Takeda, H., Thern, R. and Van Berg, R. (1978). s-dependence of proton fragmentation by hadrons. II. Incident laboratory momenta, 30–250 GeV/c. Physical Review D, 17, 2875–2887.
#' @keywords datasets
NULL





#' Suicide method data from the UK
#'
#' One year of suicide data from the United Kingdom cross-classified by sex, age
#' and method.
#'
#'
#' @name suicide
#' @docType data
#' @format A data frame with 36 observations on the following 4 variables.
#' \describe{ \item{y}{number of people} \item{cause}{method
#' used - a factor with levels \code{drug} (suicide by solid or liquid matter),
#' \code{gas}, \code{gun} (guns, knives or explosives), \code{hang} (hanging,
#' strangling, suffocating or drowning), \code{jump}, \code{other}}
#' \item{age}{a factor with levels \code{m} (middle-aged) \code{o}
#' (old) \code{y} (young) } \item{sex}{a factor with levels \code{f}
#' \code{m}} }
#' @source Everitt B. & Dunn G. (1991) "Applied Multivariate Data Analysis"
#' Edward Arnold
#' @keywords datasets
NULL










#' Study of teenage gambling in Britain
#'
#' The \code{teengamb} data frame has 47 rows and 5 columns. A survey was
#' conducted to study teenage gambling in Britain.
#'
#'
#' @name teengamb
#' @docType data
#' @format This data frame contains the following columns: \describe{
#' \item{sex}{ 0=male, 1=female } \item{status}{ Socioeconomic
#' status score based on parents' occupation } \item{income}{ in pounds
#' per week } \item{verbal}{ verbal score in words out of 12 correctly
#' defined } \item{gamble}{ expenditure on gambling in pounds per year
#' }}
#' @source Ide-Smith & Lea, 1988, Journal of Gambling Behavior, 4, 110-118
#' @keywords datasets
NULL





#' Toenail infection treatment study
#'
#' The data come from a multicenter study comparing two oral treatments for
#' toenail infection. Patients were evaluated for the degree of separation of
#' the nail. Patients were randomized into two treatments and were followed
#' over seven visits - four in the first year and yearly thereafter. The
#' patients have not been treated prior to the first visit so this should be
#' regarded as the baseline.
#'
#'
#' @name toenail
#' @docType data
#' @format A data frame with 1908 observations on the following 5 variables.
#' \describe{ \item{ID}{ID of patient} \item{outcome}{0=none or
#' mild separation, 1=moderate or severe } \item{treatment}{the
#' treatment A=0 or B=1} \item{month}{time of the visit (not exactly
#' monthly intervals hence not round numbers)} \item{visit}{the number
#' of the visit} }
#' @references Lesaffre, E. and Spiessens, B. (2001). On the effect of the
#' number of quadrature points in a logistic random-effects model: An example.
#' Journal of the Royal Statistical Society, Series C, 50, 325-335.
#'
#' G. Fitzmaurice, N. Laird and J. Ware (2004) Applied Longitudinal Analysis,
#' Wiley
#' @source De Backer, M., De Vroey, C., Lesaffre, E., Scheys, I., and De
#' Keyser, P. (1998). Twelve weeks of continuous oral therapy for toenail
#' onychomycosis caused by dermatophytes: A double-blind comparative trial of
#' terbinafine 250 mg/day versus itraconazole 200 mg/day. Journal of the
#' American Academy of Dermatology, 38, 57-63.
#' @keywords datasets
NULL





#' Survival of trout eggs depending on time and location
#'
#' Boxes of trout eggs were buried at five different stream locations and
#' retrieved at 4 different times. The number of surviving eggs was recorded.
#' The box was not returned to the stream.
#'
#'
#' @name troutegg
#' @docType data
#' @format A data frame with 20 observations on the following 4 variables.
#' \describe{ \item{survive}{the number of surviving eggs}
#' \item{total}{the number of eggs in the box}
#' \item{location}{the location in the stream with levels \code{1}
#' \code{2} \code{3} \code{4} \code{5}} \item{period}{the number of
#' weeks after placement that the box was withdrawn, with levels \code{4} \code{7}
#' \code{8} \code{11}} }
#' @references Hinde J. and Demetrio C. (1998) Overdispersion: Models and
#' estimation. Computational Statistics and Data Analysis. 27, 151-170.
#' @source Manly B. (1978) Regression models for proportions with extraneous
#' variance. Biometrie-Praximetrie, 18, 1-18.
#' @keywords datasets
NULL





#' Truck leaf spring experiment
#'
#' Data on an experiment concerning the production of leaf springs for trucks.
#' A \eqn{2^{5-1}} fractional factorial experiment with 3 replicates was
#' carried out with the objective of recommending production settings to achieve a
#' free height as close as possible to 8 inches.
#'
#'
#' @name truck
#' @docType data
#' @format A data frame with 48 observations on the following 6 variables.
#' \describe{ \item{B}{furnace temperature - a factor with levels
#' \code{+} \code{-}} \item{C}{heating time - a factor with levels
#' \code{+} \code{-}} \item{D}{transfer time - a factor with levels
#' \code{+} \code{-}} \item{E}{hold-down time - a factor with levels
#' \code{+} \code{-}} \item{O}{quench oil temperature - a factor with
#' levels \code{+} \code{-}} \item{height}{leaf spring free height in
#' inches} }
#' @references P. McCullagh and J. Nelder (1989) "Generalized Linear Models"
#' Chapman and Hall, 2nd ed.
#' @source J. J. Pignatiello and J. S. Ramberg (1985) Contribution to
#' discussion of offline quality control, parameter design and the Taguchi
#' method, Journal of Quality Technology, \bold{17} 198-206.
#' @keywords datasets
NULL





#' Incubation temperature and the sex of turtles
#'
#' Incubation temperature can affect the sex of turtles. There are 3
#' independent replicates for each temperature.
#'
#' @name turtle
#' @docType data
#' @format A data frame with 15 observations on the following 3 variables.
#' \describe{ \item{temp}{temperature in degrees centigrade} \item{male}{number
#' of male turtles hatched} \item{female}{number of female turtles hatched} }
#' @source Beyond Traditional Statistical Methods Copyright 2000 D. Cook, P.
#' Dixon, W. M. Duckworth, M. S. Kaiser, K. Koehler, W. Q. Meeker and W. R.
#' Stephenson. Developed as part of NSF/ILI grant DUE9751644.
#' @keywords datasets
#' @examples
#'
#' data(turtle)
#'
NULL





#' Life, TVs and Doctors
#'
#' Data on life expectancy, doctors and televisions for 38 countries in 1993.
#'
#'
#' @name tvdoctor
#' @docType data
#' @format A data frame with 38 observations on the following 3 variables.
#' \describe{ \item{life}{Life expectancy in years}
#' \item{tv}{Number of people per television set}
#' \item{doctor}{Number of people per doctor} }
#' @source Unknown, data for illustration purposes only
#' @keywords datasets
#' @examples
#'
#' data(tvdoctor)
#' ## maybe str(tvdoctor) ; plot(tvdoctor) ...
#'
NULL





#' Twin IQs from Burt
#'
#' Study of IQ in twins reared apart
#'
#'
#' @name twins
#' @docType data
#' @format A dataframe with the following variables: \describe{
#' \item{Foster}{IQ of the fostered child}
#' \item{Biological}{IQ of the biological child}
#' \item{Social}{social class of natural parents}}
#' @references Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley.
#' @source Burt, C. (1966). The genetic estimation of differences in intelligence:
#' A study of monozygotic twins reared together and apart. Br. J. Psych., 57, 147-153.
#' @keywords datasets
NULL





#' UNC student opinions about the Vietnam War
#'
#' A student newspaper conducted a survey of student opinions about the Vietnam
#' War in May 1967. Responses were classified by sex, year in the program and
#' one of four opinions. The survey was voluntary.
#'
#'
#' @name uncviet
#' @docType data
#' @format A data frame with 40 observations on the following 4 variables.
#' \describe{ \item{y}{the count} \item{policy}{a factor with
#' levels \code{A} (defeat power of North Vietnam by widespread bombing and
#' land invasion) \code{B} (follow the present policy) \code{C} (withdraw
#' troops to strong points and open negotiations on elections involving the
#' Viet Cong) \code{D} (immediate withdrawal of all U.S. troops)}
#' \item{sex}{a factor with levels \code{Female} \code{Male}}
#' \item{year}{a factor with levels \code{Fresh} \code{Grad}
#' \code{Junior} \code{Senior} \code{Soph}} }
#' @source M. Aitkin and D. Anderson and B. Francis and J. Hinde (1989)
#' "Statistical Modelling in GLIM" Oxford University Press.
#' @keywords datasets
NULL





#' Weekly wages of US male workers in 1988
#'
#' The \code{uswages} data frame has 2000 rows and 10 columns. It records
#' weekly wages for US male workers sampled from the Current Population Survey
#' in 1988.
#'
#'
#' @name uswages
#' @docType data
#' @format This data frame contains the following columns: \describe{
#' \item{wage}{ Real weekly wages in dollars (deflated by personal
#' consumption expenditures - 1992 base year)
#'
#' } \item{educ}{ Years of education } \item{exper}{ Years of
#' experience } \item{race}{ 1 if Black, 0 if White (other races not in
#' sample) } \item{smsa}{ 1 if living in Standard Metropolitan
#' Statistical Area, 0 if not } \item{ne}{ 1 if living in the North
#' East } \item{mw}{ 1 if living in the Midwest } \item{we}{ 1
#' if living in the West } \item{so}{ 1 if living in the South }
#' \item{pt}{ 1 if working part time, 0 if not } }
#' @source Bierens, H.J., and D. Ginther (2001): "Integrated Conditional Moment
#' Testing of Quantile Regression Models", Empirical Economics 26, 307-324
#' @keywords datasets
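#' @examples
#'
#' ## A minimal sketch, assuming a log-wage regression on education and
#' ## experience is of interest (an illustrative choice, not the analysis
#' ## of the cited paper).
#' data(uswages)
#' fit <- lm(log(wage) ~ educ + exper, data = uswages)
#' summary(fit)
#'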
NULL





#' Acuity of vision in response to light flash
#'
#' The acuity of vision for seven subjects was tested. The response is the lag
#' in milliseconds between a light flash and a response in the cortex of the
#' eye. Each eye is tested at four different powers of lens. An object at the
#' distance of the second number appears to be at the distance of the first
#' number.
#'
#'
#' @name vision
#' @docType data
#' @format A data frame with 56 observations on the following 4 variables.
#' \describe{ \item{acuity}{a numeric vector} \item{power}{a
#' factor with levels \code{6/6} \code{6/18} \code{6/36} \code{6/60}}
#' \item{eye}{a factor with levels \code{left} \code{right}}
#' \item{subject}{a factor with levels \code{1} \code{2} \code{3}
#' \code{4} \code{5} \code{6} \code{7}} }
#' @source Crowder, M. J. and D. J. Hand (1990). Analysis of Repeated Measures.
#' London: Chapman & Hall.
#' @keywords datasets
#' @examples
#'
#' data(vision)
#' str(vision)
#'
NULL





#' Resistivity of wafer in semiconductor experiment
#'
#' A full factorial experiment with four two-level predictors.
#'
#'
#' @name wafer
#' @docType data
#' @format A data frame with 16 observations on the following 5 variables.
#' \describe{ \item{x1}{a factor with levels \code{-} \code{+}} \item{x2}{a
#' factor with levels \code{-} \code{+}} \item{x3}{a factor with levels
#' \code{-} \code{+}} \item{x4}{a factor with levels \code{-} \code{+}}
#' \item{resist}{Resistivity of the wafer} }
#' @source Myers, R. and Montgomery D. (1997) A tutorial on generalized linear
#' models, Journal of Quality Technology, 29, 274-291.
#' @keywords datasets
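#' @examples
#'
#' ## A minimal sketch: check the full factorial layout, then fit a
#' ## main-effects model. The Gamma family with log link is an assumption
#' ## made here for a positive response, not a claim about the cited paper.
#' data(wafer)
#' ## each combination of the four two-level factors should appear once
#' with(wafer, table(x1, x2, x3, x4))
#' fit <- glm(resist ~ x1 + x2 + x3 + x4, family = Gamma(link = "log"),
#'   data = wafer)
#' summary(fit)
#'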
NULL





#' Defects in a wave soldering process
#'
#' Components are attached to an electronic circuit card assembly by a
#' wave-soldering process. The soldering process involves baking and preheating
#' the circuit card and then passing it through a solder wave by conveyor.
#' Defects arise during the process. The design is \eqn{2^{7-3}} with 3
#' replicates.
#'
#'
#' @name wavesolder
#' @docType data
#' @format A data frame with 16 observations on the following 10 variables.
#' \describe{ \item{y1}{Number of defects in the first replicate}
#' \item{y2}{Number of defects in the second replicate} \item{y3}{Number of
#' defects in the third replicate} \item{prebake}{prebake condition - a factor
#' with levels \code{1} \code{2}} \item{flux}{flux density - a factor with
#' levels \code{1} \code{2}} \item{speed}{conveyor speed - a factor with levels
#' \code{1} \code{2}} \item{preheat}{preheat condition - a factor with levels
#' \code{1} \code{2}} \item{cooling}{cooling time - a factor with levels
#' \code{1} \code{2}} \item{agitator}{ultrasonic solder agitator - a factor
#' with levels \code{1} \code{2}} \item{temp}{solder temperature - a factor
#' with levels \code{1} \code{2}} }
#' @references M. Hamada and J. Nelder (1997) Generalized linear models for
#' quality improvement experiments, Journal of Quality Technology, 29, 292-304
#' @source L. Condra (1993) Reliability improvement with design of experiments.
#' Marcel Dekker, NY.
#' @keywords datasets
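#' @examples
#'
#' ## A minimal sketch, assuming the three replicates can simply be summed
#' ## (an illustrative choice, not necessarily the analysis in the cited
#' ## references): fit a main-effects Poisson model to the total defects.
#' data(wavesolder)
#' ws <- wavesolder                      # work on a copy
#' ws$total <- with(ws, y1 + y2 + y3)    # hypothetical helper column
#' fit <- glm(total ~ prebake + flux + speed + preheat + cooling +
#'   agitator + temp, family = poisson, data = ws)
#' summary(fit)
#'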
NULL





#' Wisconsin breast cancer database
#'
#' Data come from a study of breast cancer in Wisconsin. There are 681 cases of
#' potentially cancerous tumors, of which 238 are actually malignant.
#' Whether a tumor is really malignant is traditionally determined by an
#' invasive surgical procedure. The purpose of this study was to determine
#' whether a new procedure called fine needle aspiration, which draws only a
#' small sample of tissue, could be effective in determining tumor status.
#'
#' The predictor values are determined by a doctor observing the cells and
#' rating them on a scale from 1 (normal) to 10 (most abnormal) with respect to
#' the particular characteristic.
#'
#' @name wbca
#' @docType data
#' @format A data frame with 681 observations on the following 10 variables.
#' \describe{ \item{Class}{0 if malignant, 1 if benign}
#' \item{Adhes}{marginal adhesion} \item{BNucl}{bare nuclei}
#' \item{Chrom}{bland chromatin} \item{Epith}{epithelial cell
#' size} \item{Mitos}{mitoses} \item{NNucl}{normal nucleoli}
#' \item{Thick}{clump thickness} \item{UShap}{cell shape
#' uniformity} \item{USize}{cell size uniformity} }
#' @source Bennett, K. P., and Mangasarian, O. L., Neural network training via
#' linear programming. In P. M. Pardalos, editor, Advances in Optimization and
#' Parallel Computing, pages 56-57. Elsevier Science, 1992
#' @keywords datasets
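#' @examples
#'
#' ## A minimal sketch, assuming a logistic regression on all nine cell
#' ## ratings is a sensible first model (an illustrative choice).
#' data(wbca)
#' ## Class is coded 0 = malignant, 1 = benign (see the format above)
#' table(wbca$Class)
#' ## so the fitted probabilities are for a tumor being benign
#' fit <- glm(Class ~ ., family = binomial, data = wbca)
#' summary(fit)
#'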
NULL





#' Western Collaborative Group Study
#'
#' 3154 healthy young men aged 39-59 from the San Francisco area were assessed
#' for their personality type. All were free from coronary heart disease at the
#' start of the research. Eight and a half years later, any change in this
#' status was recorded.
#'
#' The WCGS began in 1960 with 3,524 male volunteers who were employed by 11
#' California companies. Subjects were 39 to 59 years old and free of heart
#' disease as determined by electrocardiogram. After the initial screening, the
#' study population dropped to 3,154 and the number of companies to 10 because
#' of various exclusions. The cohort comprised both blue- and white-collar
#' employees. At baseline the following information was collected:
#' socio-demographic including age, education, marital status, income,
#' occupation; physical and physiological including height, weight, blood
#' pressure, electrocardiogram, and corneal arcus; biochemical including
#' cholesterol and lipoprotein fractions; medical and family history and use of
#' medications; behavioral data including Type A interview, smoking, exercise,
#' and alcohol use. Later surveys added data on anthropometry, triglycerides,
#' Jenkins Activity Survey, and caffeine use. Follow-up continued for an
#' average of 8.5 years, with repeat examinations.
#'
#' @name wcgs
#' @docType data
#' @format A data frame with 3154 observations on the following 13 variables.
#' \describe{ \item{age}{age in years} \item{height}{height in
#' inches} \item{weight}{weight in pounds} \item{sdp}{systolic
#' blood pressure in mm Hg} \item{dbp}{diastolic blood pressure in mm
#' Hg} \item{chol}{Fasting serum cholesterol in mg \%}
#' \item{behave}{behavior type which is a factor with levels \code{A1}
#' \code{A2} \code{B3} \code{B4}} \item{cigs}{number of cigarettes
#' smoked per day} \item{dibep}{behavior type, a factor with levels
#' \code{A} (Aggressive) \code{B} (Passive)} \item{chd}{whether coronary
#' heart disease developed, a factor with levels \code{no} \code{yes}}
#' \item{typechd}{type of coronary heart disease is a factor with
#' levels \code{angina} \code{infdeath} \code{none} \code{silent}}
#' \item{timechd}{Time of CHD event or end of follow-up}
#' \item{arcus}{arcus senilis is a factor with levels \code{absent}
#' \code{present}} }
#' @references Rosenman, R. H., Brand, R. J., Jenkins, C. D., Friedman, M.,
#' Straus, R. and Wurm, M. (1975). Coronary Heart Disease in the Western
#' Collaborative Group Study: Final Follow-up Experience of 8 1/2 Years.
#' JAMA, 233(8), 872-877. doi:10.1001/jama.1975.03260080034016.
#' @source Statistics for Epidemiology by N. Jewell (2004)
#' @keywords datasets
#' @examples
#'
#' data(wcgs)
#' str(wcgs)
#'
NULL





#' Welding strength DOE
#'
#' An experiment to investigate factors affecting welding strength.
#'
#'
#' @name weldstrength
#' @docType data
#' @format A data frame with 16 observations on the following 10 variables.
#' \describe{ \item{Rods}{a 0-1 predictor} \item{Drying}{a 0-1 predictor}
#' \item{Material}{a 0-1 predictor} \item{Thickness}{a 0-1 predictor}
#' \item{Angle}{a 0-1 predictor} \item{Opening}{a 0-1 predictor}
#' \item{Current}{a 0-1 predictor} \item{Method}{a 0-1 predictor}
#' \item{Preheating}{a 0-1 predictor} \item{Strength}{The welding strength} }
#' @source G. Box and R. Meyer (1986) Dispersion effects from fractional
#' designs, Technometrics, 28, 19-27
#' @keywords datasets
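#' @examples
#'
#' ## A minimal sketch, assuming a main-effects linear model for the mean
#' ## strength is a useful first look. The cited paper concerns dispersion
#' ## effects, which this simple fit does not address.
#' data(weldstrength)
#' fit <- lm(Strength ~ ., data = weldstrength)
#' summary(fit)
#'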
NULL





#' Insect damage to wheat by variety
#'
#' Insect damage to wheat by variety
#'
#'
#' @name wheat
#' @docType data
#' @format A data frame with 13 observations on the following 2 variables.
#' \describe{ \item{damage}{a numeric vector} \item{variety}{a
#' factor with levels \code{A} \code{B} \code{C} \code{D}} }
#' @source Unknown
#' @keywords datasets
#' @examples
#'
#' data(wheat)
#' str(wheat)
#'
NULL





#' Data on players from the 2010 World Cup
#'
#' Data on players from the 2010 World Cup
#'
#' None
#'
#' @name worldcup
#' @docType data
#' @format A data frame with 595 observations on the following 7 variables.
#' \describe{ \item{Team}{Country} \item{Position}{a factor
#' with levels \code{Defender} \code{Forward} \code{Goalkeeper}
#' \code{Midfielder}} \item{Time}{Time played in minutes}
#' \item{Shots}{Number of shots attempted} \item{Passes}{Number
#' of passes made} \item{Tackles}{Number of tackles made}
#' \item{Saves}{Number of saves made} }
#' @source Lost
#' @keywords datasets
#' @examples
#'
#' data(worldcup)
#' str(worldcup)
#'
NULL
faraway documentation built on Aug. 23, 2022, 5:08 p.m.