AirPollution: Air pollution and mortality


Description

Data relating air pollution to mortality, frequently used to illustrate ridge regression and related techniques.

Usage

data("AirPollution", package = "lmSubsets")

Format

A data frame containing 60 observations on 16 variables.

precipitation

average annual precipitation in inches

temperature1

average January temperature in degrees Fahrenheit

temperature7

average July temperature in degrees Fahrenheit

age

percentage of 1960 SMSA population aged 65 or older

household

average household size

education

median school years completed by those over 22

housing

percentage of housing units that are sound and have all facilities

population

population per square mile in urbanized areas, 1960

noncauc

percentage of non-Caucasian population in urbanized areas, 1960

whitecollar

percentage employed in white collar occupations

income

percentage of families with income < USD 3000

hydrocarbon

relative hydrocarbon pollution potential

nox

relative nitric oxides pollution potential

so2

relative sulphur dioxide pollution potential

humidity

annual average relative humidity at 13:00, in percent

mortality

total age-adjusted mortality rate per 100,000
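The example below log-transforms columns 12 to 14 by position. Assuming the columns appear in the same order as the variables documented above (an assumption, not something this page states), a quick base-R check confirms that those positions are the three relative pollution potentials:

```r
# Variable names in the order listed under Format
# (assumption: this matches the column order of the data frame)
vars <- c("precipitation", "temperature1", "temperature7", "age",
          "household", "education", "housing", "population",
          "noncauc", "whitecollar", "income", "hydrocarbon",
          "nox", "so2", "humidity", "mortality")

length(vars)                                  # 16
vars[12:14]                                   # "hydrocarbon" "nox" "so2"
match(c("hydrocarbon", "nox", "so2"), vars)   # 12 13 14
```

Indexing by name, e.g. `AirPollution[c("hydrocarbon", "nox", "so2")]`, avoids relying on column positions altogether.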

Source

http://lib.stat.cmu.edu/datasets/pollution

References

McDonald GC, Schwing RC (1973). Instabilities of regression estimates relating air pollution to mortality. Technometrics, 15, 463–482.

Miller AJ (2002). Subset selection in regression. New York: Chapman and Hall.

Examples

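## load package and data; log-transform the relative pollution potentials
## (columns 12:14 are hydrocarbon, nox and so2)
library("lmSubsets")
data("AirPollution", package = "lmSubsets")
for (i in 12:14) AirPollution[[i]] <- log(AirPollution[[i]])

## fit all subsets and plot the results
lm_all <- lmSubsets(mortality ~ ., data = AirPollution)
plot(lm_all)

## refit the best subset of size 6 (intercept plus five regressors) as an lm
lm6 <- refit(lm_all, size = 6)
summary(lm6)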

Example output

Call:
lm(formula = mortality ~ precipitation + temperature1 + education + 
    noncauc + nox, data = AirPollution)

Residuals:
    Min      1Q  Median      3Q     Max 
-92.801 -20.280  -0.059  21.390  75.403 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)   988.3250    81.9579  12.059  < 2e-16 ***
precipitation   2.2616     0.6236   3.626 0.000637 ***
temperature1   -2.0340     0.4873  -4.174 0.000110 ***
education     -13.9327     6.0967  -2.285 0.026248 *  
noncauc         3.7287     0.6380   5.845 3.02e-07 ***
nox            19.4865     4.3556   4.474 4.00e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 32.88 on 54 degrees of freedom
Multiple R-squared:  0.7442,	Adjusted R-squared:  0.7206 
F-statistic: 31.43 on 5 and 54 DF,  p-value: 7.579e-15
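The summary statistics above follow from standard linear-model formulas. As a rough sanity check, the sketch below recomputes the residual degrees of freedom, adjusted R-squared, F-statistic and one t value from the printed (rounded) numbers, so small discrepancies in the last digit are expected:

```r
# Values copied from the summary output above (rounded as printed)
n  <- 60       # observations
k  <- 5        # regressors, excluding the intercept
r2 <- 0.7442   # Multiple R-squared

df_resid <- n - k - 1                          # residual degrees of freedom
adj_r2   <- 1 - (1 - r2) * (n - 1) / df_resid  # adjusted R-squared
f_stat   <- (r2 / k) / ((1 - r2) / df_resid)   # overall F-statistic

# t value = Estimate / Std. Error, here for precipitation
t_precip <- 2.2616 / 0.6236

round(c(df_resid = df_resid, adj_r2 = adj_r2,
        f_stat = f_stat, t_precip = t_precip), 4)
```

This gives 54 residual degrees of freedom, an adjusted R-squared of about 0.7205, an F-statistic of about 31.42 and a t value of about 3.627, matching the printed 0.7206, 31.43 and 3.626 up to rounding of the inputs.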

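Conceptually, best-subset selection keeps, for each subset size, the regressor subset with the smallest residual sum of squares. The sketch below is a naive exhaustive search on simulated data using base R only; it is not the algorithm implemented in lmSubsets, and the helper `best_subset()` is hypothetical. Here `k` counts regressors only, whereas `size = 6` in the example above appears to include the intercept, judging from the output (an intercept plus five regressors).

```r
# Naive exhaustive best-subset search (illustrative only; not lmSubsets'
# algorithm). Data are simulated, not AirPollution.
set.seed(1)
n <- 50
X <- matrix(rnorm(n * 4), n, 4, dimnames = list(NULL, paste0("x", 1:4)))
y <- 2 + 3 * X[, "x1"] - 2 * X[, "x3"] + rnorm(n)
dat <- data.frame(y, X)

# Hypothetical helper: for each number of regressors k, fit every subset
# of size k and keep the one with the smallest residual sum of squares
best_subset <- function(dat, response = "y") {
  preds <- setdiff(names(dat), response)
  lapply(seq_along(preds), function(k) {
    subsets <- combn(preds, k, simplify = FALSE)
    rss <- vapply(subsets, function(s) {
      deviance(lm(reformulate(s, response = response), data = dat))
    }, numeric(1))
    list(vars = subsets[[which.min(rss)]], rss = min(rss))
  })
}

best <- best_subset(dat)
best[[2]]$vars              # should recover "x1" and "x3"
sapply(best, `[[`, "rss")   # best RSS never increases with k
```

Exhaustive search needs 2^p - 1 fits for p candidate regressors (32767 for the 15 regressors in AirPollution), which is why dedicated implementations avoid fitting every subset explicitly.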
lmSubsets documentation built on Feb. 8, 2021, 1:06 a.m.