A 1972–1981 health survey in The Hague, Netherlands, discovered an association between keeping pet birds and increased risk of lung cancer. To investigate birdkeeping as a risk factor, researchers conducted a case–control study of patients in 1985 at four hospitals in The Hague (population 450,000). They identified 49 cases of lung cancer among the patients who were registered with a general practice, who were age 65 or younger and who had resided in the city since 1965. They also selected 98 controls from a population of residents having the same general age structure.
A data frame with 147 observations on the following 7 variables.
LC: Whether subject has lung cancer
FM: Sex of subject
SS: Socioeconomic status, determined by occupation of the household's principal wage earner
BK: Indicator for birdkeeping (caged birds in the home for more than 6 consecutive months from 5 to 14 years before diagnosis (cases) or examination (controls))
AG: Age of subject (in years)
YR: Years of smoking prior to diagnosis or examination
CD: Average rate of smoking (in cigarettes per day)
Ramsey, F.L. and Schafer, D.W. (2013). The Statistical Sleuth: A Course in Methods of Data Analysis (3rd ed), Cengage Learning.
Holst, P.A., Kromhout, D. and Brand, R. (1988). For Debate: Pet Birds as an Independent Risk Factor for Lung Cancer, British Medical Journal 297: 13–21.
str(case2002)
attach(case2002)
## EXPLORATION AND MODEL BUILDING
myCode <- ifelse(BK=="Bird" & LC=="LungCancer","Bird & Cancer",
  ifelse(BK=="Bird" & LC=="NoCancer","Bird & No Cancer",
    ifelse(BK=="NoBird" & LC=="LungCancer","No Bird & Cancer", "No Bird & No Cancer")))
table(myCode)
if(require(car)){   # Use the car library
  scatterplotMatrix(cbind(AG,YR,CD), groups=myCode, diagonal="none",reg.line=FALSE,
    pch=c(15,21,15,21), col=c("dark green","dark green","purple","purple"),
    var.labels=c("Age","Years Smoked","Cigarettes per Day"), cex=1.5)
}
# Reorder the levels so that the model is for log odds of cancer
LC <- factor(LC, levels=c("NoCancer","LungCancer"))
myGlm <- glm(LC ~ FM + SS + AG + YR + CD + BK, family=binomial)
if(require(car)){   # Use the car library
  crPlots(myGlm)
}
# It appears that there's an effect of Years of Smoking and of Bird Keeping
# after accounting for other variables; no obvious effects of other variables
# Logistic regression model building using backward elimination (withholding BK)
myGlm1 <- glm(LC ~ FM + SS + AG + YR + CD, family=binomial)
summary(myGlm1)
myGlm2 <- update(myGlm1, ~ . - SS)
summary(myGlm2)
myGlm3 <- update(myGlm2, ~ . - CD)
summary(myGlm3)
myGlm4 <- update(myGlm3, ~ . - FM)
summary(myGlm4) # Everything left has a small p-value (retain the intercept)
## INFERENCE AND INTERPRETATION
myGlm5 <- update(myGlm4, ~ . + BK) # Now add bird keeping
summary(myGlm5)
myGlm6 <- update(myGlm5, ~ . + BK:YR + AG:YR) # Try interaction terms
anova(myGlm6,myGlm5) # Drop-in-deviance = 1.61 on 2 d.f.
1 - pchisq(1.61,2) # p-value = .45: no evidence of interaction
anova(myGlm4,myGlm5) # Test for bird keeping effect
(1 - pchisq(12.612,1))/2 # 1-sided p-value: 0.0001916391
BK <- factor(BK, levels=c("NoBird", "Bird")) # Make "no bird" the ref level
myGlm5b <- glm(LC ~ AG + YR + BK, family=binomial)
beta <- myGlm5b$coef # Extract estimated coefficients
exp(beta[4]) # 3.961248
exp(confint(myGlm5b,4)) # 1.836764 8.900840
# Interpretation: The odds of lung cancer for people who kept birds were
# estimated to be 4 times the odds of lung cancer for people of similar age, sex,
# smoking history, and socio-economic status who didn't keep birds
# (95% confidence interval for this adjusted odds ratio: 1.8 times to 8.9 times).
# See bestglm library for an alternative variable selection technique.
detach(case2002)
'data.frame': 147 obs. of 7 variables:
$ LC: Factor w/ 2 levels "LungCancer","NoCancer": 1 1 1 1 1 1 1 1 1 1 ...
$ FM: Factor w/ 2 levels "Female","Male": 2 2 2 2 2 2 2 2 2 2 ...
$ SS: Factor w/ 2 levels "High","Low": 2 2 1 2 2 1 1 2 2 1 ...
$ BK: Factor w/ 2 levels "Bird","NoBird": 1 1 2 1 1 2 1 2 1 2 ...
$ AG: int 37 41 43 46 49 51 52 53 56 56 ...
$ YR: int 19 22 19 24 31 24 31 33 33 26 ...
$ CD: int 12 15 15 15 20 15 20 20 10 25 ...
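The `str()` output above lists the levels of `LC` alphabetically, with "LungCancer" first. For a factor response in a binomial `glm()`, R treats the first level as failure and the remaining level as success, so the example code reorders the levels before fitting. A minimal sketch of that step on a made-up toy vector (the values in `lc` are hypothetical, not taken from `case2002`):

```r
# Toy response using the same labels as the LC column (values are made up)
lc <- factor(c("LungCancer", "NoCancer", "NoCancer", "LungCancer"))
levels(lc)   # alphabetical by default, so "LungCancer" comes first

# Put "NoCancer" first so that "LungCancer" becomes the modelled outcome
lc_reordered <- factor(lc, levels = c("NoCancer", "LungCancer"))
levels(lc_reordered)

# relevel() does the same job when only the reference level needs to change
lc_releveled <- relevel(lc, ref = "NoCancer")
identical(levels(lc_reordered), levels(lc_releveled))
```

Only the level order changes; the observations themselves are untouched.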
myCode
      Bird & Cancer    Bird & No Cancer    No Bird & Cancer No Bird & No Cancer
                 33                  34                  16                  64
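As a rough check before any modelling, the four counts in this table can be turned into a crude (unadjusted) odds ratio for birdkeeping. The sketch below retypes the counts by hand; the object names (`counts`, `crude_or`) are illustrative and do not appear in the original example.

```r
# 2 x 2 table of birdkeeping by lung cancer status, counts copied from table(myCode)
counts <- matrix(c(33, 34,
                   16, 64),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(BK = c("Bird", "NoBird"),
                                 LC = c("LungCancer", "NoCancer")))

# Odds of lung cancer within each birdkeeping group
odds_bird   <- counts["Bird", "LungCancer"]   / counts["Bird", "NoCancer"]
odds_nobird <- counts["NoBird", "LungCancer"] / counts["NoBird", "NoCancer"]

# Crude odds ratio: no adjustment for age or smoking
crude_or <- odds_bird / odds_nobird
round(crude_or, 2)   # 3.88
```

This unadjusted figure is close to the adjusted odds ratio of about 3.96 reported by the example code, which suggests that adjusting for age and years of smoking changes the estimate only slightly.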
Loading required package: car
Call:
glm(formula = LC ~ FM + SS + AG + YR + CD, family = binomial)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.3910 -0.9718 -0.5519 1.1733 2.5020
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.37895 1.67206 0.227 0.82070
FMMale -0.74923 0.50501 -1.484 0.13792
SSLow 0.07303 0.43893 0.166 0.86785
AG -0.05799 0.03432 -1.690 0.09112 .
YR 0.07955 0.02636 3.018 0.00255 **
CD 0.01978 0.02422 0.817 0.41421
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 187.14 on 146 degrees of freedom
Residual deviance: 165.87 on 141 degrees of freedom
AIC: 177.87
Number of Fisher Scoring iterations: 5
Call:
glm(formula = LC ~ FM + AG + YR + CD, family = binomial)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.4134 -0.9744 -0.5430 1.1749 2.5123
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.46101 1.59688 0.289 0.77282
FMMale -0.76832 0.49178 -1.562 0.11821
AG -0.05858 0.03415 -1.715 0.08628 .
YR 0.08027 0.02603 3.083 0.00205 **
CD 0.01959 0.02420 0.810 0.41820
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 187.14 on 146 degrees of freedom
Residual deviance: 165.90 on 142 degrees of freedom
AIC: 175.9
Number of Fisher Scoring iterations: 5
Call:
glm(formula = LC ~ FM + AG + YR, family = binomial)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.2597 -0.9794 -0.5462 1.1718 2.4894
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.82886 1.52662 0.543 0.587172
FMMale -0.73638 0.48914 -1.505 0.132210
AG -0.06363 0.03359 -1.894 0.058195 .
YR 0.08776 0.02452 3.579 0.000344 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 187.14 on 146 degrees of freedom
Residual deviance: 166.55 on 143 degrees of freedom
AIC: 174.55
Number of Fisher Scoring iterations: 5
Call:
glm(formula = LC ~ AG + YR, family = binomial)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.2933 -0.9869 -0.5682 1.2448 2.5943
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.67653 1.49597 0.452 0.651100
AG -0.06568 0.03291 -1.996 0.045976 *
YR 0.07815 0.02321 3.368 0.000758 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 187.14 on 146 degrees of freedom
Residual deviance: 168.83 on 144 degrees of freedom
AIC: 174.83
Number of Fisher Scoring iterations: 5
Call:
glm(formula = LC ~ AG + YR + BK, family = binomial)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.5466 -0.8649 -0.4911 0.9763 2.2584
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.34296 1.58002 0.217 0.828159
AG -0.04610 0.03430 -1.344 0.178952
YR 0.07485 0.02296 3.261 0.001111 **
BKNoBird -1.37656 0.40073 -3.435 0.000592 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 187.14 on 146 degrees of freedom
Residual deviance: 156.22 on 143 degrees of freedom
AIC: 164.22
Number of Fisher Scoring iterations: 5
Analysis of Deviance Table

Model 1: LC ~ AG + YR + BK + YR:BK + AG:YR
Model 2: LC ~ AG + YR + BK
  Resid. Df Resid. Dev Df Deviance
1       141     154.60
2       143     156.22 -2  -1.6163
[1] 0.4470879
Analysis of Deviance Table

Model 1: LC ~ AG + YR
Model 2: LC ~ AG + YR + BK
  Resid. Df Resid. Dev Df Deviance
1       144     168.83
2       143     156.22  1   12.612
[1] 0.0001916391
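The two drop-in-deviance tests above can be reproduced by hand from the residual deviances printed in the tables. The sketch below retypes those rounded deviances, so its results agree with the printed p-values only approximately; the variable names are illustrative.

```r
# Residual deviances copied (rounded) from the output above
dev_interactions <- 154.60   # LC ~ AG + YR + BK + YR:BK + AG:YR, 141 d.f.
dev_with_bk      <- 156.22   # LC ~ AG + YR + BK,                 143 d.f.
dev_without_bk   <- 168.83   # LC ~ AG + YR,                      144 d.f.

# Test for the two interaction terms: chi-square on 143 - 141 = 2 d.f.
drop_int <- dev_with_bk - dev_interactions
p_int <- pchisq(drop_int, df = 2, lower.tail = FALSE)

# Test for the birdkeeping effect: chi-square on 144 - 143 = 1 d.f.
drop_bk <- dev_without_bk - dev_with_bk
p_bk_two_sided <- pchisq(drop_bk, df = 1, lower.tail = FALSE)

# Halved for a one-sided alternative (birdkeeping raises the odds), as in the example code
p_bk_one_sided <- p_bk_two_sided / 2

c(interaction = p_int, birdkeeping = p_bk_one_sided)
```

Using `lower.tail = FALSE` is equivalent to `1 - pchisq(...)` and tends to be more accurate when the p-value is very small.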
BKBird
3.961248
Waiting for profiling to be done...
2.5 % 97.5 %
1.836764 8.900840
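The coefficient table for `LC ~ AG + YR + BK` reports `BKNoBird` (estimate -1.37656, standard error 0.40073), that is, the effect of not keeping birds. Flipping its sign gives the log odds ratio for keeping birds, which is why refitting with "NoBird" as the reference level yields 3.96. The sketch below also computes a Wald interval from the standard error for comparison; this is a swapped-in approximation, not the profile-likelihood interval that `confint()` printed above.

```r
# Estimate and standard error for BKNoBird, copied from the summary output
est_nobird <- -1.37656
se_nobird  <- 0.40073

# Changing the reference level only flips the sign of the coefficient
log_or_bird <- -est_nobird
or_bird <- exp(log_or_bird)   # about 3.96, matching exp(beta[4])

# Wald 95% interval on the log scale, then back-transformed to the odds scale
wald_ci <- exp(log_or_bird + c(-1, 1) * qnorm(0.975) * se_nobird)
round(c(estimate = or_bird, lower = wald_ci[1], upper = wald_ci[2]), 2)
```

The Wald interval (roughly 1.8 to 8.7) is symmetric on the log scale, whereas the profile interval (1.84 to 8.90) is not, which accounts for the small difference between the two.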