case2002: Birdkeeping and Lung Cancer


Description

A 1972–1981 health survey in The Hague, Netherlands, discovered an association between keeping pet birds and increased risk of lung cancer. To investigate birdkeeping as a risk factor, researchers conducted a case–control study of patients in 1985 at four hospitals in The Hague (population 450,000). They identified 49 cases of lung cancer among the patients who were registered with a general practice, who were age 65 or younger and who had resided in the city since 1965. They also selected 98 controls from a population of residents having the same general age structure.

Usage

case2002

Format

A data frame with 147 observations on the following 7 variables.

LC

Whether subject has lung cancer

FM

Sex of subject

SS

Socioeconomic status, determined by occupation of the household's principal wage earner

BK

Indicator for birdkeeping (caged birds in the home for more than 6 consecutive months from 5 to 14 years before diagnosis (cases) or examination (controls))

AG

Age of subject (in years)

YR

Years of smoking prior to diagnosis or examination

CD

Average rate of smoking (in cigarettes per day)
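
The factor variables above are stored with alphabetically ordered levels, so "LungCancer" comes before "NoCancer" and "Bird" before "NoBird". Because glm() with a binomial family treats the first level of a factor response as failure, the level order determines which outcome the log odds describe. The following is a minimal sketch of reordering levels; the toy vectors and the names lc, lc_releveled, bk and bk_releveled are illustrative assumptions, not part of the data set.

```r
# Toy factors that mimic the level names of LC and BK (illustrative values only,
# not rows from case2002).
lc <- factor(c("LungCancer", "NoCancer", "NoCancer", "LungCancer"))
bk <- factor(c("Bird", "NoBird", "NoBird", "Bird"))

# Default level order is alphabetical.
levels(lc)
levels(bk)

# Put "NoCancer" first so that a binomial glm() models the log odds of cancer.
lc_releveled <- factor(lc, levels = c("NoCancer", "LungCancer"))

# relevel() is an equivalent way to choose the reference level of a predictor.
bk_releveled <- relevel(bk, ref = "NoBird")

levels(lc_releveled)
levels(bk_releveled)
```

Only the level order changes; the observed values stay the same.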

Source

Ramsey, F.L. and Schafer, D.W. (2013). The Statistical Sleuth: A Course in Methods of Data Analysis (3rd ed.), Cengage Learning.

References

Holst, P.A., Kromhout, D. and Brand, R. (1988). For Debate: Pet Birds as an Independent Risk Factor for Lung Cancer, British Medical Journal 297: 1319–1321.

Examples

str(case2002)
attach(case2002)
   
## EXPLORATION AND MODEL BUILDING
myCode <- ifelse(BK=="Bird" & LC=="LungCancer","Bird & Cancer", 
  ifelse(BK=="Bird" & LC=="NoCancer","Bird & No Cancer",
  ifelse(BK=="NoBird" & LC=="LungCancer","No Bird & Cancer", "No Bird & No Cancer")))
table(myCode)
if(require(car)){   # Use the car library
scatterplotMatrix(cbind(AG,YR,CD), groups=myCode, diagonal="none",reg.line=FALSE,
  pch=c(15,21,15,21), col=c("dark green","dark green","purple","purple"),
  var.labels=c("Age","Years Smoked","Cigarettes per Day"), cex=1.5) 
}

# Reorder the levels so that the model is for log odds of cancer
LC    <- factor(LC, levels=c("NoCancer","LungCancer"))    
myGlm <- glm(LC ~ FM + SS + AG + YR + CD + BK, family=binomial)
if(require(car)){   # Use the car library
  crPlots(myGlm)  }
# It appears that there's an effect of Years of Smoking and of Bird Keeping
# after accounting for other variables; no obvious effects of other variables

# Logistic regression model building using backward elimination (withholding BK)
myGlm1 <- glm(LC ~ FM + SS + AG + YR + CD, family=binomial)
summary(myGlm1)
myGlm2 <- update(myGlm1, ~ . - SS)        
summary(myGlm2)
myGlm3 <- update(myGlm2, ~ . - CD)   
summary(myGlm3)
myGlm4 <- update(myGlm3, ~ . - FM)   
summary(myGlm4) # Everything left has a small p-value (retain the intercept)


## INFERENCE AND INTERPRETATION
myGlm5 <- update(myGlm4, ~ . + BK)    # Now add bird keeping
summary(myGlm5)
myGlm6 <- update(myGlm5, ~ . + BK:YR + AG:YR) # Try interaction terms
anova(myGlm6,myGlm5) # Drop-in-deviance = 1.61 on 2 d.f.
1 - pchisq(1.61,2)    # p-value = .45: no evidence of interaction
anova(myGlm4,myGlm5)   # Test for bird keeping effect
(1 - pchisq(12.612,1))/2  # 1-sided p-value: 0.0001916391
 
BK <- factor(BK, levels=c("NoBird", "Bird"))  # Make "no bird" the ref level
myGlm5b <- glm(LC ~ AG + YR + BK, family=binomial)   
beta <- myGlm5b$coef  # Extract estimated coefficients
exp(beta[4])   # 3.961248                
exp(confint(myGlm5b,4))   # 1.836764 8.900840  
# Interpretation: The odds of lung cancer for people who kept birds were 
# estimated to be 4 times the odds of lung cancer for people of similar age, sex, 
# smoking history, and socio-economic status who didn't keep birds
# (95% confidence interval for this adjusted odds ratio: 1.8 times to 8.9 times).

# See bestglm library for an alternative variable selection technique. 
 
detach(case2002)
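
The drop-in-deviance p-values in the example are computed as 1 - pchisq(...). As a hedged sketch, the same arithmetic can be written with lower.tail = FALSE, starting from the rounded residual deviances reported by the fitted models (168.83 without BK, 156.22 with BK). The variable names below are illustrative assumptions, and halving the p-value simply mirrors the one-sided convention used in the example.

```r
# Rounded residual deviances reported by summary() for the models
# without and with the birdkeeping term.
dev_without_bk <- 168.83
dev_with_bk    <- 156.22

# Drop in deviance for adding BK (1 degree of freedom).
drop_bk <- dev_without_bk - dev_with_bk

# Upper-tail chi-square probability; lower.tail = FALSE avoids computing 1 - p.
p_two_sided <- pchisq(drop_bk, df = 1, lower.tail = FALSE)
p_one_sided <- p_two_sided / 2   # one-sided convention, as in the example

# Interaction test: drop in deviance of 1.6163 on 2 degrees of freedom.
p_interaction <- pchisq(1.6163, df = 2, lower.tail = FALSE)

print(c(drop = drop_bk, one_sided = p_one_sided, interaction = p_interaction))
```

Because the deviances are rounded to two decimals, the result can differ slightly from the value obtained with the unrounded drop of 12.612.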

Example output

'data.frame':	147 obs. of  7 variables:
 $ LC: Factor w/ 2 levels "LungCancer","NoCancer": 1 1 1 1 1 1 1 1 1 1 ...
 $ FM: Factor w/ 2 levels "Female","Male": 2 2 2 2 2 2 2 2 2 2 ...
 $ SS: Factor w/ 2 levels "High","Low": 2 2 1 2 2 1 1 2 2 1 ...
 $ BK: Factor w/ 2 levels "Bird","NoBird": 1 1 2 1 1 2 1 2 1 2 ...
 $ AG: int  37 41 43 46 49 51 52 53 56 56 ...
 $ YR: int  19 22 19 24 31 24 31 33 33 26 ...
 $ CD: int  12 15 15 15 20 15 20 20 10 25 ...
myCode
      Bird & Cancer    Bird & No Cancer    No Bird & Cancer No Bird & No Cancer 
                 33                  34                  16                  64 
Loading required package: car

Call:
glm(formula = LC ~ FM + SS + AG + YR + CD, family = binomial)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.3910  -0.9718  -0.5519   1.1733   2.5020  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)   
(Intercept)  0.37895    1.67206   0.227  0.82070   
FMMale      -0.74923    0.50501  -1.484  0.13792   
SSLow        0.07303    0.43893   0.166  0.86785   
AG          -0.05799    0.03432  -1.690  0.09112 . 
YR           0.07955    0.02636   3.018  0.00255 **
CD           0.01978    0.02422   0.817  0.41421   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 187.14  on 146  degrees of freedom
Residual deviance: 165.87  on 141  degrees of freedom
AIC: 177.87

Number of Fisher Scoring iterations: 5


Call:
glm(formula = LC ~ FM + AG + YR + CD, family = binomial)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.4134  -0.9744  -0.5430   1.1749   2.5123  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)   
(Intercept)  0.46101    1.59688   0.289  0.77282   
FMMale      -0.76832    0.49178  -1.562  0.11821   
AG          -0.05858    0.03415  -1.715  0.08628 . 
YR           0.08027    0.02603   3.083  0.00205 **
CD           0.01959    0.02420   0.810  0.41820   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 187.14  on 146  degrees of freedom
Residual deviance: 165.90  on 142  degrees of freedom
AIC: 175.9

Number of Fisher Scoring iterations: 5


Call:
glm(formula = LC ~ FM + AG + YR, family = binomial)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.2597  -0.9794  -0.5462   1.1718   2.4894  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  0.82886    1.52662   0.543 0.587172    
FMMale      -0.73638    0.48914  -1.505 0.132210    
AG          -0.06363    0.03359  -1.894 0.058195 .  
YR           0.08776    0.02452   3.579 0.000344 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 187.14  on 146  degrees of freedom
Residual deviance: 166.55  on 143  degrees of freedom
AIC: 174.55

Number of Fisher Scoring iterations: 5


Call:
glm(formula = LC ~ AG + YR, family = binomial)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.2933  -0.9869  -0.5682   1.2448   2.5943  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  0.67653    1.49597   0.452 0.651100    
AG          -0.06568    0.03291  -1.996 0.045976 *  
YR           0.07815    0.02321   3.368 0.000758 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 187.14  on 146  degrees of freedom
Residual deviance: 168.83  on 144  degrees of freedom
AIC: 174.83

Number of Fisher Scoring iterations: 5


Call:
glm(formula = LC ~ AG + YR + BK, family = binomial)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.5466  -0.8649  -0.4911   0.9763   2.2584  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)  0.34296    1.58002   0.217 0.828159    
AG          -0.04610    0.03430  -1.344 0.178952    
YR           0.07485    0.02296   3.261 0.001111 ** 
BKNoBird    -1.37656    0.40073  -3.435 0.000592 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 187.14  on 146  degrees of freedom
Residual deviance: 156.22  on 143  degrees of freedom
AIC: 164.22

Number of Fisher Scoring iterations: 5

Analysis of Deviance Table

Model 1: LC ~ AG + YR + BK + YR:BK + AG:YR
Model 2: LC ~ AG + YR + BK
  Resid. Df Resid. Dev Df Deviance
1       141     154.60            
2       143     156.22 -2  -1.6163
[1] 0.4470879
Analysis of Deviance Table

Model 1: LC ~ AG + YR
Model 2: LC ~ AG + YR + BK
  Resid. Df Resid. Dev Df Deviance
1       144     168.83            
2       143     156.22  1   12.612
[1] 0.0001916391
  BKBird 
3.961248 
Waiting for profiling to be done...
   2.5 %   97.5 % 
1.836764 8.900840 
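
As a rough cross-check on the output above, the unadjusted odds ratio can be computed from the four myCode counts, and a Wald interval can be formed from the printed BK coefficient and standard error. This is a sketch under stated assumptions: the object names are hypothetical, the coefficient is entered with its sign flipped because "NoBird" becomes the reference level in the final model, and the Wald interval is a swapped-in approximation, not the profile-likelihood interval that confint() reports.

```r
# 2 x 2 table of counts taken from the table(myCode) output.
counts <- matrix(c(33, 34,
                   16, 64),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(BK = c("Bird", "NoBird"),
                                 LC = c("LungCancer", "NoCancer")))

# Crude (unadjusted) odds ratio: cross-product ratio of the table.
crude_or <- (counts["Bird", "LungCancer"] * counts["NoBird", "NoCancer"]) /
            (counts["Bird", "NoCancer"]   * counts["NoBird", "LungCancer"])

# Printed estimate for BKNoBird is -1.37656 (SE 0.40073); with "NoBird" as the
# reference level the BKBird coefficient has the opposite sign and the same SE.
est <- 1.37656
se  <- 0.40073

adjusted_or <- exp(est)
wald_ci     <- exp(est + c(-1, 1) * qnorm(0.975) * se)

print(crude_or)
print(adjusted_or)
print(wald_ci)
```

The Wald limits need not match the profile-likelihood limits exactly, since the two methods build the interval differently.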

Sleuth3 documentation built on May 31, 2017, 1:56 a.m.