case2001: Survival in the Donner Party

Description Usage Format Details Source References See Also Examples

Description

This data frame contains the ages and sexes of the adult (over 15 years) survivors and nonsurvivors of the Donner party.

Usage

1

Format

A data frame with 45 observations on the following 3 variables.

Age

Age of person

Sex

Sex of person

Status

Whether the person survived or died

Details

In 1846 the Donner and Reed families left Springfield, Illinois, for California by covered wagon. In July, the Donner Party, as it became known, reached Fort Bridger, Wyoming. There its leaders decided to attempt a new and untested rote to the Sacramento Valley. Having reached its full size of 87 people and 20 wagons, the party was delayed by a difficult crossing of the Wasatch Range and again in the crossing of the desert west of the Great Salt Lake. The group became stranded in the eastern Sierra Nevada mountains when the region was hit by heavy snows in late October. By the time the last survivor was rescued on April 21, 1847, 40 of the 87 members had died from famine and exposure to extreme cold.

Source

Ramsey, F.L. and Schafer, D.W. (2013). The Statistical Sleuth: A Course in Methods of Data Analysis (3rd ed), Cengage Learning.

References

Grayson, D.K. (1990). Donner Party Deaths: A Demographic Assessment, Journal of Anthropological Research 46: 223–242.

See Also

ex1918

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
str(case2001)
attach(case2001)
      
## EXPLORATION AND MODEL BUILDING
myPointCode   <- ifelse(Sex=="Female",22,24)
myPointColor  <- ifelse(Sex=="Female","green","orange")
survivalIndicator <- ifelse(Status=="Survived",1,0)
jFactor <- 0.1   # jittering factor
plot(jitter(survivalIndicator,jFactor) ~ jitter(Age, jFactor),
  pch=myPointCode, bg=myPointColor, cex=1.5)
# Logistic regression. Start with a rich model; use backward elimination
ageSquared <- Age^2
myGlm1 <- glm(Status ~ Age + ageSquared + Sex + Age:Sex + ageSquared:Sex, 
  family=binomial)
# Use backward elimination, but remove interaction and squared terms 1st 
summary(myGlm1)
myGlm2 <- update(myGlm1, ~ . - ageSquared:Sex)
summary(myGlm2)
myGlm3 <- update(myGlm2, ~ . - ageSquared)
summary(myGlm3)   # Wald test p-value for interaction of Age and Sex is: 0.0865
# More accurate likelihood ratio (drop in deviance) test:
myGlm4 <-update(myGlm3, ~ . - Age:Sex)
anova(myGlm4, myGlm3)  # Drop-in-devaince chi-square stat = 3.9099 on 1 d.f.
 1 - pchisq(3.9099,1)  # 2-sided p-value = 0.048
 
## INFERENCE AND INTERPRETATION
# Proceed by ignoring interaction (for a casual and approximate analysis) 
myGlm5  <- update(myGlm4, ~ . - Sex)
anova(myGlm5, myGlm4) # Drop-in-deviance chi-square statistic = 5.0344 on 1 d.f.
1 - pchisq(5.0344,1)  # 2-sided p-value 0.02484869: Highly suggestive  
0.0248869/2  # 1-sided p-value = half the 2-sided p-value = 0.01244345
# Interpretation and confidence interval
Sex <- factor(Sex,levels=c("Male","Female")) # Reorder levels so "Male" is ref 
myGlm4b <- glm(Status ~ Age + Sex, family=binomial)
beta <- myGlm4b$coef
exp(beta[3]) # 4.939645 
exp(confint(myGlm4b,3))  # 25.246069  1.215435 
# Interpretation: The odds of survival for females are estimated to be 5 times 
# the odds of survival of similarly-aged mean (95% CI: 1.2 times to 25.2 times).


## GRAPHICAL DISPLAY FOR PRESENTATION 
myPointCode <- ifelse(Sex=="Female",22,24)
myPointColor <- ifelse(Sex=="Female","green","orange")
myLineColor <- ifelse(Sex=="Female","dark green","blue")
survivalIndicator <- ifelse(Status=="Survived",1,0)
jFactor <- 0.1
plot(jitter(survivalIndicator,jFactor) ~ jitter(Age, jFactor),
  ylab="Estimated Survival Probability", xlab="Age (years)",
  main=c("Donner Party Survival by Sex and Age"), xlim=c(15,75),   
  pch=myPointCode, bg=myPointColor, col=myLineColor, cex=2, lwd=3)
beta            <- myGlm4b$coef
dummyAge        <- seq(15,65,length=50)
linearMale      <- beta[1] + beta[2]*dummyAge  #log odds of survival for males
linearFemale    <- linearMale + beta[3] #log odds of survival for females
pCurveMale      <- exp(linearMale)/(1 + exp(linearMale) ) # survival prob; males
pCurveFemale    <- exp(linearFemale)/(1 + exp(linearFemale)) # females
lines(pCurveMale ~ dummyAge,lty=2,col="blue",lwd=3)
lines(pCurveFemale[dummyAge <= 50] ~ dummyAge[dummyAge <= 50],lty=1,
  col="dark green",lwd=3)

legend(63,.5,legend=c("Females","Males"), pch=c(22,24),
  pt.bg = c("green","orange"), pt.cex=c(2,2), lty=c(1,2), lwd=c(3,3),
  col=c("dark green","blue"))
text(72,1,"Survived (20)")
text(72,0,"Died (25)")

detach(case2001)

Example output

'data.frame':	45 obs. of  3 variables:
 $ Age   : int  23 40 40 30 28 40 45 62 65 45 ...
 $ Sex   : Factor w/ 2 levels "Female","Male": 2 1 2 2 2 2 1 2 2 1 ...
 $ Status: Factor w/ 2 levels "Died","Survived": 1 2 2 1 1 1 1 1 1 1 ...

Call:
glm(formula = Status ~ Age + ageSquared + Sex + Age:Sex + ageSquared:Sex, 
    family = binomial)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.3396  -0.9757  -0.3438   0.5269   1.5901  

Coefficients:
                    Estimate Std. Error z value Pr(>|z|)
(Intercept)        -3.053198   9.684350  -0.315    0.753
Age                 0.482908   0.658121   0.734    0.463
ageSquared         -0.010160   0.010263  -0.990    0.322
SexMale            -0.265286  10.455222  -0.025    0.980
Age:SexMale        -0.299877   0.696050  -0.431    0.667
ageSquared:SexMale  0.007356   0.010689   0.688    0.491

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 61.827  on 44  degrees of freedom
Residual deviance: 45.361  on 39  degrees of freedom
AIC: 57.361

Number of Fisher Scoring iterations: 5


Call:
glm(formula = Status ~ Age + ageSquared + Sex + Age:Sex, family = binomial)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.2317  -0.9748  -0.3138   0.6874   1.6492  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)  
(Intercept)  3.546759   4.432739   0.800   0.4236  
Age          0.039212   0.223296   0.176   0.8606  
ageSquared  -0.003398   0.003047  -1.115   0.2648  
SexMale     -7.594162   3.403983  -2.231   0.0257 *
Age:SexMale  0.187732   0.098863   1.899   0.0576 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 61.827  on 44  degrees of freedom
Residual deviance: 45.830  on 40  degrees of freedom
AIC: 55.83

Number of Fisher Scoring iterations: 5


Call:
glm(formula = Status ~ Age + Sex + Age:Sex, family = binomial)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.2279  -0.9388  -0.5550   0.7794   1.6998  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)  
(Intercept)  7.24638    3.20517   2.261   0.0238 *
Age         -0.19407    0.08742  -2.220   0.0264 *
SexMale     -6.92805    3.39887  -2.038   0.0415 *
Age:SexMale  0.16160    0.09426   1.714   0.0865 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 61.827  on 44  degrees of freedom
Residual deviance: 47.346  on 41  degrees of freedom
AIC: 55.346

Number of Fisher Scoring iterations: 5

Analysis of Deviance Table

Model 1: Status ~ Age + Sex
Model 2: Status ~ Age + Sex + Age:Sex
  Resid. Df Resid. Dev Df Deviance
1        42     51.256            
2        41     47.346  1   3.9099
[1] 0.04800245
Analysis of Deviance Table

Model 1: Status ~ Age
Model 2: Status ~ Age + Sex
  Resid. Df Resid. Dev Df Deviance
1        43     56.291            
2        42     51.256  1   5.0344
[1] 0.02484869
[1] 0.01244345
SexFemale 
 4.939645 
Waiting for profiling to be done...
    2.5 %    97.5 % 
 1.215435 25.246069 

Sleuth3 documentation built on May 2, 2019, 6:41 a.m.