Description Usage Format Details Source References See Also Examples
This data frame contains the ages and sexes of the adult (over 15 years) survivors and nonsurvivors of the Donner party.
1 |
A data frame with 45 observations on the following 3 variables.
Age of person
Sex of person
Whether the person survived or died
In 1846 the Donner and Reed families left Springfield, Illinois, for California by covered wagon. In July, the Donner Party, as it became known, reached Fort Bridger, Wyoming. There its leaders decided to attempt a new and untested rote to the Sacramento Valley. Having reached its full size of 87 people and 20 wagons, the party was delayed by a difficult crossing of the Wasatch Range and again in the crossing of the desert west of the Great Salt Lake. The group became stranded in the eastern Sierra Nevada mountains when the region was hit by heavy snows in late October. By the time the last survivor was rescued on April 21, 1847, 40 of the 87 members had died from famine and exposure to extreme cold.
Ramsey, F.L. and Schafer, D.W. (2013). The Statistical Sleuth: A Course in Methods of Data Analysis (3rd ed), Cengage Learning.
Grayson, D.K. (1990). Donner Party Deaths: A Demographic Assessment, Journal of Anthropological Research 46: 223–242.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 | str(case2001)
attach(case2001)
## EXPLORATION AND MODEL BUILDING
myPointCode <- ifelse(Sex=="Female",22,24)
myPointColor <- ifelse(Sex=="Female","green","orange")
survivalIndicator <- ifelse(Status=="Survived",1,0)
jFactor <- 0.1 # jittering factor
plot(jitter(survivalIndicator,jFactor) ~ jitter(Age, jFactor),
pch=myPointCode, bg=myPointColor, cex=1.5)
# Logistic regression. Start with a rich model; use backward elimination
ageSquared <- Age^2
myGlm1 <- glm(Status ~ Age + ageSquared + Sex + Age:Sex + ageSquared:Sex,
family=binomial)
# Use backward elimination, but remove interaction and squared terms 1st
summary(myGlm1)
myGlm2 <- update(myGlm1, ~ . - ageSquared:Sex)
summary(myGlm2)
myGlm3 <- update(myGlm2, ~ . - ageSquared)
summary(myGlm3) # Wald test p-value for interaction of Age and Sex is: 0.0865
# More accurate likelihood ratio (drop in deviance) test:
myGlm4 <-update(myGlm3, ~ . - Age:Sex)
anova(myGlm4, myGlm3) # Drop-in-devaince chi-square stat = 3.9099 on 1 d.f.
1 - pchisq(3.9099,1) # 2-sided p-value = 0.048
## INFERENCE AND INTERPRETATION
# Proceed by ignoring interaction (for a casual and approximate analysis)
myGlm5 <- update(myGlm4, ~ . - Sex)
anova(myGlm5, myGlm4) # Drop-in-deviance chi-square statistic = 5.0344 on 1 d.f.
1 - pchisq(5.0344,1) # 2-sided p-value 0.02484869: Highly suggestive
0.0248869/2 # 1-sided p-value = half the 2-sided p-value = 0.01244345
# Interpretation and confidence interval
Sex <- factor(Sex,levels=c("Male","Female")) # Reorder levels so "Male" is ref
myGlm4b <- glm(Status ~ Age + Sex, family=binomial)
beta <- myGlm4b$coef
exp(beta[3]) # 4.939645
exp(confint(myGlm4b,3)) # 25.246069 1.215435
# Interpretation: The odds of survival for females are estimated to be 5 times
# the odds of survival of similarly-aged mean (95% CI: 1.2 times to 25.2 times).
## GRAPHICAL DISPLAY FOR PRESENTATION
myPointCode <- ifelse(Sex=="Female",22,24)
myPointColor <- ifelse(Sex=="Female","green","orange")
myLineColor <- ifelse(Sex=="Female","dark green","blue")
survivalIndicator <- ifelse(Status=="Survived",1,0)
jFactor <- 0.1
plot(jitter(survivalIndicator,jFactor) ~ jitter(Age, jFactor),
ylab="Estimated Survival Probability", xlab="Age (years)",
main=c("Donner Party Survival by Sex and Age"), xlim=c(15,75),
pch=myPointCode, bg=myPointColor, col=myLineColor, cex=2, lwd=3)
beta <- myGlm4b$coef
dummyAge <- seq(15,65,length=50)
linearMale <- beta[1] + beta[2]*dummyAge #log odds of survival for males
linearFemale <- linearMale + beta[3] #log odds of survival for females
pCurveMale <- exp(linearMale)/(1 + exp(linearMale) ) # survival prob; males
pCurveFemale <- exp(linearFemale)/(1 + exp(linearFemale)) # females
lines(pCurveMale ~ dummyAge,lty=2,col="blue",lwd=3)
lines(pCurveFemale[dummyAge <= 50] ~ dummyAge[dummyAge <= 50],lty=1,
col="dark green",lwd=3)
legend(63,.5,legend=c("Females","Males"), pch=c(22,24),
pt.bg = c("green","orange"), pt.cex=c(2,2), lty=c(1,2), lwd=c(3,3),
col=c("dark green","blue"))
text(72,1,"Survived (20)")
text(72,0,"Died (25)")
detach(case2001)
|
'data.frame': 45 obs. of 3 variables:
$ Age : int 23 40 40 30 28 40 45 62 65 45 ...
$ Sex : Factor w/ 2 levels "Female","Male": 2 1 2 2 2 2 1 2 2 1 ...
$ Status: Factor w/ 2 levels "Died","Survived": 1 2 2 1 1 1 1 1 1 1 ...
Call:
glm(formula = Status ~ Age + ageSquared + Sex + Age:Sex + ageSquared:Sex,
family = binomial)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.3396 -0.9757 -0.3438 0.5269 1.5901
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.053198 9.684350 -0.315 0.753
Age 0.482908 0.658121 0.734 0.463
ageSquared -0.010160 0.010263 -0.990 0.322
SexMale -0.265286 10.455222 -0.025 0.980
Age:SexMale -0.299877 0.696050 -0.431 0.667
ageSquared:SexMale 0.007356 0.010689 0.688 0.491
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 61.827 on 44 degrees of freedom
Residual deviance: 45.361 on 39 degrees of freedom
AIC: 57.361
Number of Fisher Scoring iterations: 5
Call:
glm(formula = Status ~ Age + ageSquared + Sex + Age:Sex, family = binomial)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.2317 -0.9748 -0.3138 0.6874 1.6492
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 3.546759 4.432739 0.800 0.4236
Age 0.039212 0.223296 0.176 0.8606
ageSquared -0.003398 0.003047 -1.115 0.2648
SexMale -7.594162 3.403983 -2.231 0.0257 *
Age:SexMale 0.187732 0.098863 1.899 0.0576 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 61.827 on 44 degrees of freedom
Residual deviance: 45.830 on 40 degrees of freedom
AIC: 55.83
Number of Fisher Scoring iterations: 5
Call:
glm(formula = Status ~ Age + Sex + Age:Sex, family = binomial)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.2279 -0.9388 -0.5550 0.7794 1.6998
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 7.24638 3.20517 2.261 0.0238 *
Age -0.19407 0.08742 -2.220 0.0264 *
SexMale -6.92805 3.39887 -2.038 0.0415 *
Age:SexMale 0.16160 0.09426 1.714 0.0865 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 61.827 on 44 degrees of freedom
Residual deviance: 47.346 on 41 degrees of freedom
AIC: 55.346
Number of Fisher Scoring iterations: 5
Analysis of Deviance Table
Model 1: Status ~ Age + Sex
Model 2: Status ~ Age + Sex + Age:Sex
Resid. Df Resid. Dev Df Deviance
1 42 51.256
2 41 47.346 1 3.9099
[1] 0.04800245
Analysis of Deviance Table
Model 1: Status ~ Age
Model 2: Status ~ Age + Sex
Resid. Df Resid. Dev Df Deviance
1 43 56.291
2 42 51.256 1 5.0344
[1] 0.02484869
[1] 0.01244345
SexFemale
4.939645
Waiting for profiling to be done...
2.5 % 97.5 %
1.215435 25.246069
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.