case1202: Sex Discrimination in Employment

Description

Data on employees from one job category (skilled, entry–level clerical) of a bank that was sued for sex discrimination. The data are on 32 male and 61 female employees, hired between 1965 and 1975.

Usage

case1202

Format

A data frame with 93 observations on the following 7 variables.

Bsal

Annual salary at time of hire

Sal77

Salary as of March 1977

Sex

Sex of employee

Senior

Seniority (months since first hired)

Age

Age of employee (in months)

Educ

Education (in years)

Exper

Work experience prior to employment with the bank (months)

Source

Ramsey, F.L. and Schafer, D.W. (2013). The Statistical Sleuth: A Course in Methods of Data Analysis (3rd ed), Cengage Learning.

References

Roberts, H.V. (1979). Harris Trust and Savings Bank: An Analysis of Employee Compensation, Report 7946, Center for Mathematical Studies in Business and Economics, University of Chicago Graduate School of Business.

See Also

case0102

Examples

str(case1202)
attach(case1202)

## EXPLORATION
logSal <- log(Bsal)    
myMatrix <- cbind (logSal, Senior,Age, Educ, Exper)   
if(require(car)){   # Use the car library
  # car >= 3.0 expects diagonal as a list; a bare string is ignored with a warning
  scatterplotMatrix(myMatrix, smooth=FALSE, diagonal=list(method="histogram"),
                    groups=Sex, col=c("red","blue") )
}                                
myLm1 <- lm(logSal ~ Senior + Age + Educ + Exper + Sex)
plot(myLm1, which=1)           
plot(myLm1, which=4) #  Cook's Distance 
if(require(car)){   # Use the car library
  crPlots(myLm1)    # Partial residual plots
}             
ageSquared    <- Age^2   
ageCubed      <- Age^3     
experSquared  <- Exper^2
experCubed    <- Exper^3
myLm2 <- lm(logSal ~ Senior + Age + ageSquared  + ageCubed + 
  Educ + Exper + experSquared + experCubed  + Sex)
plot(myLm2, which=1)  # Residual plot
plot(myLm2, which=4)  # Cook's distance

if(require(leaps)){   # Use the leaps library
  mySubsets     <- regsubsets(logSal ~ (Senior + Age + Educ + Exper + 
    ageSquared  + experSquared)^2, nvmax=25, data=case1202)    
  mySummary  <- summary(mySubsets)    
  p  <- apply(mySummary$which, 1, sum)     
  plot(mySummary$bic ~ p, ylab = "BIC")            
  cbind(p,mySummary$bic)  
  mySummary$which[8,]  # Note that Age:ageSquared = ageCubed
}
myLm3         <- lm(logSal ~ Age + Educ + ageSquared + Senior:Educ + 
  Age:Exper + ageCubed + Educ:Exper + Exper:ageSquared) 
summary(myLm3)

myLm4 <- update(myLm3, ~ . + Sex)  
summary(myLm4)
myLm5 <- update(myLm4, ~ . + Sex:Age + Sex:Educ + Sex:Senior + 
  Sex:Exper + Sex:ageSquared)
anova(myLm4, myLm5) 

## INFERENCE AND INTERPRETATION
summary(myLm4)
beta          <- myLm4$coef  
exp(beta[6])             
exp(confint(myLm4,6))    
# Conclusion:  The median beginning salary for males was estimated to be 12% 
# higher than the median salary for females with similar values of the available 
# qualification variables (95% confidence interval: 7% to 17% higher).

## DISPLAY FOR PRESENTATION        
years <- Exper/12  # Change months to years
plot(Bsal ~ years, log="y", xlab="Previous Work Experience (Years)",
  ylab="Beginning Salary (Dollars); Log Scale",
  main="Beginning Salaries and Experience for 61 Female and 32 Male Employees",
  pch= ifelse(Sex=="Male",24,21), bg = "gray", 
  col= ifelse(Sex=="Male","blue","red"), lwd=2, cex=1.8 )
myLm6 <- lm(logSal ~ Exper + experSquared + experCubed + Sex)
beta <- myLm6$coef
dummyExper <- seq(min(Exper),max(Exper),length=50)
curveF <- beta[1] + beta[2]*dummyExper + beta[3]*dummyExper^2 +
  beta[4]*dummyExper^3 
curveM <- curveF + beta[5]
dummyYears <- dummyExper/12
lines(exp(curveF) ~ dummyYears, lty=1, lwd=2,col="red")
lines(exp(curveM) ~ dummyYears, lty = 2, lwd=2, col="blue")
legend(28,8150, c("Male","Female"),pch=c(24,21), pt.cex=1.8, pt.lwd=2, 
  pt.bg=c("gray","gray"), col=c("blue","red"), lty=c(2,1), lwd=2) 

detach(case1202)

Example output

'data.frame':	93 obs. of  7 variables:
 $ Bsal  : int  5040 6300 6000 6000 6000 6840 8100 6000 6000 6900 ...
 $ Sal77 : int  12420 12060 15120 16320 12300 10380 13980 10140 12360 10920 ...
 $ Sex   : Factor w/ 2 levels "Female","Male": 2 2 2 2 2 2 2 2 2 2 ...
 $ Senior: int  96 82 67 97 66 92 66 82 88 75 ...
 $ Age   : int  329 357 315 354 351 374 369 363 555 416 ...
 $ Educ  : int  15 15 15 12 12 15 16 12 12 15 ...
 $ Exper : num  14 72 35.5 24 56 41.5 54.5 32 252 132 ...
Loading required package: car
Loading required package: carData
Warning message:
In applyDefaults(diagonal, defaults = list(method = "adaptiveDensity"),  :
  unnamed diag arguments, will be ignored
Loading required package: leaps
            (Intercept)                  Senior                     Age 
                   TRUE                   FALSE                    TRUE 
                   Educ                   Exper              ageSquared 
                   TRUE                   FALSE                    TRUE 
           experSquared              Senior:Age             Senior:Educ 
                  FALSE                   FALSE                    TRUE 
           Senior:Exper       Senior:ageSquared     Senior:experSquared 
                  FALSE                   FALSE                   FALSE 
               Age:Educ               Age:Exper          Age:ageSquared 
                  FALSE                    TRUE                    TRUE 
       Age:experSquared              Educ:Exper         Educ:ageSquared 
                  FALSE                    TRUE                   FALSE 
      Educ:experSquared        Exper:ageSquared      Exper:experSquared 
                  FALSE                    TRUE                   FALSE 
ageSquared:experSquared 
                  FALSE 

Call:
lm(formula = logSal ~ Age + Educ + ageSquared + Senior:Educ + 
    Age:Exper + ageCubed + Educ:Exper + Exper:ageSquared)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.230710 -0.050695  0.004412  0.051503  0.195887 

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)    
(Intercept)       5.478e+00  5.798e-01   9.448 7.45e-15 ***
Age               1.767e-02  3.633e-03   4.864 5.31e-06 ***
Educ              5.875e-02  9.296e-03   6.320 1.20e-08 ***
ageSquared       -3.799e-05  7.299e-06  -5.204 1.36e-06 ***
ageCubed          2.614e-08  4.826e-09   5.416 5.68e-07 ***
Educ:Senior      -3.110e-04  7.697e-05  -4.040 0.000118 ***
Age:Exper         1.358e-05  2.880e-06   4.716 9.46e-06 ***
Educ:Exper       -1.086e-04  4.617e-05  -2.352 0.020996 *  
ageSquared:Exper -1.697e-08  3.658e-09  -4.639 1.27e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.09101 on 84 degrees of freedom
Multiple R-squared:  0.547,	Adjusted R-squared:  0.5039 
F-statistic: 12.68 on 8 and 84 DF,  p-value: 8.856e-12


Call:
lm(formula = logSal ~ Age + Educ + ageSquared + ageCubed + Sex + 
    Educ:Senior + Age:Exper + Educ:Exper + ageSquared:Exper)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.173459 -0.037584  0.004244  0.047305  0.192259 

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)    
(Intercept)       5.929e+00  5.105e-01  11.614  < 2e-16 ***
Age               1.480e-02  3.200e-03   4.626 1.36e-05 ***
Educ              4.957e-02  8.253e-03   6.006 4.83e-08 ***
ageSquared       -3.097e-05  6.473e-06  -4.784 7.36e-06 ***
ageCubed          2.108e-08  4.296e-09   4.907 4.56e-06 ***
SexMale           1.115e-01  2.092e-02   5.330 8.29e-07 ***
Educ:Senior      -3.206e-04  6.686e-05  -4.795 7.06e-06 ***
Age:Exper         9.231e-06  2.631e-06   3.509 0.000729 ***
Educ:Exper       -6.919e-05  4.077e-05  -1.697 0.093388 .  
ageSquared:Exper -1.213e-08  3.304e-09  -3.670 0.000427 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.07903 on 83 degrees of freedom
Multiple R-squared:  0.6625,	Adjusted R-squared:  0.6259 
F-statistic:  18.1 on 9 and 83 DF,  p-value: 3.109e-16

Analysis of Variance Table

Model 1: logSal ~ Age + Educ + ageSquared + ageCubed + Sex + Educ:Senior + 
    Age:Exper + Educ:Exper + ageSquared:Exper
Model 2: logSal ~ Age + Educ + ageSquared + ageCubed + Sex + Educ:Senior + 
    Age:Exper + Educ:Exper + ageSquared:Exper + Age:Sex + Educ:Sex + 
    Sex:Senior + Sex:Exper + ageSquared:Sex
  Res.Df     RSS Df Sum of Sq      F Pr(>F)
1     83 0.51839                           
2     78 0.51429  5 0.0041066 0.1246 0.9865
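
The F-statistic in the table above can be checked by hand from the two residual sums of squares. The sketch below hard-codes the values printed in that table and applies the extra-sum-of-squares formula; the variable names are illustrative and are not part of the Sleuth3 example.

```r
# Values copied from the anova(myLm4, myLm5) table; names are illustrative
rssReduced <- 0.51839   # residual sum of squares, myLm4
rssFull    <- 0.51429   # residual sum of squares, myLm5
dfReduced  <- 83        # residual degrees of freedom, myLm4
dfFull     <- 78        # residual degrees of freedom, myLm5

# Extra-sum-of-squares F-test: drop in RSS per added term,
# relative to the residual variance of the full model
extraDf <- dfReduced - dfFull
fStat   <- ((rssReduced - rssFull) / extraDf) / (rssFull / dfFull)
pValue  <- pf(fStat, extraDf, dfFull, lower.tail = FALSE)

fStat
pValue
```

Because the inputs are rounded, the result agrees with the printed F only to about three decimal places. A p-value this large gives no evidence that the sex effect depends on the other variables, which is consistent with the example carrying myLm4 forward for inference.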

Call:
lm(formula = logSal ~ Age + Educ + ageSquared + ageCubed + Sex + 
    Educ:Senior + Age:Exper + Educ:Exper + ageSquared:Exper)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.173459 -0.037584  0.004244  0.047305  0.192259 

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)    
(Intercept)       5.929e+00  5.105e-01  11.614  < 2e-16 ***
Age               1.480e-02  3.200e-03   4.626 1.36e-05 ***
Educ              4.957e-02  8.253e-03   6.006 4.83e-08 ***
ageSquared       -3.097e-05  6.473e-06  -4.784 7.36e-06 ***
ageCubed          2.108e-08  4.296e-09   4.907 4.56e-06 ***
SexMale           1.115e-01  2.092e-02   5.330 8.29e-07 ***
Educ:Senior      -3.206e-04  6.686e-05  -4.795 7.06e-06 ***
Age:Exper         9.231e-06  2.631e-06   3.509 0.000729 ***
Educ:Exper       -6.919e-05  4.077e-05  -1.697 0.093388 .  
ageSquared:Exper -1.213e-08  3.304e-09  -3.670 0.000427 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.07903 on 83 degrees of freedom
Multiple R-squared:  0.6625,	Adjusted R-squared:  0.6259 
F-statistic:  18.1 on 9 and 83 DF,  p-value: 3.109e-16

 SexMale 
1.117974 
           2.5 %   97.5 %
SexMale 1.072404 1.165481
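
The ratio and interval printed above can be reproduced from the SexMale row of the myLm4 coefficient table. The sketch below hard-codes the rounded estimate, standard error, and residual degrees of freedom from that output; the variable names are illustrative and are not part of the Sleuth3 example.

```r
# Values copied from the summary(myLm4) output; names are illustrative
estimate <- 0.1115    # SexMale coefficient on the log-salary scale
stdError <- 0.02092   # its standard error
residDf  <- 83        # residual degrees of freedom

# 95% confidence interval on the log scale
tMult <- qt(0.975, residDf)
logCi <- estimate + c(-1, 1) * tMult * stdError

# Back-transform: exp() turns a difference in mean log salary
# into a ratio of median salaries (male relative to female)
ratio   <- exp(estimate)
ratioCi <- exp(logCi)

round(100 * (ratio - 1))     # percent by which the male median exceeds the female
round(100 * (ratioCi - 1))   # lower and upper limits, in percent
```

Subtracting 1 from each ratio and multiplying by 100 gives the percentages quoted in the example's conclusion comment.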

Sleuth3 documentation built on May 2, 2019, 6:41 a.m.