ss: *S*ample *s*ize for a given coefficient and events per...


View source: R/ss.R

Description

Sample size required for a given coefficient, and events per covariate, for a logistic regression model.

Usage

ss(x, ...)

## S3 method for class 'glm'
ss(x, ..., alpha = 0.05, beta = 0.8,
  coeff = names(stats::coef(x))[2], std = FALSE,
  alternative = c("one.sided", "two.sided"), OR = NULL, Px0 = NULL)

Arguments

x

A regression model with class glm and x$family$family == "binomial".

...

Not used.

alpha

significance level alpha for the null-hypothesis significance test.

beta

Power of the null-hypothesis significance test. Note that beta here denotes the power itself (default 0.8), not the type II error rate.

coeff

Name of coefficient (variable) in the model to be tested.

std

Standardize the coefficient?
If std=TRUE, a continuous coefficient will be standardized using the mean xbar and standard deviation SD[x] (the default, as shown under Usage, is std=FALSE):

z[x] = (x[i] - xbar) / SD[x]
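
As a small illustration of the standardization above, the following sketch computes z[x] by hand and with base R's scale(). The vector x is made up for the example and is not taken from any data set.

```r
# Hypothetical values of a continuous predictor
x <- c(20, 26, 32, 38, 44)

# Standardize by hand: subtract the mean, divide by the standard deviation
z <- (x - mean(x)) / sd(x)

# scale() performs the same centering and scaling
z_scaled <- as.numeric(scale(x))
```

After standardization the predictor has mean 0 and standard deviation 1, so its coefficient is expressed per standard deviation of x.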

alternative

The default, alternative="one.sided", uses the standard normal quantile z[1-alpha] as the critical value.
If alternative="two.sided", the quantile z[1-alpha/2] is used instead.
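
The two critical values can be obtained from qnorm() in base R. This is only a sketch of the quantiles involved, not the package's internal code.

```r
alpha <- 0.05

# One-sided test: quantile at 1 - alpha (about 1.645)
z_one <- qnorm(1 - alpha)

# Two-sided test: quantile at 1 - alpha/2 (about 1.960)
z_two <- qnorm(1 - alpha / 2)
```

The two-sided quantile is larger, so the two-sided alternative leads to a larger required sample size for the same alpha and power.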

OR

Odds ratio. The size of the change in the probability.

Px0

The probability that x=0.
If not supplied, this is estimated from the data.

Details

Gives the sample size necessary to demonstrate that a coefficient in the model for the given predictor is equal to its given value rather than equal to zero (or, if OR is supplied, the sample size needed to check for such a change in probability).

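Also gives the number of events per predictor.
The number of events is the count of the less frequent outcome (the smaller of the counts of y=0 and y=1); in the example output this count is divided by the number of predictors in the model.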
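
A minimal sketch of the events-per-predictor calculation follows. The outcome vector is hypothetical, with counts chosen to agree with the example output (147 events among 575 observations); dividing by the number of non-intercept coefficients is an assumption inferred from that output, not taken from the package source.

```r
# Hypothetical outcome vector: 147 events (y = 1) among 575 observations
y <- rep(c(1, 0), times = c(147, 428))

# Number of events: count of the less frequent outcome
events <- min(table(y))

# Events per predictor for a model with one coefficient, and with six
epc_one <- events / 1
epc_six <- events / 6
```

With one coefficient this gives 147, and with six coefficients 24.5, in line with the epc values in the example output.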

For a continuous coefficient, the calculation uses Bhat (the estimated coefficient from the model) and delta:

delta = (1 + (1 + Bhat^2) * exp(1.25 * Bhat^2)) / (1 + exp(-0.25 * Bhat^2))

and P[0], the probability calculated from the intercept term B[0] of the logistic model
glm(x$y ~ coeff, family=binomial)
as P[0] = exp(B[0]) / (1 + exp(B[0])).

For a model with one predictor, the calculation is:

n = (1 + 2 * P[0] * delta) * (z[1-alpha] + z[beta] * exp(-0.25 * Bhat^2))^2 / (P[0] * Bhat^2)
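
The calculation for a continuous coefficient can be sketched in a few lines of R, using the formula in the form usually attributed to Hsieh (1989). The helper ss_continuous() is hypothetical and written only for illustration; it is not a function of the package, and it assumes a one-sided test.

```r
# Hypothetical helper: univariable sample size for a continuous predictor.
# Bhat: coefficient (log odds ratio); B0: intercept of the logistic model.
ss_continuous <- function(Bhat, B0, alpha = 0.05, beta = 0.8) {
    P0 <- plogis(B0)  # same as exp(B0) / (1 + exp(B0))
    delta <- (1 + (1 + Bhat^2) * exp(1.25 * Bhat^2)) /
        (1 + exp(-0.25 * Bhat^2))
    z_alpha <- qnorm(1 - alpha)  # one-sided critical value
    z_beta <- qnorm(beta)        # beta is the power, e.g. 0.8
    (1 + 2 * P0 * delta) *
        (z_alpha + z_beta * exp(-0.25 * Bhat^2))^2 / (P0 * Bhat^2)
}

# An odds ratio of 1.5 corresponds to Bhat = log(1.5);
# qlogis(0.25) is the intercept giving P[0] = 0.25
n <- ss_continuous(Bhat = log(1.5), B0 = qlogis(0.25))
```

For a fixed P[0], a larger absolute coefficient gives a smaller required sample size, because the denominator grows with Bhat^2.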

For a multivariable model, the value is adjusted by R^2, the squared correlation of coeff with the other predictors in the model:

n[m] = n / (1 - R^2)

For a binary coefficient, the calculation uses P[0], the probability given the null hypothesis, P[1], the probability given the alternative hypothesis, and the average probability Pbar = (P[0] + P[1]) / 2. The calculation is:

n = (z[1-alpha] * (2 * Pbar * (1 - Pbar))^0.5 + z[beta] * (P[0] * (1 - P[0]) + P[1] * (1 - P[1]))^0.5)^2 / (P[1] - P[0])^2
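
The calculation for a binary coefficient can be sketched as below. The helper ss_binary() is hypothetical (it is not part of the package) and assumes a one-sided test; the two probabilities are the proportions shown in the example output for the short and long treatment groups.

```r
# Hypothetical helper: univariable sample size for a binary predictor.
# P0, P1: outcome probabilities under the null and the alternative.
ss_binary <- function(P0, P1, alpha = 0.05, beta = 0.8) {
    Pbar <- (P0 + P1) / 2
    z_alpha <- qnorm(1 - alpha)  # one-sided critical value
    z_beta <- qnorm(beta)        # beta is the power, e.g. 0.8
    (z_alpha * sqrt(2 * Pbar * (1 - Pbar)) +
         z_beta * sqrt(P0 * (1 - P0) + P1 * (1 - P1)))^2 / (P1 - P0)^2
}

# Proportions taken from the example output (TREAT short and long)
n <- ss_binary(P0 = 0.2145329, P1 = 0.2972028)
```

With these inputs the result is approximately 343.4, which agrees with the uni row of the first ss() call in the example output.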

An alternative given by Whittemore uses Phat = P(x=0).
The lead term in the equation below is used to correct for large values of Phat:

n = (1 + 2 * P[0]) * (z[1-alpha] * sqrt(1/Phat + 1/(1 - Phat)) + z[beta] * sqrt(1/Phat + 1/((1 - Phat) * exp(Bhat))))^2 / (P[0] * Bhat^2)

As above, these can be adjusted in the multivariable case:

n[m] = n / (1 - R^2)

In this case, R^2 is the squared Pearson correlation between coeff and the fitted values from a logistic regression with coeff as the response and the other predictors as covariates.
The calculation uses Pbar, the mean probability (mean of the fitted values from the model):

R^2 = (sum((y[i] - Pbar) * (P[i] - Pbar)))^2 / (sum((y[i] - Pbar)^2) * sum((P[i] - Pbar)^2))
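
The multivariable adjustment for a binary coefficient can be sketched as follows. All data here are simulated and the variable names (target, other, n_uni) are made up for the example; the code follows the formulas above rather than the package source.

```r
set.seed(42)
n_obs <- 300
other <- rnorm(n_obs)                            # another predictor in the model
target <- rbinom(n_obs, 1, plogis(0.8 * other))  # binary predictor of interest

# Logistic regression with the predictor of interest as the response
fit <- glm(target ~ other, family = binomial)
P <- fitted(fit)
Pbar <- mean(P)

# R^2 as in the formula above
r2 <- sum((target - Pbar) * (P - Pbar))^2 /
    (sum((target - Pbar)^2) * sum((P - Pbar)^2))

# The same quantity as a squared Pearson correlation
r2_check <- cor(target, P)^2

# Adjust a hypothetical univariable sample size
n_uni <- 400
n_multi <- n_uni / (1 - r2)
```

Because 1 - R^2 is at most 1, the adjusted sample size is never smaller than the univariable one. In the example output for AGES, the values 231.4115 (uni) and 272.4161 (multi) imply 1 - R^2 of roughly 0.85.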

Value

A list of:

ss

Sample size required to show coefficient for predictor is as given in the model rather than the alternative (by default =0).

epc

Events per covariate; should be >10 to make meaningful statements about the coefficients obtained.

Note

The returned list has the additional class of "ss.glm".
The print method for this class does not show the attributes.

References

Whittemore AS (1981). Sample Size for Logistic Regression with Small Response Probability. Journal of the American Statistical Association. 76(373):27-32. http://www.jstor.org/stable/2287036

Hsieh FY (1989). Sample size tables for logistic regression. Statistics in Medicine. 8(7):795-802.

Fleiss J (2003). Statistical methods for rates and proportions. 3rd ed. John Wiley, New York.

Peduzzi P, Concato J, Kemper E, Holford TR, Feinstein AR (1996). A simulation study of the number of events per variable in logistic regression analysis. Journal of Clinical Epidemiology. 49(12):1373-79.

Examples

## H&L 2nd ed. Section 8.5.
## Results here are slightly different from the text due to rounding.
library(LogisticDx)
data(uis)
with(uis, prop.table(table(DFREE, TREAT), 2))
(g1 <- glm(DFREE ~ TREAT, data=uis, family=binomial))
ss(g1, coeff="TREATlong")
## Pages 340 - 341.
ss(g1, coeff="TREATlong", OR=1.5, Px0=0.5)
## standardize
uis <- within(uis, {
    AGES <- (AGE - 32) / 6
    NDRGTXS <- (NDRGTX - 5) / 5
})
## Page 343.
## results slightly different due to rounding
g1 <- glm(DFREE ~ AGES, data=uis, family=binomial)
ss(g1, coeff="AGES", std=FALSE, OR=1.5)
## Table 8.37. Page 344.
summary(g1 <- glm(DFREE ~ AGES + NDRGTXS + IVHX + RACE + TREAT,
                  data=uis, family=binomial))
## Page 345.
## results slightly different due to rounding
ss(g1, coeff="AGES", std=FALSE, OR=1.5)
ss(g1, coeff="TREATlong", std=FALSE, OR=1.5)

Example output

     TREAT
DFREE     short      long
  yes 0.7854671 0.7027972
  no  0.2145329 0.2972028

Call:  glm(formula = DFREE ~ TREAT, family = binomial, data = uis)

Coefficients:
(Intercept)    TREATlong  
    -1.2978       0.4372  

Degrees of Freedom: 574 Total (i.e. Null);  573 Residual
Null Deviance:	    653.7 
Residual Deviance: 648.6 	AIC: 652.6
              ss
uni     343.4359
uni_alt 404.0000
  epc
1 147
              ss
uni     402.1158
uni_alt 472.0000
  epc
1 147
          ss
uni 235.3889
  epc
1 147

Call:
glm(formula = DFREE ~ AGES + NDRGTXS + IVHX + RACE + TREAT, family = binomial, 
    data = uis)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.3065  -0.8082  -0.6345   1.1596   2.4652  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)   -1.0410     0.2097  -4.964  6.9e-07 ***
AGES           0.3058     0.1039   2.944  0.00324 ** 
NDRGTXS       -0.3160     0.1283  -2.464  0.01374 *  
IVHXprevious  -0.5929     0.2864  -2.070  0.03847 *  
IVHXrecent    -0.7600     0.2490  -3.052  0.00227 ** 
RACEother      0.2081     0.2215   0.940  0.34735    
TREATlong      0.4390     0.1991   2.204  0.02751 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 653.73  on 574  degrees of freedom
Residual deviance: 619.71  on 568  degrees of freedom
AIC: 633.71

Number of Fisher Scoring iterations: 4

            ss
uni   231.4115
multi 272.4161
   epc
1 24.5
                ss
uni       402.1158
uni_alt   472.0000
multi     407.2532
multi_alt 478.0000
   epc
1 24.5

LogisticDx documentation built on May 2, 2019, 6:15 p.m.