Sample size for a given coefficient and events per covariate for model
x |
A regression model with class |
... |
Not used. |
alpha |
significance level alpha for the null-hypothesis significance test. |
beta |
Power for the null-hypothesis significance test (denoted beta in the formulas below). |
coeff |
Name of coefficient (variable) in the model to be tested. |
std |
Standardize the coefficient?
z[x] = (x[i] - xbar) / SD[x] |
alternative |
The default, |
OR |
Odds ratio. The size of the change in the probability. |
Px0 |
The probability that x=0.
|
Gives the sample size necessary to demonstrate that the coefficient
for the given predictor is equal to its value in the model
rather than equal to zero (or, if OR is supplied,
the sample size needed to detect a change in probability of that size).
Also gives the number of events per predictor.
The number of events is the smaller of the counts of outcome y=0 and outcome y=1.
For a continuous coefficient, the calculation uses
Bhat, the estimated coefficient from the model,
delta:
delta = (1 + (1 + Bhat^2) * exp(1.25 * Bhat^2)) / (1 + exp(-0.25 * Bhat^2))
and P[0], the probability calculated from the intercept term
B[0] from the logistic model
glm(x$y ~ coeff, family=binomial)
as
P[0] = exp(B[0]) / (1 + exp(B[0]))
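As a minimal sketch of this step, the inverse-logit transform can be written out directly or computed with base R's plogis(). The intercept value is copied from the DFREE ~ TREAT fit printed in the examples; the variable names are illustrative only.

```r
## Intercept from the DFREE ~ TREAT model shown in the examples
b0 <- -1.2978
## P[0] = exp(B[0]) / (1 + exp(B[0])), written out by hand
p0_manual <- exp(b0) / (1 + exp(b0))
## The same transform using the logistic distribution function
p0 <- plogis(b0)
round(p0, 4)
```

The result agrees with the proportion in the first column of the prop.table output in the examples.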
For a model with one predictor, the calculation is:
n = (1 + 2 * P[0] * delta) * (z[1-alpha] + z[beta] * exp(-0.25 * Bhat^2))^2 / (P[0] * Bhat^2)
For a multivariable model, the value is adjusted by R^2, the squared
correlation of coeff with the other predictors in the model:
n[m] = n / (1 - R^2)
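The calculation for a continuous predictor can be sketched as below. This is not the package's own code: hsieh_n is a hypothetical helper that follows the form published in Hsieh (1989), with delta = (1 + (1 + Bhat^2) exp(1.25 Bhat^2)) / (1 + exp(-0.25 Bhat^2)). The values alpha = 0.05 and power = 0.8 (power playing the role of beta in the text) and the R^2 of 0.15 are assumptions chosen for illustration.

```r
## Hypothetical helper, not part of the package: sample size for a
## continuous predictor, following the form in Hsieh (1989).
hsieh_n <- function(bhat, p0, alpha = 0.05, power = 0.8) {
  ## Correction term delta
  delta <- (1 + (1 + bhat^2) * exp(1.25 * bhat^2)) /
    (1 + exp(-0.25 * bhat^2))
  ## Combined normal quantiles; z[beta] is taken at the power
  z <- qnorm(1 - alpha) + qnorm(power) * exp(-0.25 * bhat^2)
  (1 + 2 * p0 * delta) * z^2 / (p0 * bhat^2)
}

## Coefficient for an odds ratio of 1.5; P[0] from an intercept of -1.0410
n_uni <- hsieh_n(bhat = log(1.5), p0 = plogis(-1.0410))

## Multivariable adjustment with an assumed R^2 of 0.15
n_multi <- n_uni / (1 - 0.15)
c(uni = n_uni, multi = n_multi)
```

With these inputs n_uni comes out at roughly 231. The adjustment only inflates the univariable value, so a predictor that is strongly correlated with the others needs a correspondingly larger sample.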
For a binomial coefficient, the calculation uses P[0], the probability given the null hypothesis, P[a], the probability given the alternative hypothesis, and the average probability Pbar = (P[0] + P[a]) / 2. The calculation is:
n = (z[1-alpha] * (2 * Pbar * (1 - Pbar))^0.5 + z[beta] * (P[0] * (1 - P[0]) + P[a] * (1 - P[a]))^0.5)^2 / (P[a] - P[0])^2
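A worked version of this calculation, as a sketch rather than the package's implementation: the two probabilities are the proportions printed by prop.table in the examples, pa stands for the probability under the alternative, and alpha = 0.05 with power = 0.8 are assumed values.

```r
## Proportions for TREAT short and TREAT long, from the examples output
p0 <- 0.2145329
pa <- 0.2972028
## Assumed significance level and power
alpha <- 0.05
power <- 0.8

pbar <- (p0 + pa) / 2
## Numerator: null-hypothesis term plus alternative-hypothesis term
num <- qnorm(1 - alpha) * sqrt(2 * pbar * (1 - pbar)) +
  qnorm(power) * sqrt(p0 * (1 - p0) + pa * (1 - pa))
n <- num^2 / (pa - p0)^2
n
```

Under these assumptions n is approximately 343.4, which matches the uni value printed for ss(g1, coeff="TREATlong") in the examples.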
An alternative given by Whittemore (1981) uses Phat = P(x=0).
The lead term in the equation below is used to correct for
large values of Phat:
n = (1 + 2P[0]) * (z[1-alpha]sqrt(1/Phat + 1/(1+Phat)) + z[beta]sqrt(1/Phat + 1/(Phat exp(Bhat))))^2 / (P[0]Bhat)^2
As above these can be adjusted in the multivariable case:
n[m] = n / (1 - R^2)
In this case, R^2 is the squared Pearson correlation between the observed
values of coeff and the fitted values from a logistic regression with coeff
as the response and the other predictors as covariates.
The calculation uses y[i], the observed values, P[i], the fitted values, and
Pbar, the mean probability (mean of the fitted values from the model):
R^2 = (sum((y[i] - Pbar) * (P[i] - Pbar)))^2 / (sum((y[i] - Pbar)^2) * sum((P[i] - Pbar)^2))
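The R^2 calculation can be illustrated on simulated data. Everything in this sketch (the data, the names x, z1, z2 and the effect sizes) is made up for illustration; x plays the role of the binary coeff and z1, z2 the other predictors.

```r
set.seed(1)
n <- 200
## Two other predictors and a binary predictor x that depends on them
z1 <- rnorm(n)
z2 <- rnorm(n)
x <- rbinom(n, 1, plogis(0.5 * z1 - 0.5 * z2))

## Logistic regression with x as the response
fit <- glm(x ~ z1 + z2, family = binomial)
p <- fitted(fit)
pbar <- mean(p)

## Squared Pearson correlation between observed and fitted values
r2 <- sum((x - pbar) * (p - pbar))^2 /
  (sum((x - pbar)^2) * sum((p - pbar)^2))
## Inflation factor applied to the univariable sample size
1 / (1 - r2)
```

Because a logistic model with an intercept has fitted values whose mean equals the mean of the response, r2 agrees with cor(x, p)^2.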
A list of:
ss |
Sample size required to show coefficient for predictor is as given in the model rather than the alternative (by default =0). |
epc |
Events per covariate; should be >10 to make meaningful statements about the coefficients obtained. |
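The epc values printed in the examples are consistent with dividing the smaller outcome count by the number of non-intercept coefficients. That reading is an assumption inferred from the printed output, not taken from the package source. A sketch using the counts implied by the examples (575 observations, 147 in the rarer outcome class):

```r
## Smaller of the two outcome counts (assumed: 147 of 575 observations)
events <- min(c(147, 575 - 147))
## One predictor, as in DFREE ~ TREAT
epc_one <- events / 1
## Six non-intercept coefficients, as in the multivariable model
epc_six <- events / 6
c(epc_one, epc_six)
```

Both values are above the threshold of 10 mentioned above.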
The returned list has the additional class of "ss.glm".
The print method for this class does not show the attributes.
Whittemore AS (1981). Sample Size for Logistic Regression with Small Response Probability. Journal of the American Statistical Association. 76(373):27-32. http://www.jstor.org/stable/2287036
Hsieh FY (1989). Sample size tables for logistic regression. Statistics in Medicine. 8(7):795-802.
Fleiss J (2003). Statistical methods for rates and proportions. 3rd ed. John Wiley, New York.
Peduzzi P, Concato J, Kemper E, Holford TR, Feinstein AR (1996). A simulation study of the number of events per variable in logistic regression analysis. Journal of Clinical Epidemiology. 49(12):1373-79.
## H&L 2nd ed. Section 8.5.
## Results here are slightly different from the text due to rounding.
data(uis)
with(uis, prop.table(table(DFREE, TREAT), 2))
(g1 <- glm(DFREE ~ TREAT, data=uis, family=binomial))
ss(g1, coeff="TREATlong")
## Pages 340 - 341.
ss(g1, coeff="TREATlong", OR=1.5, Px0=0.5)
## standardize
uis <- within(uis, {
    AGES <- (AGE - 32) / 6
    NDRGTXS <- (NDRGTX - 5) / 5
})
## Page 343.
## results slightly different due to rounding
g1 <- glm(DFREE ~ AGES, data=uis, family=binomial)
ss(g1, coeff="AGES", std=FALSE, OR=1.5)
## Table 8.37. Page 344.
summary(g1 <- glm(DFREE ~ AGES + NDRGTXS + IVHX + RACE + TREAT,
                  data=uis, family=binomial))
## Page 345.
## results slightly different due to rounding
ss(g1, coeff="AGES", std=FALSE, OR=1.5)
ss(g1, coeff="TREATlong", std=FALSE, OR=1.5)
     TREAT
DFREE     short      long
  yes 0.7854671 0.7027972
  no  0.2145329 0.2972028
Call: glm(formula = DFREE ~ TREAT, family = binomial, data = uis)
Coefficients:
(Intercept)    TREATlong
    -1.2978       0.4372
Degrees of Freedom: 574 Total (i.e. Null); 573 Residual
Null Deviance: 653.7
Residual Deviance: 648.6 AIC: 652.6
ss
uni 343.4359
uni_alt 404.0000
epc
1 147
ss
uni 402.1158
uni_alt 472.0000
epc
1 147
ss
uni 235.3889
epc
1 147
Call:
glm(formula = DFREE ~ AGES + NDRGTXS + IVHX + RACE + TREAT, family = binomial,
data = uis)
Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.3065  -0.8082  -0.6345   1.1596   2.4652
Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)   -1.0410     0.2097  -4.964  6.9e-07 ***
AGES           0.3058     0.1039   2.944  0.00324 **
NDRGTXS       -0.3160     0.1283  -2.464  0.01374 *
IVHXprevious  -0.5929     0.2864  -2.070  0.03847 *
IVHXrecent    -0.7600     0.2490  -3.052  0.00227 **
RACEother      0.2081     0.2215   0.940  0.34735
TREATlong      0.4390     0.1991   2.204  0.02751 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 653.73 on 574 degrees of freedom
Residual deviance: 619.71 on 568 degrees of freedom
AIC: 633.71
Number of Fisher Scoring iterations: 4
ss
uni 231.4115
multi 272.4161
epc
1 24.5
ss
uni 402.1158
uni_alt 472.0000
multi 407.2532
multi_alt 478.0000
epc
1 24.5