ss: Sample size for a given coefficient and events per...
In dardisco/LogisticDx: Diagnostic Tests for Logistic Regression Models


Sample size for a given coefficient and events per covariate for model

ss(x, ...)

## S3 method for class 'glm'
ss(
  x,
  ...,
  alpha = 0.05,
  beta = 0.8,
  coeff = names(stats::coef(x))[2],
  std = FALSE,
  alternative = c("one.sided", "two.sided"),
  OR = NULL,
  Px0 = NULL
)

`x`	A regression model with class `glm` and `x$family$family == "binomial"`.
`...`	Not used.
`alpha`	Significance level alpha for the null-hypothesis significance test.
`beta`	Power beta for the null-hypothesis significance test. Note that `beta` is the power itself (for example 0.8), not the type II error rate.
`coeff`	Name of coefficient (variable) in the model to be tested.
`std`	Standardize the coefficient? If `std=TRUE`, a continuous coefficient will be standardized using the mean xbar and standard deviation SD[x]: z[x] = (x[i] - xbar) / SD[x]. The default given in the usage above is `std=FALSE`.
`alternative`	The default, `alternative="one.sided"`, tests the null hypothesis using the standard normal quantile z[1-alpha]. If `alternative="two.sided"`, the quantile z[1-alpha/2] is used instead.
`OR`	Odds ratio: the size of the change in the probability to be tested for, expressed as an odds ratio.
`Px0`	The probability that x=0. If not supplied, this is estimated from the data.
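
As a rough illustration of how `alpha`, `beta`, `alternative` and `std` enter the calculation, the quantiles and the standardization can be reproduced in base R. This is a sketch based on the argument descriptions above, not the package's internal code; the vector `x` holds made-up values.

```r
alpha <- 0.05   # significance level
beta  <- 0.8    # power, as in the usage above

## critical value for the test of the coefficient
z_one <- stats::qnorm(1 - alpha)       # alternative = "one.sided"
z_two <- stats::qnorm(1 - alpha / 2)   # alternative = "two.sided"

## quantile corresponding to the requested power
z_beta <- stats::qnorm(beta)

## standardizing a continuous predictor (hypothetical values)
x <- c(21, 25, 30, 34, 41, 47)
z_x <- (x - mean(x)) / stats::sd(x)
```

The two-sided critical value is always the larger of the two, so a two-sided test needs a larger sample for the same coefficient.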

Gives the sample size necessary to demonstrate that a coefficient in the model for the given predictor is equal to its given value rather than equal to zero (or, if OR is supplied, the sample size needed to check for such a change in probability).

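The function also gives the number of events per predictor.
The number of events is taken as the count of the less frequent outcome, that is, the smaller of the number of observations with y=0 and the number with y=1.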
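
A minimal sketch of an events-per-covariate calculation on simulated data. It assumes that the count of the less frequent outcome is divided by the number of predictors excluding the intercept; that divisor is an assumption made for illustration, and the data frame `d` and the model `fit` are hypothetical.

```r
set.seed(1)
## simulated data, for illustration only
d <- data.frame(x1 = rnorm(200), x2 = rbinom(200, 1, 0.5))
d$y <- rbinom(200, 1, stats::plogis(-1 + 0.5 * d$x1))
fit <- stats::glm(y ~ x1 + x2, data = d, family = binomial)

## number of "events": the less frequent of y = 0 and y = 1
events <- min(table(fit$y))

## assumption: divide by the number of predictors, excluding the intercept
n_pred <- length(stats::coef(fit)) - 1
epc <- events / n_pred
```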

For a continuous coefficient, the calculation uses Bhat, the estimated coefficient from the model, together with delta:

delta = (1 + (1 + Bhat^2) * exp(1.25 * Bhat^2)) / (1 + exp(-0.25 * Bhat^2))

and P[0], the probability calculated from the intercept term B[0] of the logistic model
glm(x$y ~ coeff, family=binomial)
as P[0] = exp(B[0]) / (1 + exp(B[0])).

For a model with one predictor, the calculation is:

n = (1 + 2 * P[0] * delta) * (z[1-alpha] + z[beta] * exp(-0.25 * Bhat^2))^2 / (P[0] * Bhat^2)
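
A minimal sketch of the continuous-predictor calculation in base R, following the form of the formula given by Hsieh (1989). The function names `delta_fun` and `n_continuous` and the intercept value `B0` are illustrative assumptions; the package's own implementation may differ in detail.

```r
## delta as a function of the coefficient
delta_fun <- function(Bhat) {
    (1 + (1 + Bhat^2) * exp(1.25 * Bhat^2)) / (1 + exp(-0.25 * Bhat^2))
}

## sample size for a single continuous predictor (one-sided test)
n_continuous <- function(Bhat, P0, alpha = 0.05, beta = 0.8) {
    z_alpha <- stats::qnorm(1 - alpha)
    z_beta <- stats::qnorm(beta)   # beta is the power
    (1 + 2 * P0 * delta_fun(Bhat)) *
        (z_alpha + z_beta * exp(-0.25 * Bhat^2))^2 / (P0 * Bhat^2)
}

## P[0] from a hypothetical intercept B[0]
B0 <- -1.1
P0 <- exp(B0) / (1 + exp(B0))   # same value as stats::plogis(B0)

## sample size to detect a coefficient equal to log(1.5)
n_continuous(Bhat = log(1.5), P0 = P0)
```

Because the coefficient enters only through Bhat^2, the result is the same for a positive and a negative coefficient of equal size, and the required sample size falls as the coefficient moves away from zero.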

For a multivariable model, the value is adjusted by R^2, the squared multiple correlation of coeff with the other predictors in the model:

n[m] = n / (1 - R^2)
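
The adjustment can be sketched as follows on simulated predictors. The sketch assumes that R^2 is taken from a linear regression of the predictor of interest on the remaining predictors; the variable names and the single-predictor sample size `n` are hypothetical.

```r
set.seed(2)
## simulated predictors, for illustration only
d <- data.frame(x2 = rnorm(300), x3 = rnorm(300))
d$x1 <- 0.6 * d$x2 + rnorm(300)

## assumption: R^2 from a linear regression of x1 on the other predictors
R2 <- summary(stats::lm(x1 ~ x2 + x3, data = d))$r.squared

n <- 240              # hypothetical single-predictor sample size
n_m <- n / (1 - R2)   # inflated for the multivariable model
```

The more strongly the predictor is correlated with the others, the closer R^2 is to 1 and the larger the adjusted sample size.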

For a binomial coefficient, the calculation uses P[0], the probability given the null hypothesis, P[1], the probability given the alternative hypothesis, and the average probability Pbar = (P[0] + P[1]) / 2. The calculation is:

n = (z[1-alpha] * (2 * Pbar * (1 - Pbar))^0.5 + z[beta] * (P[0] * (1 - P[0]) + P[1] * (1 - P[1]))^0.5)^2 / (P[1] - P[0])^2
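
A minimal sketch of the two-proportion calculation in base R. The helper `p1_from_or`, which converts a null probability and an odds ratio into the probability under the alternative, and the function name `n_binary` are assumptions made for illustration, not part of the package.

```r
## probability under the alternative implied by P0 and an odds ratio
p1_from_or <- function(P0, OR) {
    odds1 <- OR * P0 / (1 - P0)
    odds1 / (1 + odds1)
}

## sample size from the two probabilities (one-sided test)
n_binary <- function(P0, P1, alpha = 0.05, beta = 0.8) {
    Pbar <- (P0 + P1) / 2
    z_alpha <- stats::qnorm(1 - alpha)
    z_beta <- stats::qnorm(beta)   # beta is the power
    (z_alpha * sqrt(2 * Pbar * (1 - Pbar)) +
         z_beta * sqrt(P0 * (1 - P0) + P1 * (1 - P1)))^2 / (P1 - P0)^2
}

P0 <- 0.2
P1 <- p1_from_or(P0, OR = 1.5)
n_binary(P0, P1)
```

The expression is symmetric in the two probabilities, and the required sample size shrinks as the difference between them grows.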

An alternative given by Whittemore (1981) uses Phat = P(x=0).
The lead term in the equation below is used to correct for large values of Phat:

n = (1 + 2P[0]) * (z[1-alpha]sqrt(1/Phat + 1/(1-Phat)) + z[beta]sqrt(1/Phat + 1/(Phat exp(Bhat))))^2 / (P[0]Bhat)^2

As above, these can be adjusted in the multivariable case:

n[m] = n / (1 - R^2)

In this case, R^2 is the squared Pearson correlation between coeff and the fitted values from a logistic regression with coeff as the response and the other predictors as covariates.
The calculation uses y[i], the observed values of coeff, P[i], the fitted values, and Pbar, the mean probability (mean of the fitted values from the model):

R^2 = (sum((y[i] - Pbar) * (P[i] - Pbar)))^2 / (sum((y[i] - Pbar)^2) * sum((P[i] - Pbar)^2))
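
The R^2 above can be checked on simulated data. In this sketch the binary predictor `x1` plays the role of coeff and `x2`, `x3` stand in for the other predictors; all names and values are hypothetical.

```r
set.seed(3)
## simulated data: binary predictor x1 related to x2 and x3
d <- data.frame(x2 = rnorm(400), x3 = rnorm(400))
d$x1 <- rbinom(400, 1, stats::plogis(0.8 * d$x2 - 0.4 * d$x3))

## logistic regression with the predictor of interest as the response
aux <- stats::glm(x1 ~ x2 + x3, data = d, family = binomial)
y <- d$x1
P <- stats::fitted(aux)
Pbar <- mean(P)

## R^2 written out term by term
R2 <- sum((y - Pbar) * (P - Pbar))^2 /
    (sum((y - Pbar)^2) * sum((P - Pbar)^2))

## the same quantity as a squared Pearson correlation
R2_cor <- stats::cor(y, P)^2
```

For a logistic model with an intercept, the mean of the fitted values matches the mean of the response up to convergence tolerance, which is why a single Pbar can be used for both sums.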

A list of:

`ss`	Sample size required to show that the coefficient for the predictor is as given in the model rather than equal to the null value (by default, 0).
`epc`	Events per covariate; should be >10 to make meaningful statements about the coefficients obtained.

The returned list has the additional class of "ss.glm".
The print method for this class does not show the attributes.

Whittemore AS (1981). Sample Size for Logistic Regression with Small Response Probability. Journal of the American Statistical Association. 76(373):27-32. doi: 10.2307/2287036 Also available at JSTOR at https://www.jstor.org/stable/2287036

Hsieh FY (1989). Sample size tables for logistic regression. Statistics in Medicine. 8(7):795-802. doi: 10.1002/sim.4780080704 Also available at statpower (free).

Fleiss J (2003). Statistical methods for rates and proportions. 3rd ed. John Wiley, New York. doi: 10.1002/0471445428 Also available at Google books (free preview).

Peduzzi P, Concato J, Kemper E, Holford T R, Feinstein A R (1996). A simulation study of the number of events per variable in logistic regression analysis. Journal of Clinical Epidemiology. 49(12):1373-79. doi: 10.1016/S0895-4356(96)00236-3

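## Load the package that provides ss() (and, it is assumed here, the uis data).
library(LogisticDx)
## H&L 2nd ed. Section 8.5.
## Results here are slightly different from the text due to rounding.
data(uis)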
with(uis, prop.table(table(DFREE, TREAT), 2))
(g1 <- glm(DFREE ~ TREAT, data=uis, family=binomial))
ss(g1, coeff="TREATlong")
## Pages 340 - 341.
ss(g1, coeff="TREATlong", OR=1.5, Px0=0.5)
## standardize
uis <- within(uis, {
    AGES <- (AGE - 32) / 6
    NDRGTXS <- (NDRGTX - 5) / 5
})
## H&L 2nd ed. Section 8.5. Page 343.
## results slightly different due to rounding
g1 <- glm(DFREE ~ AGES, data=uis, family=binomial)
ss(g1, coeff="AGES", std=FALSE, OR=1.5)
## H&L 2nd ed. Section 8.5. Table 8.37. Page 344.
summary(g1 <- glm(DFREE ~ AGES + NDRGTXS + IVHX + RACE + TREAT,
                  data=uis, family=binomial))
## H&L 2nd ed. Section 8.5. Page 345.
## results slightly different due to rounding
ss(g1, coeff="AGES", std=FALSE, OR=1.5)
ss(g1, coeff="TREATlong", std=FALSE, OR=1.5)