glm.sdf: EdSurvey Generalized Linear Models
In EdSurvey: Analysis of NCES Education Survey and Assessment Data

Description

Fits a logit or probit model using weights and variance estimates appropriate for an edsurvey.data.frame, a light.edsurvey.data.frame, or an edsurvey.data.frame.list.

Usage

glm.sdf(
  formula,
  family = binomial(link = "logit"),
  data,
  weightVar = NULL,
  relevels = list(),
  varMethod = c("jackknife", "Taylor"),
  jrrIMax = 1,
  dropOmittedLevels = TRUE,
  defaultConditions = TRUE,
  recode = NULL,
  returnNumberOfPSU = FALSE,
  returnVarEstInputs = FALSE,
  omittedLevels = deprecated()
)

logit.sdf(
  formula,
  data,
  weightVar = NULL,
  relevels = list(),
  varMethod = c("jackknife", "Taylor"),
  jrrIMax = 1,
  dropOmittedLevels = TRUE,
  defaultConditions = TRUE,
  recode = NULL,
  returnNumberOfPSU = FALSE,
  returnVarEstInputs = FALSE,
  omittedLevels = deprecated()
)

probit.sdf(
  formula,
  data,
  weightVar = NULL,
  relevels = list(),
  varMethod = c("jackknife", "Taylor"),
  jrrIMax = 1,
  dropOmittedLevels = TRUE,
  defaultConditions = TRUE,
  recode = NULL,
  returnNumberOfPSU = FALSE,
  returnVarEstInputs = FALSE,
  omittedLevels = deprecated()
)

Arguments

`formula`	a `formula` for the generalized linear model. See `glm`. For logit and probit, we recommend using the `I()` function to define the level used for success. (See Examples.)
`family`	the `glm.sdf` function currently fits only binomial outcome models, such as logit and probit, although other link functions are available for binomial models. See the `link` argument in the help for `family`.
`data`	an `edsurvey.data.frame`, a `light.edsurvey.data.frame`, or an `edsurvey.data.frame.list`
`weightVar`	character indicating the weight variable to use (see Details). The `weightVar` must be one of the weights for the `edsurvey.data.frame`. If `NULL`, uses the default for the `edsurvey.data.frame`.
`relevels`	a list; used to change the contrasts from the default treatment contrasts to the treatment contrasts with a chosen omitted group. The name of each element should be the variable name, and the value should be the group to be omitted.
`varMethod`	a character string, either “jackknife” or “Taylor”, that indicates the variance estimation method to be used. See Details.
`jrrIMax`	the `Vjrr` sampling variance term (see Statistical Methods Used in EdSurvey) can be estimated with any positive number of plausible values and is estimated on the lower of the number of available plausible values and `jrrIMax`. When `jrrIMax` is set to `Inf`, all plausible values will be used. Higher values of `jrrIMax` lead to longer computing times and more accurate variance estimates.
`dropOmittedLevels`	a logical value. When set to the default value of `TRUE`, drops the levels of all factor variables that are specified as omitted levels in the `edsurvey.data.frame`. Use `print` on an `edsurvey.data.frame` to see the omitted levels.
`defaultConditions`	a logical value. When set to the default value of `TRUE`, uses the default conditions stored in an `edsurvey.data.frame` to subset the data. Use `print` on an `edsurvey.data.frame` to see the default conditions.
`recode`	a list of lists to recode variables. Defaults to `NULL`. Can be set as `recode = list(var1 = list(from = c("a", "b", "c"), to = "d"))`.
`returnNumberOfPSU`	a logical value set to `TRUE` to return the number of primary sampling units (PSUs)
`returnVarEstInputs`	a logical value set to `TRUE` to return the inputs to the jackknife and imputation variance estimates, which allow for the computation of covariances between estimates.
`omittedLevels`	this argument is deprecated. Use `dropOmittedLevels`
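The `I()` recommendation for `formula` can be illustrated with base R's `glm` on a small made-up data frame. This is a sketch of the idea only, not a call to `glm.sdf`; the values are invented, and the column names `b013801` and `dsex` are borrowed from the Examples for readability. Wrapping the left-hand side in `I()` makes the outcome a logical in which `TRUE` marks the levels treated as success.

```r
# Made-up data standing in for a survey extract (not real student data)
dat <- data.frame(
  b013801 = c("0-10", "11-25", "26-100", ">100", "26-100", "0-10", ">100", "11-25"),
  dsex    = factor(c("Male", "Female", "Male", "Female", "Female", "Male", "Male", "Female"))
)

# I() evaluates the expression as written, so the outcome is logical:
# TRUE ("success") when b013801 is one of the listed levels
success <- with(dat, I(b013801 %in% c("26-100", ">100")))

# Unweighted base R logit with the same left-hand side
fit <- glm(I(b013801 %in% c("26-100", ">100")) ~ dsex, data = dat,
           family = binomial(link = "logit"))
coef(fit)
```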
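Conceptually, `relevels` plays the same role as base R's `relevel()`: it changes which level is the omitted (reference) group under treatment contrasts. The sketch below uses a made-up `dsex` factor in base R; in a `glm.sdf` call, the corresponding setting would be `relevels = list(dsex = "Male")`.

```r
# Made-up factor standing in for a survey variable such as dsex
dsex <- factor(c("Male", "Female", "Female", "Male"))
levels(dsex)                      # "Female" "Male": "Female" is omitted by default

# Make "Male" the omitted group instead
dsex2 <- relevel(dsex, ref = "Male")
levels(dsex2)                     # "Male" "Female"

# The treatment-contrast columns of the design matrix change accordingly
colnames(model.matrix(~ dsex))    # "(Intercept)" "dsexMale"
colnames(model.matrix(~ dsex2))   # "(Intercept)" "dsex2Female"
```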
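The effect of `jrrIMax` can be sketched as follows, assuming that `Vjrr` is the average of per-plausible-value sampling variances taken over the first min(M, `jrrIMax`) plausible values. The per-plausible-value variances are made-up numbers, and the helper `vjrrFor()` is hypothetical, not part of EdSurvey.

```r
# Made-up sampling variances of one coefficient, one per plausible value
vjrrByPV <- c(0.010, 0.012, 0.011, 0.009, 0.013)
M <- length(vjrrByPV)

# Hypothetical helper: average over the lower of M and jrrIMax plausible values
vjrrFor <- function(jrrIMax) {
  nUsed <- min(M, jrrIMax)
  mean(vjrrByPV[seq_len(nUsed)])
}

vjrrFor(1)    # uses only the first plausible value
vjrrFor(Inf)  # uses all five plausible values
```

With `jrrIMax = 1` only one jackknife pass is needed, which is why larger values cost more computing time.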
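The structure of the `recode` list can be illustrated with a hypothetical helper, `applyRecode()`, that applies each `from`/`to` pair to a plain data frame. This is only a sketch of the documented list shape under the assumption that every value in `from` becomes `to`; it is not EdSurvey's implementation, and it converts the recoded column to character.

```r
# Hypothetical helper mirroring the documented shape:
# list(varName = list(from = <old values>, to = <new value>))
applyRecode <- function(data, recode) {
  for (v in names(recode)) {
    x <- as.character(data[[v]])
    x[x %in% recode[[v]]$from] <- recode[[v]]$to   # map every "from" value to "to"
    data[[v]] <- x
  }
  data
}

toy <- data.frame(var1 = c("a", "b", "c", "e"))
recoded <- applyRecode(toy, list(var1 = list(from = c("a", "b", "c"), to = "d")))
recoded$var1   # "d" "d" "d" "e"
```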

Details

This function implements an estimator that correctly handles left-hand side variables that are logical, allows for survey sampling weights, and estimates variances using the jackknife replication or Taylor series. The vignette titled Statistical Methods Used in EdSurvey describes estimation of the reported statistics and how it depends on varMethod.

The coefficients are estimated using the sample weights according to the section “Estimation of Weighted Means When Plausible Values Are Not Present” or the section “Estimation of Weighted Means When Plausible Values Are Present,” depending on whether the model includes assessment variables or other variables with plausible values.
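A minimal base R sketch of the point estimate, assuming the plausible-value case works as in the usual multiple-imputation approach: fit one weighted logit per plausible value and average the coefficients. The data, weights, and five plausible values are simulated for illustration. `quasibinomial` is used because it gives the same point estimates as `binomial` without warnings about non-integer weights.

```r
set.seed(1)
n <- 200
dsex <- factor(sample(c("Female", "Male"), n, replace = TRUE))
w <- runif(n, 0.5, 2)   # simulated full-sample weights

# Five simulated plausible values for a scale score
pv <- sapply(1:5, function(m) rnorm(n, mean = 280 + 10 * (dsex == "Male"), sd = 30))

# One weighted logit per plausible value (columns = plausible values)
coefByPV <- sapply(1:5, function(m) {
  y <- pv[, m] >= 300
  coef(glm(y ~ dsex, weights = w, family = quasibinomial(link = "logit")))
})

# Reported coefficients: the average over plausible values
betaHat <- rowMeans(coefByPV)
betaHat
```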

How the standard errors of the coefficients are estimated depends on the presence of plausible values (assessment variables), but once the variance is obtained, the t statistic is given by

t=\frac{\hat{\beta}}{\sqrt{\mathrm{var}(\hat{\beta})}}

where \hat{\beta} is the estimated coefficient and \mathrm{var}(\hat{\beta}) is the estimated variance of that coefficient.
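The t statistic is a direct elementwise computation once the coefficients and their variances are available. The numbers below are made up for illustration.

```r
# Made-up coefficient estimates and their estimated variances
beta    <- c("(Intercept)" = -1.20, dsexMale = 0.35)
varBeta <- c("(Intercept)" = 0.0100, dsexMale = 0.0049)

# t = beta / sqrt(var(beta)), computed for each coefficient
tStat <- beta / sqrt(varBeta)
tStat   # approximately -12 and 5
```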

logit.sdf and probit.sdf are included for convenience only; they give the same results as a call to glm.sdf with the binomial family and the link function named in the function call (logit or probit). By default, glm.sdf fits a logistic regression when family is not set, so glm.sdf and logit.sdf are expected to give the same results in that case. Other types of generalized linear models are not supported.
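The link-function correspondence can be seen with base R's `glm` on simulated data: `binomial()` uses the logit link by default, just as the default `family = binomial(link = "logit")` of glm.sdf does, while `binomial(link = "probit")` corresponds to probit.sdf. This is a base R analogue with simulated data, not an EdSurvey call.

```r
set.seed(2)
x <- rnorm(500)
y <- rbinom(500, 1, plogis(-0.5 + x))   # simulated binary outcome

fitDefault <- glm(y ~ x, family = binomial())                  # logit by default
fitLogit   <- glm(y ~ x, family = binomial(link = "logit"))
fitProbit  <- glm(y ~ x, family = binomial(link = "probit"))

all.equal(coef(fitDefault), coef(fitLogit))   # same model, same estimates
# Probit coefficients are typically on a smaller scale than logit coefficients
coef(fitLogit) / coef(fitProbit)
```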

Variance estimation of coefficients

All variance estimation methods are shown in the vignette titled Statistical Methods Used in EdSurvey. When the predicted (dependent) variable does not have plausible values and varMethod is set to jackknife, the variance of the coefficients is estimated according to the section “Estimation of Standard Errors of Weighted Means When Plausible Values Are Not Present, Using the Jackknife Method.”
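The replicate-weight idea behind the jackknife can be sketched in base R: refit the weighted model once per replicate weight and sum the squared deviations of the replicate coefficients from the full-sample coefficients. The replicate weights here are built in a paired-jackknife style from simulated zones and PSUs, and the multiplier `jkMult = 1` is an assumption; the actual replicate weights and multiplier come from the survey and differ across surveys.

```r
set.seed(3)
n <- 400
x <- rnorm(n)
y <- rbinom(n, 1, plogis(0.3 + 0.8 * x))   # simulated outcome
w0 <- runif(n, 1, 3)                       # simulated full-sample weight

# Simulated design: J zones, each with two PSUs of equal size
J <- 20
zone <- rep(seq_len(J), length.out = n)
psu  <- ((seq_len(n) - 1) %/% J) %% 2 + 1

# Paired-jackknife-style replicate weights: in replicate j, PSU 1 of zone j
# gets weight 0 and PSU 2 of zone j gets double weight
repW <- sapply(seq_len(J), function(j) {
  wj <- w0
  drop <- zone == j & psu == 1
  dbl  <- zone == j & psu == 2
  wj[drop] <- 0
  wj[dbl]  <- 2 * w0[dbl]
  wj
})

fitCoef <- function(w) coef(glm(y ~ x, weights = w, family = quasibinomial()))
beta0 <- fitCoef(w0)                                         # full-sample estimates
betaJ <- sapply(seq_len(J), function(j) fitCoef(repW[, j]))  # 2 x J matrix

jkMult <- 1   # assumed multiplier; survey-specific in practice
vjrr <- jkMult * rowSums((betaJ - beta0)^2)
se <- sqrt(vjrr)
se
```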

When plausible values are present and varMethod is set to jackknife, the variance of the coefficients is estimated according to the section “Estimation of Standard Errors of Weighted Means When Plausible Values Are Present, Using the Jackknife Method.”
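When plausible values are present, the sampling and imputation components are combined. The sketch below assumes the usual multiple-imputation (Rubin-type) rule: the estimate is the mean over the M plausible values, the imputation variance is (1 + 1/M) times the between-plausible-value variance, and the total variance is `Vjrr + Vimp`. All inputs are made-up numbers.

```r
# Made-up estimates of one coefficient under M = 5 plausible values
betaM <- c(0.42, 0.45, 0.40, 0.44, 0.43)
M <- length(betaM)
vjrr <- 0.0025   # made-up sampling variance, for example from the jackknife

betaHat <- mean(betaM)
vimp <- (1 + 1 / M) * var(betaM)   # var() uses the M - 1 denominator
seTotal <- sqrt(vjrr + vimp)
c(coef = betaHat, Vimp = vimp, se = seTotal)
```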

When the predicted (dependent) variable does not have plausible values and varMethod is set to Taylor, the variance of the coefficients is estimated according to the section “Estimation of Standard Errors of Weighted Means When Plausible Values Are Not Present, Using the Taylor Series Method.”

When plausible values are present and varMethod is set to Taylor, the variance of the coefficients is estimated according to the section “Estimation of Standard Errors of Weighted Means When Plausible Values Are Present, Using the Taylor Series Method.”

Value

An edsurveyGlm object with the following elements:

`call`	the function call
`formula`	the formula used to fit the model
`coef`	the estimates of the coefficients
`se`	the standard error estimates of the coefficients
`Vimp`	the estimated variance caused by uncertainty in the scores (plausible value variables)
`Vjrr`	the estimated variance from sampling
`M`	the number of plausible values
`nPSU`	the number of PSUs used in the calculation
`varm`	the variance estimates under the various plausible values
`coefm`	the values of the coefficients under the various plausible values
`coefmat`	the coefficient matrix (typically produced by the summary of a model)
`weight`	the name of the weight variable
`npv`	the number of plausible values
`njk`	the number of jackknife replicates used
`varMethod`	the variance estimation method used (jackknife or Taylor series)
`varEstInputs`	returned only when `returnVarEstInputs` is `TRUE`; these inputs are used to calculate covariances with `varEstToCov`.

Testing

Of the common hypothesis tests for joint parameter testing, only the Wald test is widely used with plausible values and sample weights. As such, it replaces, if imperfectly, the Akaike information criterion (AIC), the likelihood ratio test, chi-squared tests, and analysis of variance (ANOVA, including F-tests). See waldTest or the vignette titled Methods and Overview of Using EdSurvey for Running Wald Tests.
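The basic chi-squared form of the Wald statistic for the joint hypothesis that q coefficients are all zero is W = b' V^{-1} b, compared with a chi-squared distribution on q degrees of freedom. The sketch below computes it from a made-up coefficient vector and covariance matrix; waldTest may report additional versions of the test (for example, an F form) and uses the survey-based covariance estimate, so this shows only the core calculation.

```r
# Made-up estimates for q = 2 coefficients tested jointly
# (for example, two dummy columns of one factor) and their covariance matrix
b <- c(0.30, -0.15)
V <- matrix(c(0.010, 0.002,
              0.002, 0.008), nrow = 2, byrow = TRUE)

# Wald statistic for H0: all tested coefficients are zero
W <- as.numeric(t(b) %*% solve(V) %*% b)
q <- length(b)
pValue <- pchisq(W, df = q, lower.tail = FALSE)
c(W = W, df = q, p = pValue)
```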

Author(s)

Paul Bailey

Examples

## Not run: 
# read in the example data (generated, not real student data)
sdf <- readNAEP(path=system.file("extdata/data", "M36NT2PM.dat", package = "NAEPprimer"))

# by default uses the jackknife variance method using replicate weights
table(sdf$b013801)
logit1 <- logit.sdf(formula=I(b013801 %in% c("26-100", ">100")) ~ dsex + b017451, data=sdf)
# use summary to get detailed results
summary(logit1)

# Taylor series variance estimation
logit1t <- logit.sdf(formula=I(b013801 %in% c("26-100", ">100")) ~ dsex + b017451, data=sdf,
                     varMethod="Taylor")
summary(logit1t)

logit2 <- logit.sdf(formula=I(composite >= 300) ~ dsex + b013801, data=sdf)
summary(logit2)

logit3 <- glm.sdf(formula=I(composite >= 300) ~ dsex + b013801, data=sdf, 
                  family=quasibinomial(link="logit"))

# Wald test for joint hypothesis that all coefficients in b013801 are zero
waldTest(model=logit3, coefficients="b013801")

summary(logit3)

## End(Not run)

EdSurvey documentation built on June 27, 2024, 5:10 p.m.