rq.sdf: EdSurvey Quantile Regression Models

rq.sdf    R Documentation

Description

Fits a quantile regression model that uses weights and variance estimates appropriate for the data.

Usage

rq.sdf(
  formula,
  data,
  tau = 0.5,
  weightVar = NULL,
  relevels = list(),
  jrrIMax = 1,
  dropOmittedLevels = TRUE,
  defaultConditions = TRUE,
  recode = NULL,
  returnNumberOfPSU = FALSE,
  omittedLevels = deprecated(),
  ...
)

Arguments

formula

a formula for the quantile regression model; see rq for the syntax. If the left-hand side (y) is left blank, the default subject scale or subscale variable is used (showPlausibleValues shows the default). If y names a subject scale or subscale (one of the names shown by showPlausibleValues), that subject scale or subscale is used.

data

an edsurvey.data.frame, a light.edsurvey.data.frame, or an edsurvey.data.frame.list

tau

the quantile to be estimated, a value between 0 and 1. Defaults to 0.5.

weightVar

a character string naming the weight variable to use. The weightVar must be one of the weights on the edsurvey.data.frame. If NULL, the default weight for the edsurvey.data.frame is used.

relevels

a list. Used to change the contrasts from the default treatment contrasts to the treatment contrasts with a chosen omitted group (the reference group). The name of each element should be the variable name, and the value should be the group to be omitted (the reference group).

jrrIMax

the maximum number of plausible values used to estimate the jackknife sampling variance term, V_{jrr}. With the default, jrrIMax=1, the sampling variance is estimated from the first plausible value only. The V_{jrr} term can be estimated with any number of plausible values; values of jrrIMax larger than the number of plausible values on the survey (including Inf) result in all plausible values being used. Higher values of jrrIMax lead to longer computing times and more accurate variance estimates.

dropOmittedLevels

a logical value. When set to the default value of TRUE, drops the levels of all factor variables that the edsurvey.data.frame specifies as omitted. Use print on an edsurvey.data.frame to see the omitted levels.

defaultConditions

a logical value. When set to the default value of TRUE, uses the default conditions stored in an edsurvey.data.frame to subset the data. Use print on an edsurvey.data.frame to see the default conditions.

recode

a list of lists to recode variables. Defaults to NULL. Can be set as recode=list(var1 = list(from= c("a", "b", "c"), to= "d")).

returnNumberOfPSU

a logical value; set to TRUE to return the number of primary sampling units (PSUs).

omittedLevels

this argument is deprecated. Use dropOmittedLevels instead.

...

additional parameters passed to rq
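
As an illustration of how several of the arguments above combine, the following sketch fits the 25th percentile with a non-default reference group and then repeats the fit using every plausible value for the sampling variance. It assumes the EdSurvey and NAEPprimer packages are installed and that dsex has a level labeled "Female" in the example data; the object names rq_q25 and rq_q25_all are chosen here for illustration only.

```r
library(EdSurvey)

# example data shipped with NAEPprimer (generated, not real student data)
sdf <- readNAEP(system.file("extdata/data", "M36NT2PM.dat", package = "NAEPprimer"))

# 25th percentile; "Female" becomes the omitted (reference) group for dsex
# (assumes "Female" is one of the level labels of dsex)
rq_q25 <- rq.sdf(composite ~ dsex + b017451, data = sdf, tau = 0.25,
                 relevels = list(dsex = "Female"))

# same model, but the sampling variance uses all plausible values (slower)
rq_q25_all <- rq.sdf(composite ~ dsex + b017451, data = sdf, tau = 0.25,
                     relevels = list(dsex = "Female"), jrrIMax = Inf)

summary(rq_q25)
summary(rq_q25_all)
```

The two fits should report the same coefficients and differ only in their standard errors, because jrrIMax affects the variance estimate rather than the point estimate.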

Details

The function computes an estimate of the tau-th conditional quantile function of the response, given the covariates, as specified by the formula argument. Like lm.sdf(), the function presumes a linear specification for the quantile regression model (i.e., that the formula defines a model that is linear in parameters). Unlike lm.sdf(), the function supports only the jackknife variance estimation method.
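
For orientation, the standard linear quantile regression problem of Koenker and Bassett (1978) can be written as below. The weights w_i are included on the assumption that the survey weights enter the objective multiplicatively; the symbols x_i, y_i, w_i, and beta are notation introduced here, not names used by the function.

```latex
Q_{y}(\tau \mid x) = x^{\top}\beta(\tau),
\qquad
\hat{\beta}(\tau) = \arg\min_{\beta} \sum_{i=1}^{n} w_i \,
  \rho_{\tau}\!\left(y_i - x_i^{\top}\beta\right),
\qquad
\rho_{\tau}(u) = u\,\bigl(\tau - \mathbf{1}\{u < 0\}\bigr)
```

With tau = 0.5 the check function is proportional to the absolute value, so the fit reduces to (weighted) median regression.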

For further details on quantile regression models and how they are implemented in R, see Koenker and Bassett (1978), Koenker (2005), and the vignette from the quantreg package, on which this function is built. The vignette is accessible by vignette("rq", package="quantreg").
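
To make the link to quantreg concrete, here is a minimal sketch of a weighted quantile regression on simulated data using quantreg::rq directly. It does not involve plausible values or jackknife variances and is not the internal code of rq.sdf; the data, the weights w, and the helper functions check_loss and loss_at are all hypothetical.

```r
library(quantreg)

set.seed(42)
n <- 500
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
w <- runif(n, 0.5, 2)  # hypothetical sampling weights

# weighted quantile regression at the 80th percentile
fit <- rq(y ~ x, tau = 0.8, weights = w)
coef(fit)

# check loss: rho_tau(u) = u * (tau - 1{u < 0})
check_loss <- function(u, tau) u * (tau - (u < 0))

# weighted objective evaluated at a coefficient vector b = (intercept, slope)
loss_at <- function(b) sum(w * check_loss(y - (b[1] + b[2] * x), 0.8))

# the fitted coefficients should not be beaten by a nearby perturbed vector
loss_at(coef(fit)) <= loss_at(coef(fit) + c(0.05, -0.05))
```

The final comparison checks that the fitted coefficients give a weighted check loss no larger than that of a slightly perturbed coefficient vector.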

For further details on how left-hand side variables, survey sampling weights, and estimated variances are correctly handled, see lm.sdf or the vignette titled Statistical Methods Used in EdSurvey.
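
As a rough guide to how the Vimp and Vjrr components fit together, the usual combination for plausible values (Rubin, 1987) with a jackknife sampling variance (Johnson & Rust, 1992) can be sketched as follows. This is written under assumptions rather than taken from the function's source: M is the number of plausible values, M* = min(jrrIMax, M), J is the number of jackknife replicates, beta_p is the estimate from the p-th plausible value, beta_jp is the same estimate using the j-th replicate weight, and any survey-specific multiplier on the replicate sum is omitted.

```latex
V = V_{jrr} + V_{imp},
\qquad
V_{imp} = \frac{M+1}{M}\cdot\frac{1}{M-1}\sum_{p=1}^{M}\left(\hat{\beta}_{p} - \bar{\beta}\right)^{2},
\qquad
V_{jrr} = \frac{1}{M^{*}}\sum_{p=1}^{M^{*}}\sum_{j=1}^{J}\left(\hat{\beta}_{jp} - \hat{\beta}_{p}\right)^{2},
\qquad
\bar{\beta} = \frac{1}{M}\sum_{p=1}^{M}\hat{\beta}_{p}
```

The reported standard error is then the square root of V.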

Value

An edsurvey.rq with the following elements:

call

the function call

formula

the formula used to fit the model

tau

the quantile to be estimated

coef

the estimates of the coefficients

se

the standard error estimates of the coefficients

Vimp

the estimated variance from uncertainty in the scores (plausible value variables)

Vjrr

the estimated variance from sampling

M

the number of plausible values

varm

the variance estimates under the various plausible values

coefm

the values of the coefficients under the various plausible values

coefmat

the coefficient matrix (typically produced by the summary of a model)

weight

the name of the weight variable

npv

the number of plausible values

njk

the number of jackknife replicates used; set to NA when Taylor series variance estimates are used

rho

the mean value of the objective function across the plausible values
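
The elements listed above can be inspected after fitting. The sketch below assumes the returned edsurvey.rq object is a list whose elements can be extracted with `$`, and that coef and se are vectors of the same length; neither assumption is stated explicitly above.

```r
library(EdSurvey)

# example data shipped with NAEPprimer (generated, not real student data)
sdf <- readNAEP(system.file("extdata/data", "M36NT2PM.dat", package = "NAEPprimer"))
rq1 <- rq.sdf(composite ~ dsex + b017451, data = sdf, tau = 0.8)

rq1$tau      # the quantile that was estimated
rq1$coef     # coefficient estimates
rq1$se       # standard errors of the coefficients
rq1$coefmat  # coefficient matrix, as reported by summary()

# ratio of each estimate to its standard error
# (assumes coef and se are conformable vectors)
rq1$coef / rq1$se
```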

Author(s)

Trang Nguyen, Paul Bailey, and Yuqi Liao

References

Binder, D. A. (1983). On the variances of asymptotically normal estimators from complex surveys. International Statistical Review, 51(3), 279–292.

Johnson, E. G., & Rust, K. F. (1992). Population inferences and variance estimation for NAEP data. Journal of Educational Statistics, 17(2), 175–190.

Koenker, R. W., & Bassett, G. W. (1978). Regression quantiles. Econometrica, 46(1), 33–50.

Koenker, R. W. (2005). Quantile regression. Cambridge, UK: Cambridge University Press.

Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York, NY: Wiley.

See Also

rq

Examples

## Not run: 
# read in the example data (generated, not real student data)
sdf <- readNAEP(system.file("extdata/data", "M36NT2PM.dat", package = "NAEPprimer"))

# conduct quantile regression at a given tau value (by default, tau is set to be 0.5) 
rq1 <- rq.sdf(composite ~ dsex + b017451, data=sdf, tau = 0.8)
summary(rq1)

## End(Not run)

EdSurvey documentation built on Nov. 2, 2023, 6:25 p.m.