modifiedGreg: Compute a modified generalized regression estimator

View source: R/modifiedGreg.R

modifiedGreg    R Documentation

Compute a modified generalized regression estimator

Description

Calculates a modified generalized regression estimator for a finite population mean/proportion or total based on sample data collected from a complex sampling design and auxiliary population data.
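As a rough illustration of the idea, the sketch below computes a modified GREG estimate of each domain mean in base R. It is not the internal code of modifiedGreg: it assumes simple random sampling without replacement, a linear working model, and a simulated population, and every object name in it (pop, samp, modified_greg_mean, and so on) is hypothetical. The key feature is that the regression is fitted once to the whole sample, while the prediction sum and the design-weighted residual correction are computed separately for each domain.

```r
set.seed(1)

# Hypothetical population: 3 domains, one auxiliary variable x
N_d <- c(a = 400, b = 300, c = 300)
pop <- data.frame(
  domain = rep(names(N_d), times = N_d),
  x = runif(sum(N_d), 0, 10)
)
pop$y <- 2 + 3 * pop$x + rnorm(nrow(pop))

# Simple random sample without replacement, so pi_k = n / N for every unit
N <- nrow(pop)
n <- 100
samp <- pop[sample(N, n), ]
pi_k <- rep(n / N, n)

# One working model fitted to the whole sample (all domains pooled)
fit <- lm(y ~ x, data = samp, weights = 1 / pi_k)

modified_greg_mean <- function(d) {
  in_d <- samp$domain == d
  # Sum of model predictions over every population unit in domain d
  pred_total <- sum(predict(fit, newdata = pop[pop$domain == d, ]))
  # Design-weighted residual correction from the sampled units in domain d
  resid_total <- sum((samp$y[in_d] - fitted(fit)[in_d]) / pi_k[in_d])
  (pred_total + resid_total) / N_d[[d]]
}

est <- sapply(names(N_d), modified_greg_mean)
true <- as.vector(tapply(pop$y, pop$domain, mean))

# Compare the estimates with the true domain means of the simulated population
round(cbind(estimate = est, truth = true), 2)
```

Because the coefficients come from the pooled sample, domains with few sampled units still get a stable prediction term; only the residual correction depends on the domain's own sample.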

Usage

modifiedGreg(
  y,
  xsample,
  xpop,
  domains,
  pi = NULL,
  pi2 = NULL,
  datatype = "raw",
  model = "linear",
  var_est = FALSE,
  var_method = "LinHB",
  modelselect = FALSE,
  lambda = "lambda.min",
  domain_col_name = NULL,
  estimation_domains = NULL,
  N = NULL,
  B = 1000,
  fpc = TRUE,
  messages = TRUE
)

Arguments

y

A vector of the response values from the sample.

xsample

A data frame of the auxiliary data in the sample.

xpop

A data frame of population-level auxiliary information. It must contain all of the column names from xsample. If datatype = "raw", it must contain unit-level data. If datatype = "totals" or "means", it contains one row of aggregated population totals or means for the auxiliary data and must include a column labeled N with the population sizes for each domain.

domains

A vector of the specific domain that each row of xsample belongs to.

pi

First order inclusion probabilities.

pi2

Second order inclusion probabilities.

datatype

A string that specifies the form of population auxiliary data. The possible values are "raw", "totals" or "means" for whether the user is providing population data at the unit level, aggregated to totals, or aggregated to means. Default is "raw".

model

A string that specifies the regression model to utilize. Options are "linear" or "logistic".

var_est

A logical value that specifies whether variance estimation should be performed.

var_method

A string that specifies the variance method to utilize.

modelselect

A logical for whether or not to run lasso regression first and then fit the model using only the predictors with non-zero lasso coefficients. Default is FALSE.

lambda

A string specifying how to tune the lasso hyper-parameter. Only used if modelselect = TRUE and defaults to "lambda.min". The possible values are "lambda.min", which is the lambda value associated with the minimum cross validation error or "lambda.1se", which is the lambda value associated with a cross validation error that is one standard error away from the minimum, resulting in a smaller model.

domain_col_name

A string that specifies the name of the column that contains the domain values in xpop.

estimation_domains

A vector of domain values over which to produce estimates. If NULL, estimation will be performed over all of the domains included in xpop.

N

The total population size.

B

The number of bootstrap iterations to perform when var_method = "bootstrapSRS".

fpc

A logical for whether or not the variance calculation should include a finite population correction when calculating the "LinHTSRS" or the "bootstrapSRS" variance estimator. Default is TRUE.

messages

A logical indicating whether to output the messages internal to mase. Default is TRUE.
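The aggregated form of xpop described above can be prepared from unit-level data in a few lines of base R. This is a minimal sketch that assumes one row per domain, as in the IdahoPop example; the data frame unit_pop and its values are made up for illustration, and the column names simply mirror those used in the Examples section. It also checks the stated requirement that xpop contain every column name in xsample.

```r
# Hypothetical unit-level population table with a domain identifier
unit_pop <- data.frame(
  COUNTYFIPS = c("16001", "16001", "16001", "16003", "16003"),
  tcc  = c(10, 20, 30, 40, 60),
  elev = c(800, 900, 1000, 1500, 1700)
)

# Auxiliary means per domain (use FUN = sum for datatype = "totals")
xpop_means <- aggregate(cbind(tcc, elev) ~ COUNTYFIPS, data = unit_pop, FUN = mean)

# Domain population sizes, stored in the required column N
sizes <- aggregate(tcc ~ COUNTYFIPS, data = unit_pop, FUN = length)
names(sizes)[2] <- "N"
xpop_means <- merge(xpop_means, sizes, by = "COUNTYFIPS")

# Every auxiliary variable used in the sample must also appear in xpop
xsample_names <- c("tcc", "elev")
missing_cols <- setdiff(xsample_names, names(xpop_means))
stopifnot(length(missing_cols) == 0)

xpop_means
```

A table built this way would be passed as xpop together with datatype = "means", with the domain column identified through domain_col_name.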

References

Rao, J. N. K. and Molina, I. (2015). Small Area Estimation, 2nd ed. John Wiley & Sons.

Examples

library(mase)   # provides modifiedGreg, IdahoPop and IdahoSamp
library(dplyr)  # provides rename()
data(IdahoPop)
data(IdahoSamp)

modifiedGreg(y = IdahoSamp$BA_TPA_ADJ,
             xsample = IdahoSamp[c("tcc", "elev")],
             xpop = IdahoPop[c("COUNTYFIPS","tcc", "elev", "npixels")] |> rename(N = npixels),
             domains = IdahoSamp$COUNTYFIPS,
             datatype = "means",
             N = sum(IdahoPop$npixels),
             var_est = TRUE)

Swarthmore-Statistics/mase documentation built on March 5, 2024, 6:16 a.m.