xgbp: Extreme Gradient Boosting with Poststratification


xgbp R Documentation

Extreme Gradient Boosting with Poststratification

Description

The package's main function, xgbp, poststratifies a survey response from a sample using Extreme Gradient Boosting (XGB). The dependent variable can be either binomial or multinomial, and the resulting estimates can be aggregated over the full sample or by any group used in the estimation.
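
As a minimal sketch of the poststratification arithmetic (not of xgbp's internals), the base R code below assumes a model has already produced a predicted proportion for each stratum and then weights those predictions by census counts. The data frame strata and its columns region, age, pred, and n are hypothetical and exist only for illustration.

```r
# Hypothetical stratum-level predictions (pred) and census counts (n)
strata <- data.frame(
  region = c("north", "north", "south", "south"),
  age    = c("young", "old", "young", "old"),
  pred   = c(0.60, 0.40, 0.50, 0.30),
  n      = c(100, 300, 200, 400)
)

# Full-sample estimate: census-weighted mean of the stratum predictions
overall <- sum(strata$pred * strata$n) / sum(strata$n)

# Group-level estimates: the same weighted mean computed within each region
by_region <- sapply(
  split(strata, strata$region),
  function(d) sum(d$pred * d$n) / sum(d$n)
)
```

Strata with larger census counts pull the aggregate toward their own prediction, which is how poststratification corrects a sample whose composition differs from the population.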

Usage

xgbp(
  survey,
  census,
  census_count,
  ...,
  dep_var = NULL,
  seed = 44,
  tune = FALSE,
  params = NULL,
  nrounds = 80,
  nrounds_final = 500,
  n_iter = 25,
  nthread = 1,
  verbose = TRUE
)

Arguments

survey

A survey sample containing the variables to use in the poststratification. Must be a data.frame or a tibble

census

Census data to use in the poststratification. Must be a data.frame or tibble containing the same variables, with the same categories, as the survey object

census_count

Numeric variable in the census object indicating the raw number or proportion of individuals in each stratum

...

Individual and group level covariates used in the poststratification. All variables must be included in the survey and in the census and passed unquoted to the function call

dep_var

Dependent variable. Must be character or factor

seed

A seed for replication. Defaults to 44

tune

Should xgbp tune the parameters with a randomized grid search? Defaults to FALSE, in which case the params argument is used

params

A list of parameters to be passed to the xgboost function

nrounds

Number of trees (rounds) used to train the model. Defaults to 80

nrounds_final

Number of trees (rounds) used to train the final model. Defaults to 500

n_iter

When tune = TRUE, the number of parameter combinations to sample during the grid search. Defaults to 25 (increase this number in sensitive surveys).

nthread

Number of threads used in the computation. Defaults to 1, but users are encouraged to increase this number to speed up computations (the limit is the number of threads available on your computer)

verbose

Should the function report messages during estimation? Defaults to TRUE
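
Because the survey and census objects must share the same variables and categories, it can help to verify this before calling xgbp. The following base R sketch shows one way to do so. The data frames, the column names sex, educ, vote, and n, and the helper objects covars, shared, and same_categories are all hypothetical. The last step illustrates that a count column can hold raw numbers or proportions.

```r
# Hypothetical individual-level survey and stratum-level census
survey <- data.frame(
  sex  = factor(c("f", "m", "f", "m")),
  educ = factor(c("low", "high", "high", "low")),
  vote = factor(c("A", "B", "A", "A"))
)
census <- data.frame(
  sex  = factor(c("f", "f", "m", "m")),
  educ = factor(c("low", "high", "low", "high")),
  n    = c(250, 150, 300, 100)
)

covars <- c("sex", "educ")

# Are all covariates present in both objects?
shared <- all(covars %in% names(survey)) && all(covars %in% names(census))

# For each covariate, do survey and census have identical category sets?
same_categories <- vapply(
  covars,
  function(v) setequal(levels(survey[[v]]), levels(census[[v]])),
  logical(1)
)

# Stratum sizes expressed as proportions instead of raw counts
census$prop <- census$n / sum(census$n)
```

If any entry of same_categories is FALSE, the categories would need to be recoded in one of the two objects before estimation.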

Value

A list of class xgbp with the following items

  • estimates – A tibble containing raw estimates by strata

  • model – The trained xgboost model

  • data – GBP datamatrix used to train the model

  • nrounds – Number of rounds used to train the model

  • nrounds_final – Number of rounds used to train the final model

  • census – Census data used to poststratify results

  • census_count – Variable in the census object indicating the raw number or proportion of individuals in a given stratum

  • covars_matrix – GBP matrix with covars used to train the model

  • dep_var – Dependent variable (target)

  • seed – Seed used to reproduce results

Examples

## Not run: 
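# General use case. N is a placeholder for the census column holding stratum
# counts; naming census_count keeps var1 and var2 in the covariates (...)
ps <- xgbp(survey, census, census_count = N, var1, var2, dep_var = Y)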

## End(Not run)
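
For the nthread argument, one possible way to choose a value is sketched below using parallel::detectCores() from the parallel package shipped with R. The variable names cores and n_threads are hypothetical, and leaving one core free is only a common convention, not a recommendation from the package.

```r
# Count the logical cores R can see; detectCores() may return NA
cores <- parallel::detectCores()
if (is.na(cores)) cores <- 1L

# Leave one core free for the rest of the system, but never go below 1
n_threads <- max(1L, cores - 1L)

# The value can then be supplied as nthread = n_threads in the xgbp() call
```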


meirelesff/xgbp documentation built on Sept. 24, 2022, 1:48 p.m.