prepare_data: Prepare choice data for estimation
In loelschlaeger/RprobitB: Bayesian Probit Choice Modeling

Description

This function prepares choice data for estimation.

Usage

prepare_data(
  form,
  choice_data,
  re = NULL,
  alternatives = NULL,
  ordered = FALSE,
  ranked = FALSE,
  base = NULL,
  id = "id",
  idc = NULL,
  standardize = NULL,
  impute = "complete_cases"
)

Arguments

`form`	A `formula` object that is used to specify the model equation. The structure is `choice ~ A | B | C`, where `choice` is the name of the dependent variable (the choices), `A` are names of alternative and choice situation specific covariates with a coefficient that is constant across alternatives, `B` are names of choice situation specific covariates with alternative specific coefficients, and `C` are names of alternative and choice situation specific covariates with alternative specific coefficients. Multiple covariates (of one type) are separated by a `+` sign. By default, alternative specific constants (ASCs) are added to the model. They can be removed by adding `+0` in the second spot. In the ordered probit model (`ordered = TRUE`), the `formula` object has the simple structure `choice ~ A`. ASCs are not estimated.
`choice_data`	A `data.frame` of choice data in wide format, i.e. each row represents one choice occasion.
`re`	A character (vector) of covariates of `form` with random effects. If `re = NULL` (the default), there are no random effects. To have random effects for the ASCs, include `"ASC"` in `re`.
`alternatives`	A character vector with the names of the choice alternatives. If not specified, the choice set is defined by the observed choices. If `ordered = TRUE`, `alternatives` is assumed to be specified with the alternatives ordered from worst to best.
`ordered`	A boolean, `FALSE` per default. If `TRUE`, the choice set `alternatives` is assumed to be ordered from worst to best.
`ranked`	TBA
`base`	A character, the name of the base alternative for covariates that are not alternative specific (i.e. type 2 covariates and ASCs). Ignored and set to `NULL` if the model has no alternative specific covariates (e.g. in the ordered probit model). Per default, `base` is the last element of `alternatives`.
`id`	A character, the name of the column in `choice_data` that contains a unique identifier for each decision maker. The default is `"id"`.
`idc`	A character, the name of the column in `choice_data` that contains a unique identifier for each choice situation of each decision maker. By default, these identifiers are generated by the order of appearance.
`standardize`	A character vector of names of covariates that get standardized. Covariates of type 1 or 3 have to be addressed by `<covariate>_<alternative>`. If `standardize = "all"`, all covariates get standardized.
`impute`	A character that specifies how to handle missing covariate entries in `choice_data`, one of: `"complete_cases"` (the default), which removes all rows containing missing covariate entries; `"zero"`, which replaces missing covariate entries by zero (only for numeric columns); `"mean"`, which replaces missing covariate entries by the mean (only for numeric columns).
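
The three `impute` options can be pictured with a small base R sketch. This is only an illustration of the described behavior, not the package's implementation: the data frame, its column names, and the use of the column mean for `"mean"` are assumptions made for the example.

```r
# Toy data with missing covariate entries (hypothetical column names).
choice_data <- data.frame(
  id = 1:4,
  choice = c("A", "B", "A", "B"),
  price_A = c(10, NA, 12, 14),
  price_B = c(8, 9, NA, 11)
)
covariates <- c("price_A", "price_B")

# "complete_cases": drop every row with a missing covariate entry.
complete <- choice_data[stats::complete.cases(choice_data[covariates]), ]

# "zero": replace missing numeric entries by zero.
zero <- choice_data
zero[covariates] <- lapply(zero[covariates], function(x) {
  x[is.na(x)] <- 0
  x
})

# "mean": replace missing numeric entries by the column mean
# (assumed here; the argument description only says "the mean").
mean_imputed <- choice_data
mean_imputed[covariates] <- lapply(mean_imputed[covariates], function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
})
```

Note that `"complete_cases"` shrinks the data set, while the other two options keep every row.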

Details

Requirements for the `data.frame` `choice_data`:

It must contain a column named `id` which contains a unique identifier for each decision maker.
It can contain a column named `idc` which contains a unique identifier for each choice situation of each decision maker. If this information is missing, these identifiers are generated automatically by the order of appearance of the choices in the data set.
It can contain a column named `choice` with the observed choices, where `choice` must match the name of the dependent variable in `form`. Such a column is required for model fitting but not for prediction.
It must contain a numeric column named `p_j` for each alternative specific covariate `p` in `form` and each choice alternative `j` in `alternatives`.
It must contain a numeric column named `q` for each covariate `q` in `form` that is constant across alternatives.
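
As a rough sketch of these requirements, the following base R snippet builds a small wide-format data set and checks the column naming convention. The covariate names `price` and `income` and the alternatives `A` and `B` are hypothetical; under the formula structure described above, a matching model equation would be `choice ~ price | income`.

```r
# Hypothetical choice set.
alternatives <- c("A", "B")

# One row per choice occasion (wide format).
choice_data <- data.frame(
  id = c(1, 1, 2, 2),              # decision maker
  idc = c(1, 2, 1, 2),             # choice situation of each decision maker
  choice = c("A", "B", "B", "A"),  # observed choices
  price_A = c(10, 12, 11, 9),      # alternative specific covariate "price"
  price_B = c(8, 13, 10, 12),
  income = c(30, 30, 45, 45)       # covariate constant across alternatives
)

# Column names expected for the alternative specific covariate "price".
expected <- paste("price", alternatives, sep = "_")

# Check that all required columns exist and that covariates are numeric.
required <- c("id", expected, "income")
missing_cols <- setdiff(required, names(choice_data))
all_numeric <- all(
  vapply(choice_data[c(expected, "income")], is.numeric, logical(1))
)
```

An empty `missing_cols` and `all_numeric == TRUE` indicate that the layout matches the requirements listed above.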

In the ranked case (`ranked = TRUE`), the column `choice` must contain the full ranking of the alternatives in each choice occasion as a character, where the alternatives are separated by commas, see the examples.
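
The comma-separated ranking format can be inspected with base R before the data is passed on. The snippet below is a minimal sketch that reuses the rankings from the example section and checks that each one is a full ranking; the alternative names are taken from that example, and the check itself is an illustration, not part of the package.

```r
# Rankings as comma-separated strings, one per choice occasion.
choice <- c("A,B,C", "A,C,B", "B,C,A")
alternatives <- c("A", "B", "C")

# Split each string into its ranked alternatives.
rankings <- strsplit(choice, ",", fixed = TRUE)

# One row per choice occasion, one column per rank position.
ranking_matrix <- do.call(rbind, rankings)

# A full ranking lists every alternative exactly once.
is_full <- vapply(
  rankings,
  function(r) length(r) == length(alternatives) && setequal(r, alternatives),
  logical(1)
)
```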

See the vignette on choice data for more details.

Value

An object of class RprobitB_data.

Examples

### random effects for price and time, ASCs removed via `| 0`
data <- prepare_data(
  form = choice ~ price + time + comfort + change | 0,
  choice_data = train_choice,
  re = c("price", "time"),
  id = "deciderID",
  idc = "occasionID",
  standardize = c("price", "time")
)

### ranked case
choice_data <- data.frame(
  "id" = 1:3, "choice" = c("A,B,C", "A,C,B", "B,C,A"), "cov" = 1
)
data <- prepare_data(
  form = choice ~ 0 | cov + 0,
  choice_data = choice_data,
  ranked = TRUE
)

loelschlaeger/RprobitB documentation built on Oct. 15, 2024, 11:08 a.m.