predict.msel: Predict method for msel function
In switchSelection: Endogenous Switching and Sample Selection Regression Models

predict.msel

R Documentation

Description

Computes predicted values from a fitted object of class 'msel'.

Usage

## S3 method for class 'msel'
predict(
  object,
  ...,
  newdata = NULL,
  given_ind = numeric(),
  group = NA,
  group2 = NA,
  group3 = NA,
  type = ifelse(any(is.na(group2)), "prob", "val"),
  me = NULL,
  eps = NULL,
  control = list(),
  test = FALSE,
  exogenous = NULL
)

Arguments

`object`	an object of class "msel".
`...`	further arguments (currently ignored).
`newdata`	an optional data frame in which to look for variables with which to predict. If omitted, the original data frame is used. This data frame should contain values of the dependent variables even if they are not actually needed for prediction (simply set them to 0).
`given_ind`	a numeric vector of indexes of conditioned components.
`group`	a numeric vector whose i-th element represents a value of the i-th dependent variable. If this value equals -1 then this component is ignored (useful for the estimation of marginal probabilities).
`group2`	a numeric vector whose i-th element represents a value of the i-th dependent variable of the continuous equation. If this value equals -1 then this component is ignored.
`group3`	an integer representing the index of the alternative of the multinomial equation. If this value equals -1 then this component will be ignored.
`type`	a string representing a type of the prediction. See 'Details' for more information.
`me`	a string representing the name of the variable for which marginal effect should be estimated. See 'Details' for more information.
`eps`	a numeric vector of length 1 or 2 used for calculation of the marginal effects. See 'Details' for more information.
`control`	a list of additional arguments. Currently not intended for users.
`test`	a logical, function or integer. If `test = TRUE` then the output of the function is passed to `test_msel` before being returned, in order to perform a t-test. If `test` is a function, it is applied to the output of `predict` before `test_msel` is called. If `test` is an integer then `test_msel` is applied only to the `test`-th column of the output.
`exogenous`	a list such that `exogenous[[i]]` is the value (or a vector of values of length `nrow(newdata)`) that will be exogenously assigned to the variable `names(exogenous)[i]` in `newdata`, i.e., `newdata[, names(exogenous)[i]] <- exogenous[[i]]`. If `newdata` is `NULL` and `exogenous` is not `NULL` then `newdata` is set to `object$data`. This argument is especially useful for causal inference, when some endogenous (dependent) variables that appear on the right-hand side of `formula`, `formula2` and `formula3` should be exogenously assigned some values. The `exogenous` argument is purely a convenience: the same values can equivalently be supplied through the `newdata` argument.

Details

See the 'Examples' section of msel for examples of how this function is applied.

Probabilities of the multivariate ordinal equations

If type = "prob" then the function returns the joint probability that the ordinal outcomes take the values specified in group. To calculate marginal probabilities, set the group values of the outcomes to be ignored to -1.

To estimate conditional probabilities, provide the indexes of the conditioned outcomes through the given_ind argument.

For example, if z_{1i}, z_{2i} and z_{3i} are the ordinal outcomes then to estimate P(z_{1i}=2 | z_{3i} = 0, w_{1i}, w_{2i}, w_{3i}) set given_ind = 3 and group = c(2, -1, 0).
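
The conditional probability from this example could be requested as in the following sketch. It assumes that `model` is an already fitted 'msel' object with three ordinal equations (it is not created here). The helper names `cond_prob_sketch` and `marginal_prob_sketch` are hypothetical and only wrap the arguments documented on this page.

```r
# Hypothetical helper: P(z1 = 2 | z3 = 0, w1, w2, w3) for each observation.
# Assumes `model` is a fitted "msel" object with three ordinal equations.
cond_prob_sketch <- function(model, newdata = NULL) {
  predict(model,
          newdata   = newdata,
          type      = "prob",
          group     = c(2, -1, 0),  # z1 = 2, z2 ignored, z3 = 0
          given_ind = 3)            # condition on the third outcome
}

# Hypothetical helper: marginal probability that z3 = 0.
marginal_prob_sketch <- function(model, newdata = NULL) {
  predict(model,
          newdata = newdata,
          type    = "prob",
          group   = c(-1, -1, 0))   # only the third outcome is kept
}

# Usage (requires a fitted model):
# p_cond <- cond_prob_sketch(model)
# p_marg <- marginal_prob_sketch(model)
```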

Linear predictors (indexes) of the multivariate ordinal equations

If type = "li" or type = "lp" then the function returns a matrix whose columns are the linear predictors (indexes) of the corresponding equations. If group[j] = -1 then the linear predictors (indexes) associated with the j-th ordinal equation are omitted from the output.

For example, if group = c(0, -1, 1) then the function returns a matrix whose first column is w_{1i}\hat{\gamma}_{1} and whose second column is w_{3i}\hat{\gamma}_{3}.

Standard deviations of the multivariate ordinal equations

If type = "sd" then the function returns a matrix whose columns are the estimates of the standard deviations of the random errors of the corresponding equations. If group[j] = -1 then the standard deviations associated with the j-th ordinal equation are omitted from the output.

For example, if group = c(0, -1, 1) then the function returns a matrix whose first column is \hat{\sigma}_{1i}^{*} and whose second column is \hat{\sigma}_{3i}^{*}.
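
As a rough illustration, the linear predictors and the standard deviations for the first and the third ordinal equations could be requested together as sketched below. The helper `index_and_sd_sketch` is hypothetical, and `model` is assumed to be a fitted 'msel' object with three ordinal equations.

```r
# Hypothetical helper: linear predictors and error standard deviations
# of the 1st and 3rd ordinal equations (the 2nd is dropped via -1).
# Assumes `model` is a fitted "msel" object with three ordinal equations.
index_and_sd_sketch <- function(model, newdata = NULL) {
  grp <- c(0, -1, 1)  # any value other than -1 keeps the equation
  list(
    lp = predict(model, newdata = newdata, type = "lp", group = grp),
    sd = predict(model, newdata = newdata, type = "sd", group = grp)
  )
}

# Usage (requires a fitted model):
# out <- index_and_sd_sketch(model)
# head(out$lp)  # two columns: equations 1 and 3
```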

Predictions of the continuous outcomes

If type = "val" then the function returns the predictions of the conditional (on group) expectation of the continuous outcomes in the regimes determined by the group2 argument. To predict unconditional expectations set group to a vector of -1 values.

For example, suppose that there is a single continuous equation y_{i} and two ordinal equations z_{1i} and z_{2i}. To estimate E(y_{2i}|x_{i}) set group = c(-1, -1) and group2 = 2. To estimate E(y_{1i}|x_{i}, z_{1i} = 2, z_{2i} = 0) set group = c(2, 0) and group2 = 1. To estimate E(y_{0i}|x_{i}, z_{2i} = 1) set group = c(-1, 1) and group2 = 0.

Suppose that there are two continuous equations y_{i}^{(1)}, y_{i}^{(2)} and two ordinal equations z_{1i}, z_{2i}. If group2 = c(1, 3) and group = c(3, 0) then the function returns a matrix whose first column contains the estimates of E(y_{1i}^{(1)}|z_{1i} = 3, z_{2i} = 0, x_{i}^{(1)}) and whose second column contains the estimates of E(y_{3i}^{(2)}|z_{1i} = 3, z_{2i} = 0, x_{i}^{(2)}).
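
The single-equation cases described above might be written as in the following sketch. It assumes that `model` is a fitted 'msel' object with one continuous and two ordinal equations. The helper names are hypothetical.

```r
# Hypothetical helper: E(y_1 | x, z1 = 2, z2 = 0), i.e. the expectation
# of the continuous outcome in regime 1 conditional on both ordinal outcomes.
expect_cond_sketch <- function(model, newdata = NULL) {
  predict(model, newdata = newdata, type = "val",
          group  = c(2, 0),  # condition on z1 = 2 and z2 = 0
          group2 = 1)        # regime of the continuous equation
}

# Hypothetical helper: unconditional expectation E(y_2 | x).
expect_uncond_sketch <- function(model, newdata = NULL) {
  predict(model, newdata = newdata, type = "val",
          group  = c(-1, -1),  # ignore both ordinal outcomes
          group2 = 2)
}

# Usage (requires a fitted model):
# y1_hat <- expect_cond_sketch(model)
# y2_hat <- expect_uncond_sketch(model)
```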

Selectivity terms

If type = "lambda" then the function returns a matrix whose j-th column is a numeric vector of estimates of the selectivity terms \lambda_{ji} associated with the ordinal equations. Similarly, if type = "lambda_mn" then the function returns a numeric matrix with the selectivity terms of the multinomial equation.

Probabilities of the multinomial equation

If type = "prob_mn" and group3 = j then the function returns a vector of the estimates of the probabilities P(\tilde{z}_{i}=j|\tilde{w}_{i}).

Linear indexes (predictors) of the multinomial equation

If type = "li_mn" or type = "lp_mn" then the function returns a numeric matrix whose j-th column is a numeric vector of estimates of the linear predictor (index) associated with the (j-1)-th alternative \tilde{w}_{i}\tilde{\gamma}_{(j-1)}.

Estimation of the marginal effects

If me is provided then the function returns the marginal effect of the variable me on the statistic determined by the type argument.

For example, if me = "x1" and type = "prob" then the function returns the marginal effect of x1 on the corresponding probability, i.e., the probability that would be returned if me were NULL.

If length(eps) = 1 then eps is the increment used in the numeric differentiation procedure. If eps is NULL then this increment is selected automatically, taking into account the scaling of the variables. If length(eps) = 2 then the marginal effect is estimated as the difference between the predicted values when the variable me equals eps[2] and eps[1], respectively.

For example, suppose that type = "prob", me = "x1", given_ind = 3 and group = c(2, -1, 0). Then if eps is NULL or a small number (something like eps = 0.0001), the following marginal effect is estimated (via numeric differentiation):

\frac{\partial P(z_{1i}=2 | z_{3i} = 0)}{\partial x_{1i}}.

If eps = c(1, 3) then the function estimates the following difference (useful for estimation of marginal effects of ordered covariates):

P(z_{1i}=2 | z_{3i} = 0, x_{1i} = 3) - P(z_{1i}=2 | z_{3i} = 0, x_{1i} = 1).

Notice that the conditioning on w_{ji} has been omitted for brevity.
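
The two roles of eps can be illustrated outside the package with a toy probability function. In the sketch below `prob_fun` and its coefficients are made up for illustration and are not the internals of msel. A forward difference is assumed here, which may differ from the differentiation scheme actually used by the package.

```r
# Toy probability as a function of x1 (illustrative probit, NOT msel internals)
prob_fun <- function(x1) pnorm(0.5 + 0.8 * x1)

# length(eps) = 1: numeric derivative with increment eps
# (forward difference is an assumption of this sketch)
x1  <- c(-1, 0, 2)
eps <- 1e-4
me_derivative <- (prob_fun(x1 + eps) - prob_fun(x1)) / eps

# length(eps) = 2: difference between predictions at x1 = eps[2] and x1 = eps[1]
eps2 <- c(1, 3)
me_difference <- prob_fun(eps2[2]) - prob_fun(eps2[1])

# With a fitted "msel" object `model` (not created here) the analogous calls
# would use the arguments documented above:
# predict(model, type = "prob", me = "x1", given_ind = 3, group = c(2, -1, 0))
# predict(model, type = "prob", me = "x1", given_ind = 3, group = c(2, -1, 0),
#         eps = c(1, 3))
```

The first quantity approximates a derivative at each observed value of x1, while the second is a single discrete change, which is the form suited to ordered covariates.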

Causal inference

The exogenous argument is useful for causal inference. For example, suppose that there are two binary outcomes z_{1i} and z_{2i}, and that z_{1i} is an endogenous regressor for z_{2i}. That is, z_{1i} appears both on the left-hand side of formula[[1]] and on the right-hand side of formula[[2]]. Consider the estimation of the average treatment effect:

ATE = P(z_{2i} = 1|do(z_{1i}) = 1) - P(z_{2i} = 1|do(z_{1i}) = 0),

where do is a do-calculus operator. The estimate of the average treatment effect is as follows:

\widehat{ATE} = \frac{1}{n}\sum\limits_{i=1}^{n}(p_{1i}-p_{0i}),

where:

p_{1i} = \hat{P}(z_{2i} = 1|do(z_{1i}) = 1, w_{1i}, w_{2i}^{(*)}),

p_{0i} = \hat{P}(z_{2i} = 1|do(z_{1i}) = 0, w_{1i}, w_{2i}^{(*)}).

Vector w_{2i}^{(*)} denotes all the regressors w_{2i} except the endogenous one z_{1i}.

To get \widehat{ATE} it is sufficient to take the following steps. First, calculate p_{1i} by setting type = "prob", group = c(-1, 1) and assigning the value 1 to z_{1i} through the exogenous argument. Second, calculate p_{0i} by setting type = "prob", group = c(-1, 1) and assigning the value 0 to z_{1i} through the exogenous argument; the second element of group remains 1 because p_{0i} is also a probability of z_{2i} = 1. Third, take the average value of p_{1i}-p_{0i}.
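
These steps could be wrapped as in the following sketch. It assumes that `model` is a fitted 'msel' object with two binary equations and that the first dependent variable is stored in the data under the name `z1`. The helper `ate_sketch` and its `treatment` parameter are hypothetical. Both calls use group = c(-1, 1), since p_{1i} and p_{0i} are both probabilities of z_{2i} = 1 and only the exogenously assigned value of z_{1i} differs.

```r
# Hypothetical helper: average treatment effect of z1 on P(z2 = 1).
# Assumes `model` is a fitted "msel" object with two binary equations
# where z1 enters the second equation as a regressor.
ate_sketch <- function(model, treatment = "z1") {
  # Step 1: P(z2 = 1) with the treatment exogenously set to 1
  p1 <- predict(model, type = "prob", group = c(-1, 1),
                exogenous = setNames(list(1), treatment))
  # Step 2: P(z2 = 1) with the treatment exogenously set to 0
  p0 <- predict(model, type = "prob", group = c(-1, 1),
                exogenous = setNames(list(0), treatment))
  # Step 3: average the individual differences
  mean(p1 - p0)
}

# Usage (requires a fitted model):
# ate_hat <- ate_sketch(model)
```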

Value

This function returns predictions for each row of newdata, or for each observation in the model if newdata is NULL. The structure of the output depends on the type argument (see the 'Details' section).

switchSelection documentation built on Sept. 26, 2024, 5:07 p.m.