oglmx: Fit Ordered Generalized Linear Model.

View source: R/oglmx_main.R

Description

oglmx is used to estimate models for which the outcome variable is discrete and the mean and/or variance of the underlying latent variable can be modelled as a linear combination of explanatory variables. Standard models such as probit, logit, ordered probit and ordered logit are included in the diverse set of models estimated by the function.
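
As a rough illustration of the model class described above, the outcome probabilities of an ordered model can be computed by hand from a latent mean, a latent standard deviation and a set of thresholds. This is a base-R sketch and not code from the package; the helper name ordered_probs and all numeric values are hypothetical.

```r
# Hypothetical helper: P(y = j) for an ordered model with latent
# mean `mu`, latent standard deviation `sigma` and thresholds `tau`.
ordered_probs <- function(mu, sigma, tau, cdf = pnorm) {
  # Pad the thresholds with -Inf and +Inf so every outcome has two bounds.
  bounds <- c(-Inf, tau, Inf)
  upper <- cdf((bounds[-1] - mu) / sigma)
  lower <- cdf((bounds[-length(bounds)] - mu) / sigma)
  upper - lower
}

# Illustrative values only: mean and log-sd are both linear in x.
x <- 1
mu <- 0.5 + 1.5 * x
sigma <- exp(0.5 * x)
p <- ordered_probs(mu, sigma, tau = c(-0.5, 0.5, 1.5))
print(round(p, 3))
```

With three thresholds the vector p has four entries, one per outcome, and they sum to one.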

Usage

oglmx(
  formulaMEAN,
  formulaSD = NULL,
  selection = NULL,
  data,
  start = NULL,
  weights = NULL,
  link = "probit",
  constantMEAN = TRUE,
  constantSD = TRUE,
  beta = NULL,
  delta = NULL,
  threshparam = NULL,
  analhessian = TRUE,
  sdmodel = expression(exp(z)),
  SameModelMEANSD = FALSE,
  na.action,
  savemodelframe = TRUE,
  Force = FALSE,
  robust = FALSE,
  optmeth = c("NR", "BFGS", "BFGSR", "BHHH", "SANN", "CG", "NM"),
  gradient = c("analytical", "numerical"),
  tol = 1e-20,
  start_method = c("default", "search"),
  search_iter = 10,
  return_envir = FALSE
)

Arguments

formulaMEAN

an object of class formula: a symbolic description of the model used to explain the mean of the latent variable. The response variable should be a numeric vector or factor variable such that the numerical assignments for the levels of the factor have ordinal meaning.

formulaSD

either NULL (homoskedastic model) or an object of class formula: a symbolic description of the model used to explain the variance of the latent variable.

selection

a formula for a Heckman selection model. If NULL (the default), no selection on observables is assumed (introduced in #11).

data

a data frame containing the variables in the model.

start

either NULL or a numeric vector specifying start values for each of the estimated parameters, passed to the maximisation routine.

weights

an optional vector of ‘prior weights’ to be used in the fitting process. Should be NULL or a numeric vector.

link

specifies a link function for the model to be estimated, accepted values are "probit", "logit", "cauchit", "loglog" and "cloglog"
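
The link names correspond to distribution functions for the latent error term. The mapping below is a sketch under assumptions: the probit, logit and cauchit entries are the standard base-R CDFs, while the loglog and cloglog formulas follow one common convention and may not match the package's internal definitions. The list name link_cdf is hypothetical.

```r
# Assumed mapping from link names to latent-error CDFs.
link_cdf <- list(
  probit  = pnorm,
  logit   = plogis,
  cauchit = pcauchy,
  loglog  = function(x) exp(-exp(-x)),      # assumed convention
  cloglog = function(x) 1 - exp(-exp(x))    # assumed convention
)

# Evaluate every CDF at the same points to compare their shapes.
grid <- c(-2, 0, 2)
cdf_values <- sapply(link_cdf, function(f) f(grid))
rownames(cdf_values) <- paste0("x=", grid)
print(round(cdf_values, 3))
```

The symmetric links return 0.5 at zero; the two log-log variants are asymmetric and do not.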

constantMEAN

logical. Should an intercept be included in the model of the mean of the latent variable? Can be overridden and set to FALSE using the formulaMEAN argument by writing 0 + as the first element of the equation.

constantSD

logical. Should an intercept be included in the model of the variance of the latent variable? Can be overridden and set to FALSE using the formulaSD argument by writing 0 + as the first element of the equation.

beta

NULL or numeric vector. Used to prespecify elements of the parameter vector for the equation of the mean of the latent variable. Vector should be of length one or of length equal to the number of explanatory variables in the mean equation. If of length one the value is presumed to correspond to the constant if a constant is included or the first element of the parameter vector. If of length greater than one then NA should be entered for elements of the vector to be estimated.

delta

NULL or numeric vector. Used to prespecify elements of the parameter vector for the equation of the variance of the latent variable. Vector should be of length one or of length equal to the number of explanatory variables in the variance equation. If of length one the value is presumed to correspond to the constant if a constant is included or the first element of the parameter vector. If of length greater than one then NA should be entered for elements of the vector to be estimated.

threshparam

NULL or numeric vector. Used to prespecify the threshold parameters of the model. Vector should be of length equal to the number of outcomes minus one. NA should be entered for threshold parameters to be estimated by the model.
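
The NA convention shared by beta, delta and threshparam can be pictured with a few lines of base R. This is only a sketch of the idea: the helper fill_params and the numeric values are hypothetical, and the package's internal bookkeeping may differ.

```r
# Hypothetical prespecified vectors: NA marks a parameter to be estimated.
threshparam <- c(-0.5, 0.5, NA)   # four outcomes, so three thresholds
beta <- c(NA, NA, 1, NA)          # third mean coefficient held at 1

# Logical masks of the free parameters (compare Est.Parameters in Value).
free_thresh <- is.na(threshparam)
free_beta <- is.na(beta)

# Merge fixed values with estimates supplied by an optimiser.
fill_params <- function(fixed, estimates) {
  stopifnot(length(estimates) == sum(is.na(fixed)))
  fixed[is.na(fixed)] <- estimates
  fixed
}
full_thresh <- fill_params(threshparam, 1.4)  # 1.4 is a made-up estimate
print(full_thresh)
```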

analhessian

logical. Indicates whether the analytic Hessian should be calculated and used. The default is TRUE; if set to FALSE, a finite-difference approximation of the Hessian is used.

sdmodel

object of mode “expression”. The expression defines the function that transforms the linear model for the standard deviation into the standard deviation. The expression should be written as a function of variable z. The default value is expression(exp(z)).
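
To make the role of z concrete, the following base-R sketch evaluates the default expression and a hypothetical alternative on a made-up linear index. The alternative expression(1 + z^2) is an assumption chosen only because it is positive for every z; whether the package accepts it is not confirmed here.

```r
# The default transformation and a hypothetical alternative, both
# written as expressions in the variable z.
sd_default <- expression(exp(z))
sd_alt <- expression(1 + z^2)

# Linear index for the standard deviation equation (made-up values).
z_index <- c(-0.5, 0, 0.5)

# eval() substitutes z and returns the implied standard deviations.
sd_from_default <- eval(sd_default, list(z = z_index))
sd_from_alt <- eval(sd_alt, list(z = z_index))

# D() differentiates the expression symbolically, which is the kind of
# derivative an analytical gradient would need.
d_default <- D(sd_default[[1]], "z")
print(sd_from_default)
print(d_default)
```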

SameModelMEANSD

logical. Indicates whether the matrix used to model the mean of the latent variable is identical to that used to model the variance. If formulaSD=NULL and SameModelMEANSD=TRUE, a model with heteroskedasticity is estimated. If SameModelMEANSD=FALSE but formulaSD is identical to formulaMEAN, the supplied value is overridden. Used to reduce memory requirements when the models are identical.

na.action

a function which indicates what should happen when the data contain NAs. The default is set by the na.action setting of options, and is na.fail if that is unset. The ‘factory-fresh’ default is na.omit. Another possible value is NULL, no action. Value na.exclude can be useful.

savemodelframe

logical. Indicates whether the model frame(s) should be saved for future use. Default is TRUE. Must be TRUE if intending to estimate Average Marginal Effects.

Force

logical. If set to FALSE (the default), the function stops if the response variable has more than twenty categories. Should be changed to TRUE if a model with more than twenty categories is desired.

robust

logical. If set to TRUE, the outer product (BHHH) estimate of the meat in the sandwich of the variance-covariance matrix is calculated. When it has been calculated, summary uses the sandwich estimator for standard errors by default.
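
The sandwich construction can be sketched in base R on a plain logit model. This is not an oglmx fit and not the package's implementation; it only shows how a Hessian-based "bread" and an outer-product "meat" combine. All object names are illustrative.

```r
# Sandwich variance sketch on a plain logit model (not an oglmx fit).
set.seed(1)
n <- 200
x <- rnorm(n)
y <- rbinom(n, 1, plogis(0.3 + 0.8 * x))
fit <- glm(y ~ x, family = binomial())

X <- model.matrix(fit)
p <- fitted(fit)

# Bread: inverse of the negative Hessian of the log-likelihood.
bread <- solve(crossprod(X * sqrt(p * (1 - p))))
# Meat: outer product of the per-observation score vectors (BHHH).
scores <- X * (y - p)
meat <- crossprod(scores)

vcov_robust <- bread %*% meat %*% bread
robust_se <- sqrt(diag(vcov_robust))
print(robust_se)
```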

optmeth

specifies a method for the maximisation of the likelihood, passed to maxLik::maxLik(). Defaults to NR (Newton-Raphson) when no selection is introduced. Forced to BHHH when selection on observables is introduced.

gradient

Whether to use the analytical gradient (default) or a numerical gradient. The analytical gradient results in slower iterations but sometimes helps the model to converge faster.

tol

Argument passed to qr.solve. Defines the tolerance for detecting linear dependencies in the Hessian matrix to be inverted.
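
The effect of the tolerance can be seen on a small, nearly singular matrix standing in for an ill-conditioned Hessian. This is a base-R sketch with a made-up matrix; it calls qr.solve directly rather than going through oglmx.

```r
# A nearly singular 2 x 2 matrix standing in for an ill-conditioned Hessian.
h <- matrix(c(1, 1, 1, 1 + 1e-9), nrow = 2)

# With qr.solve's own default tolerance (1e-7) the second column is
# treated as linearly dependent and the inversion stops with an error.
strict <- tryCatch(qr.solve(h), error = function(e) e)

# With a very small tolerance, such as the 1e-20 default listed in Usage,
# the same matrix is accepted and an inverse is returned.
loose <- tryCatch(qr.solve(h, tol = 1e-20), error = function(e) e)

print(inherits(strict, "error"))
print(inherits(loose, "error"))
```

A tiny tolerance avoids spurious "singular matrix" failures, at the cost of accepting inverses that may be numerically unstable.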

start_method

Whether to use the default initial values or to search for better ones.

search_iter

Number of candidate start values to try when using start_method = 'search'.

return_envir

logical. Indicates whether to stop early and return the objects used to fit the model.

Value

An object of class "oglmx" with the following components:

loglikelihood

log-likelihood for the estimated model. Includes as attributes the log-likelihood for the constant only model and the number of observations.

link

link function used in the estimated model.

no.iterations

number of iterations of maximisation algorithm.

coefficients

named vector of estimated parameters.

returnCode

code returned by the maxLik optimisation routine.

call

the call used to generate the results.

gradient

numeric vector, the value of the gradient of the log-likelihood function at the obtained parameter vector. Should be approximately equal to zero.

terms

two element list. Each element is an object of type terms related to the mean and standard deviation equation respectively.

formula

two element list. Each element is an object of type stats::formula() related to the mean and standard deviation equation respectively.

NoVarModData

data frame. Contains the data required to estimate the no-information model used in the calculation of McFadden's R-squared measure.
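
McFadden's R-squared compares the log-likelihood of the fitted model with that of a constant-only model. The sketch below computes it on a plain probit fitted with glm, using simulated data; it does not use an oglmx object, and the variable names are illustrative.

```r
# McFadden's pseudo R-squared on a plain probit fit (not an oglmx object).
set.seed(2)
n <- 300
x <- rnorm(n)
y <- as.integer(0.2 + 0.7 * x + rnorm(n) > 0)

full <- glm(y ~ x, family = binomial(link = "probit"))
# "No information" model: constant only, same response.
null <- glm(y ~ 1, family = binomial(link = "probit"))

ll_full <- as.numeric(logLik(full))
ll_null <- as.numeric(logLik(null))
mcfadden <- 1 - ll_full / ll_null
print(mcfadden)
```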

hessian

Hessian matrix of the log-likelihood function evaluated at the obtained parameter vector.

BHHHhessian

Either NULL if no weights were included and robust = FALSE, or the BHHH estimate.

Hetero

logical. If TRUE indicates that the estimated model includes a model for the variance of the error term, i.e. heteroskedasticity.

NOutcomes

the number of distinct outcomes in the response variable.

Outcomes

numeric vector of length equal to NOutcomes. Lists the values of the different outcomes.

BothEq

data.frame with either two or three columns. Lists the names of variables that are in both the mean and variance equations and their locations within their respective model frames. Information is required in the call of margins.oglmx to obtain correct marginal effects.

allparams

a list containing three numeric vectors, the vectors contain the parameters from the mean equation, the variance equation and the threshold parameters respectively. Includes the prespecified and estimated parameters together.

varMeans

a list containing two numeric vectors. The vectors list the mean values of the variables in the mean and variance equation respectively. Stored for use in a call of margins.oglmx to obtain marginal effects at means.

varBinary

a list containing two numeric vectors. The vectors indicate whether the variables in the mean and variance equations are binary indicators. Stored for use in a call of margins.oglmx to obtain marginal effects at means.
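
The kind of summary stored in varMeans and varBinary can be pictured with a small design matrix. This is a base-R sketch on made-up data; in particular, the rule used to flag a column as binary is an assumption and the package may apply a different test.

```r
# Made-up design matrix with one binary and one continuous regressor.
X <- cbind(x1 = c(0, 1, 1, 0, 1), z = c(0.2, -1.3, 0.8, 1.1, 0.4))

# Column means, in the spirit of varMeans.
var_means <- colMeans(X)

# A column counts as binary here if it only takes the values 0 and 1
# (an assumed definition).
var_binary <- apply(X, 2, function(col) all(col %in% c(0, 1)))

print(var_means)
print(var_binary)
```

Marginal effects at means typically treat binary regressors as discrete changes from 0 to 1, which is why the two pieces of information are kept together.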

Est.Parameters

list containing three logical vectors. Indicates which parameters in the parameter vectors were estimated.

modelframes

If savemodelframe set to FALSE then returns NULL, otherwise returns a list with two elements, the model frames for the mean and variance equations.

Examples

## Not run: 
# create random sample, three variables, two binary.
set.seed(242)
n<-250
x1<-sample(c(0,1),n,replace=TRUE,prob=c(0.75,0.25))
x2<-vector("numeric",n)
x2[x1==0]<-sample(c(0,1),n-sum(x1==1),replace=TRUE,prob=c(2/3,1/3))
z<-rnorm(n,0.5)
# create latent outcome variable
latenty<-0.5+1.5*x1-0.5*x2+0.5*z+rnorm(n,sd=exp(0.5*x1-0.5*x2))
# observed y has four possible values: -1,0,1,2
# threshold values are: -0.5, 0.5, 1.5.
y<-vector("numeric",n)
y[latenty< -0.5]<- -1
y[latenty>= -0.5 & latenty<0.5]<- 0
y[latenty>= 0.5 & latenty<1.5]<- 1
y[latenty>= 1.5]<- 2
# include z so that every variable in the formulas is found in the data
dataset<-data.frame(y,x1,x2,z)
# estimate standard ordered probit
results.oprob<-oglmx(y ~ x1 + x2 + z, data=dataset,link="probit",constantMEAN=FALSE,
                     constantSD=FALSE,delta=0,threshparam=NULL)
coef(results.oprob) # extract estimated coefficients
summary(results.oprob)
# calculate marginal effects at means
margins.oglmx(results.oprob)
# estimate ordered probit with heteroskedasticity
results.oprobhet<-oglmx(y ~ x1 + x2 + z, ~ x1 + x2, data=dataset, link="probit",
                        constantMEAN=FALSE, constantSD=FALSE,threshparam=NULL)
summary(results.oprobhet)
library("lmtest")
# likelihood ratio test to compare model with and without heteroskedasticity.
lrtest(results.oprob,results.oprobhet)
# calculate marginal effects at means.
margins.oglmx(results.oprobhet)
# scale of parameter values is meaningless. Suppose instead two of the
# three threshold values were known, then can include constants in the
# mean and standard deviation equation and the scale is meaningful.
results.oprobhet1<-oglmx(y ~ x1 + x2 + z, ~ x1 + x2, data=dataset, link="probit",
                         constantMEAN=TRUE, constantSD=TRUE,threshparam=c(-0.5,0.5,NA))
summary(results.oprobhet1)
margins.oglmx(results.oprobhet1)
# marginal effects are identical to results.oprobhet, but using the true thresholds
# means the estimated parameters are on the same scale as underlying data.
# can choose any two of the threshold values and get broadly the same result.
results.oprobhet2<-oglmx(y ~ x1 + x2 + z, ~ x1 + x2, data=dataset, link="probit",
                         constantMEAN=TRUE, constantSD=TRUE,threshparam=c(-0.5,NA,1.5))
summary(results.oprobhet2)
margins.oglmx(results.oprobhet2)
# marginal effects are again identical. Parameter estimates do change.

## End(Not run)

linogaliana/oglm documentation built on March 5, 2021, 8:27 p.m.