cparmlogit: Conditionally parametric logit for two or more choices

View source: R/cparmlogit.R

Description

Estimates a multinomial logit model with two or more choices by maximizing a locally weighted likelihood function. It is the logit equivalent of cparlwr.

Usage

cparmlogit(form,nonpar,window=.25,bandwidth=0,kern="tcub",
distance="Mahal",target=NULL,data=NULL)  

Arguments

form

Model formula

nonpar

List of either one or two variables for z. Formats: cparmlogit(y~xlist, nonpar=~z1, ...) or cparmlogit(y~xlist, nonpar=~z1+z2, ...). Important: note the "~" before the first z variable.

window

Window size. Default: 0.25.

bandwidth

Bandwidth. Default: not used.

kern

Kernel weighting functions. Default is the tri-cube. Options include "rect", "tria", "epan", "bisq", "tcub", "trwt", and "gauss".

distance

Options: "Euclid", "Mahal", or "Latlong" for Euclidean, Mahalanobis, or "great-circle" geographic distance. May be abbreviated to the first letter but must be capitalized. Note: cparmlogit looks for the first two letters to determine which variable is latitude and which is longitude, so the data set must be attached first or specified using the data option; options like data$latitude will not work. Default: Mahal.

target

If target = NULL, uses the maketarget command to form targets using the values specified for window, bandwidth, and kern. If target="alldata", each observation is used as a target value for x. A set of target values can be supplied directly.

data

A data frame containing the data. Default: use the variables available in the current workspace (for example, from an attached data set).

Details

The list of explanatory variables is specified in the base model formula while Z is specified using nonpar. X can include any number of explanatory variables, but Z must have at most two.

The model is estimated by maximizing the following weighted log-likelihood function at each target point:

∑_i ∑_j w_i I(y_i=j) log(P(X_i β_j))

where y is the discrete dependent variable with K+1 choices (j = 0, 1, ..., K), X is the set of explanatory variables, and P(X_i β_j) = exp(X_i β_j) / ∑_k exp(X_i β_k), with the sum in the denominator taken over all K+1 choices. For the base value, y=0, the coefficients are normalized to β_0 = 0.
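The weighted log-likelihood can be illustrated for the two-choice case (K = 1), where it reduces to a kernel-weighted binary logit. The following is a minimal base-R sketch, not the package's own code: the simulated data, the target point x0, the bandwidth h, and the helper names tcub, negll, and neggr are all assumptions made for illustration.

```r
# Sketch: maximize the weighted log-likelihood at a single target point (K = 1).
set.seed(42)
n  <- 500
x  <- runif(n, 0, 10)
y  <- as.numeric(x - 5 + rlogis(n) > 0)   # binary outcome; base category is y = 0

x0 <- 5                                   # hypothetical target point
h  <- 0.5                                 # hypothetical bandwidth
X  <- cbind(1, x - x0)                    # center x at the target for stability

# Tri-cube kernel weights w_i = K((x_i - x0) / (sd(x) * h))
tcub <- function(u) 70/81 * (1 - abs(u)^3)^3 * (abs(u) < 1)
w    <- tcub((x - x0) / (sd(x) * h))

# Negative weighted log-likelihood and its gradient, with beta_0 normalized to 0
negll <- function(beta) {
  xb <- drop(X %*% beta)
  -sum(w * (y * xb - log1p(exp(xb))))
}
neggr <- function(beta) {
  p <- plogis(drop(X %*% beta))
  -drop(crossprod(X, w * (y - p)))
}

fit      <- optim(c(0, 0), negll, neggr, method = "BFGS")
beta_hat <- fit$par

# Cross-check: a weighted glm solves the same first-order conditions
chk <- coef(glm(y ~ I(x - x0), family = quasibinomial, weights = w))
```

Repeating this fit over a grid of target points, and interpolating the results to the data points, gives the flavor of the conditionally parametric approach described here.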

When Z includes a single variable, w_i is a simple kernel weighting function: w_i = K((z_i - z_0 )/(sd(z)*h)) . When Z includes two variables (e.g., nonpar=~z1+z2), the method for specifying w depends on the distance option. Under either option, the ith row of the matrix Z = (z1, z2) is transformed such that z_i = sqrt(z_i * V * t(z_i)). Under the "Mahal" option, V is the inverse of cov(Z). Under the "Euclid" option, V is the inverse of diag(cov(Z)). After this transformation, the weights again reduce to the simple kernel weighting function K((z_i - z_0 )/(sd(z)*h)). h is specified by the bandwidth or window option.
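The two-variable transformation can be written out directly in base R. This sketch follows the formulas literally as stated in the paragraph above; it is not taken from the package source, and the simulated Z, the choice of target, the bandwidth h, and the helper names zscalar and tcub are illustrative assumptions.

```r
# Sketch: scalar transformation of a two-column Z and the resulting kernel weights.
set.seed(1)
n <- 200
Z <- cbind(z1 = rnorm(n), z2 = rnorm(n, sd = 3))

# V under the two distance options
V_mahal  <- solve(cov(Z))                # inverse of cov(Z)
V_euclid <- solve(diag(diag(cov(Z))))    # inverse of diag(cov(Z))

# Row-wise z_i = sqrt(z_i V t(z_i))
zscalar <- function(Z, V) sqrt(rowSums((Z %*% V) * Z))

tcub <- function(u) 70/81 * (1 - abs(u)^3)^3 * (abs(u) < 1)

z  <- zscalar(Z, V_mahal)
z0 <- z[1]                               # hypothetical target: the first observation
h  <- 0.5                                # hypothetical bandwidth
w  <- tcub((z - z0) / (sd(z) * h))       # weights K((z_i - z_0) / (sd(z) * h))
```

Swapping V_mahal for V_euclid in the call to zscalar gives the "Euclid" variant; the two differ only when the columns of Z are correlated.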

The great circle formula is used to construct the distances used to form the weights when distance = "Latlong". In this case, the variable list for nonpar must be given as nonpar = ~latitude+longitude (or ~lo+la or ~lat+long, etc.), with the longitude and latitude variables expressed in degrees (e.g., -87.627800 and 41.881998 for one observation of longitude and latitude, respectively). The order in which latitude and longitude are listed does not matter; the function only looks at the first two letters of each name to determine which variable is latitude and which is longitude. Note that the great circle distance measure is left in miles rather than being standardized. Thus, the window option should be specified when distance = "Latlong", or the bandwidth should be adjusted to account for the scale. The kernel weighting function becomes K(distance/h) under the "Latlong" option.
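To make the scale issue concrete, the sketch below computes great-circle distances in miles and applies K(distance/h). It uses the haversine form of the great-circle formula and an Earth radius of 3958.8 miles; both are assumptions, and the package may compute the distance differently. The function gc_miles, the coordinates other than the first pair, and the bandwidth of 10 miles are hypothetical.

```r
# Sketch: great-circle distance in miles (haversine form) and K(distance / h).
gc_miles <- function(lat1, lon1, lat2, lon2, radius = 3958.8) {
  rad  <- pi / 180                       # degrees to radians
  dlat <- (lat2 - lat1) * rad
  dlon <- (lon2 - lon1) * rad
  a <- sin(dlat / 2)^2 + cos(lat1 * rad) * cos(lat2 * rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

tcub <- function(u) 70/81 * (1 - abs(u)^3)^3 * (abs(u) < 1)

# First point is the example coordinate from the text; the rest are made up
lat <- c(41.881998, 41.8500, 41.9500, 42.0500)
lon <- c(-87.627800, -87.6500, -87.7000, -87.9000)

d <- gc_miles(lat[1], lon[1], lat, lon)  # miles from the first point
h <- 10                                  # hypothetical bandwidth, in miles
w <- tcub(d / h)                         # unstandardized: h must be on the mile scale
```

Because d is in miles, a bandwidth such as 0.5 that is sensible for standardized distances would give positive weight only to points within half a mile of the target.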

Following White (1982), the covariance matrix for a quasi-maximum likelihood model is A^{-1}BA^{-1} , where

A = ∑ w_i d^2LnL_i/dβ dβ'

B = ∑ w_i^2 (dLnL_i/dβ)(dLnL_i/dβ')

The covariance matrix is calculated at each target point and the implied standard errors are then interpolated to each data point. Estimation can be very slow when target = "alldata". The maketarget command can be used to identify target points.
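The sandwich form A^{-1}BA^{-1} can be sketched for a single target point in the two-choice case, where the score and Hessian of the logit log-likelihood have simple closed forms. This is an illustration under assumed data, target (x = 5), and bandwidth (0.5), not the package's implementation.

```r
# Sketch: White (1982) sandwich covariance for a kernel-weighted binary logit.
set.seed(7)
n <- 500
x <- runif(n, 0, 10)
y <- as.numeric(x - 5 + rlogis(n) > 0)
X <- cbind(1, x)

# Tri-cube weights around a hypothetical target x0 = 5 with bandwidth h = 0.5
u <- (x - 5) / (sd(x) * 0.5)
w <- 70/81 * (1 - abs(u)^3)^3 * (abs(u) < 1)

# Local coefficient estimates at the target (weighted logit fit)
beta <- coef(glm(y ~ x, family = quasibinomial, weights = w))
p    <- drop(plogis(X %*% beta))

# A = sum_i w_i * d2 lnL_i / dbeta dbeta'   (Hessian of the logit is -p(1-p) x x')
A <- -crossprod(X * (w * p * (1 - p)), X)

# B = sum_i w_i^2 * (dlnL_i/dbeta)(dlnL_i/dbeta)'   (score of the logit is (y - p) x)
S <- X * (w * (y - p))
B <- crossprod(S)

V  <- solve(A) %*% B %*% solve(A)
se <- sqrt(diag(V))                      # implied standard errors at this target
```

The sign of A cancels in the product, so using the Hessian or its negative gives the same V.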

Available kernel weighting functions include the following:

Kernel         Call abbreviation   Kernel function K(z)
Rectangular    "rect"              1/2 * I(|z| < 1)
Triangular     "tria"              (1-|z|) * I(|z| < 1)
Epanechnikov   "epan"              3/4 * (1-z^2) * I(|z| < 1)
Bi-Square      "bisq"              15/16 * (1-z^2)^2 * I(|z| < 1)
Tri-Cube       "tcub"              70/81 * (1-|z|^3)^3 * I(|z| < 1)
Tri-Weight     "trwt"              35/32 * (1-z^2)^3 * I(|z| < 1)
Gaussian       "gauss"             (2*pi)^{-.5} * exp(-z^2/2)
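The kernel functions in the table can be transcribed into R and checked numerically. The sketch below is a plain transcription rather than the package's internal code, with the Gaussian written using the normalizing constant (2*pi)^(-1/2); the list name kernels and the vector area are illustrative.

```r
# Sketch: the seven kernels as R functions, keyed by their call abbreviations.
kernels <- list(
  rect  = function(z) 1/2 * (abs(z) < 1),
  tria  = function(z) (1 - abs(z)) * (abs(z) < 1),
  epan  = function(z) 3/4 * (1 - z^2) * (abs(z) < 1),
  bisq  = function(z) 15/16 * (1 - z^2)^2 * (abs(z) < 1),
  tcub  = function(z) 70/81 * (1 - abs(z)^3)^3 * (abs(z) < 1),
  trwt  = function(z) 35/32 * (1 - z^2)^3 * (abs(z) < 1),
  gauss = function(z) (2 * pi)^(-0.5) * exp(-z^2 / 2)
)

# Each kernel is a density, so it should integrate to one over its support
area <- sapply(names(kernels), function(k) {
  lim <- if (k == "gauss") Inf else 1
  integrate(kernels[[k]], -lim, lim)$value
})
round(area, 6)
```

All kernels except the Gaussian give zero weight to observations with |z| >= 1, which is why the bandwidth or window directly controls how many observations enter each local fit.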

Value

target

The target points for the original estimation of the function.

xcoef.target

Estimated coefficients, B(z), at the target values of z.

xcoef.target.se

Standard errors for B(z) at the target values of z.

xcoef

Estimated coefficients, B(z), at the original data points.

xcoef.se

Standard errors for B(z) with z evaluated at all points in the data set.

pmat

The n x (K+1) matrix of estimated probabilities.

lnl

The log-likelihood value.

References

Fan, Jianqing, Nancy E. Heckman, and M.P. Wand, "Local Polynomial Kernel Regression for Generalized Linear Models and Quasi-Likelihood Functions," Journal of the American Statistical Association 90 (1995), 141-150.

Loader, Clive. Local Regression and Likelihood. New York: Springer, 1999.

McMillen, Daniel P. and John F. McDonald, "Locally Weighted Maximum Likelihood Estimation: Monte Carlo Evidence and an Application," in Luc Anselin, Raymond J.G.M. Florax, and Sergio J. Rey, eds., Advances in Spatial Econometrics, Springer-Verlag, New York (2004), 225-239.

Tibshirani, Robert and Trevor Hastie, "Local Likelihood Estimation," Journal of the American Statistical Association 82 (1987), 559-568.

See Also

cparlogit

cparprobit

gmmlogit

gmmprobit

splogit

spprobit

spprobitml

Examples

library(McSpatial)  # provides cparmlogit
library(mlogit)     # standard multinomial logit, used for comparison
set.seed(5647)
n = 1000
x <- runif(n,0,pi*sqrt(12))
o <- order(x)
x <- x[o]
form <- yvar~x
nonpar <- ~x

# 2 choices
ybase <- x + rlogis(n)
yvar <- ybase>.5*pi*sqrt(12)
table(yvar)
fit <- glm(yvar~x,family=binomial(link="logit"))
summary(fit)
p <- fitted(fit)
fit1 <- cparmlogit(yvar~x,nonpar=~x,window=.5,kern="tcub")
fit1$lnl
colMeans(fit1$xcoef)
colMeans(fit1$xcoef.se)
cor(p,fit1$pmat)
plot(x,p,xlab="x",ylab="Prob(y=1)",type="l")
lines(x,fit1$pmat[,2],col="red")
legend("topleft",c("Standard Logit","CPAR"),col=c("black","red"),lwd=1)

## Not run: 
par(ask=TRUE)
# 3 choices
ybase1 <- -.5*pi*sqrt(12) + x + rlogis(n)
ybase2 <-  -.5*pi*sqrt(12)/2 + x/2 + rlogis(n)
yvar <- ifelse(ybase1>ybase2,1,2)
yvar <- ifelse(ybase1<0&ybase2<0,0,yvar)
table(yvar)
mdata <- data.frame(yvar,x)
fit <- mlogit(yvar~0 | x, data=mdata, shape="wide")
summary(fit)
fit1 <- cparmlogit(yvar~x,nonpar=~x,window=.5,kern="tcub")
fit1$lnl
colMeans(fit1$xcoef)
colMeans(fit1$xcoef.se)
cor(fit$probabilities,fit1$pmat)
plot(x,fit$probabilities[,1],xlab="x",ylab="Prob(y=0)",type="l",main="Prob(y=0)")
lines(x,fit1$pmat[,1],col="red")
legend("topright",c("Standard Logit","CPAR"),col=c("black","red"),lwd=1)
plot(x,fit$probabilities[,2],xlab="x",ylab="Prob(y=1)",type="l",main="Prob(y=1)")
lines(x,fit1$pmat[,2],col="red")
legend("topleft",c("Standard Logit","CPAR"),col=c("black","red"),lwd=1)
plot(x,fit$probabilities[,3],xlab="x",ylab="Prob(y=2)",type="l",main="Prob(y=2)")
lines(x,fit1$pmat[,3],col="red")
legend("topleft",c("Standard Logit","CPAR"),col=c("black","red"),lwd=1)

# 2 choices, quadratic
x2 <- x^2
ybase <- x - .1*(x^2) + rlogis(n)
yvar <- ybase>median(ybase)
table(yvar)
fit <- glm(yvar~x+x2,family=binomial(link="logit"))
summary(fit)
p <- fitted(fit)
fit1 <- cparmlogit(yvar~x,nonpar=~x,window=.25,kern="tcub")
fit1$lnl
colMeans(fit1$xcoef)
colMeans(fit1$xcoef.se)
cor(p,fit1$pmat)
plot(x,p,xlab="x",ylab="Prob(y=1)",type="l")
lines(x,fit1$pmat[,2],col="red")
legend("topleft",c("Standard Logit","CPAR"),col=c("black","red"),lwd=1)

## End(Not run)

McSpatial documentation built on May 2, 2019, 9:32 a.m.