| npconmode | R Documentation |
npconmode performs kernel modal regression on mixed data:
it finds the
conditional mode given a set of training data (explanatory
data and dependent data) and, optionally, evaluation data.
It automatically computes various in-sample and out-of-sample measures of
accuracy.
npconmode(bws,
...)
## S3 method for class 'formula'
npconmode(bws,
data = NULL,
newdata = NULL,
...)
## Default S3 method:
npconmode(bws,
txdat,
tydat,
nomad = FALSE,
proper = NULL,
proper.control = list(),
probabilities = FALSE,
gradients = FALSE,
level = NULL,
...)
## S3 method for class 'conbandwidth'
npconmode(bws,
txdat = stop("invoked without training data 'txdat'"),
tydat = stop("invoked without training data 'tydat'"),
exdat,
eydat,
proper = NULL,
proper.control = list(),
probabilities = FALSE,
gradients = FALSE,
level = NULL,
...)
## S3 method for class 'conmode'
predict(object,
newdata = NULL,
type = c("class", "prob"),
se.fit = FALSE,
...)
## S3 method for class 'conmode'
plot(x, ...)
These arguments identify the bandwidth specification, formula/data interface, and training data.
bws |
a bandwidth specification. This can be set as a |
data |
an optional data frame, list or environment (or object
coercible to a data frame by |
txdat |
a |
tydat |
a one (1) dimensional vector of unordered or ordered factors, containing the dependent data. Defaults to the training data used to compute the bandwidth object. |
object |
an object of class |
x |
an object of class |
These arguments control where the conditional mode is evaluated.
exdat |
a |
eydat |
a one (1) dimensional numeric or integer vector of the true values
(outcomes) of the dependent variable. By default,
evaluation takes place on the data provided by |
newdata |
an optional data frame in which to look for evaluation data. If omitted, the training data are used. |
type |
prediction type for |
se.fit |
a logical value for |
These arguments control stored class probabilities, proper probability normalization, and level-specific class-probability effects.
probabilities |
a logical value. If |
gradients |
a logical value. If |
level |
response level for class-probability gradients/effects and
|
proper |
a logical value or |
proper.control |
a list of controls for discrete probability properization. Currently
|
This argument controls the recommended automatic local-polynomial NOMAD route, which jointly selects continuous polynomial degree and bandwidths when conditional-density bandwidths are computed inside npconmode.
nomad |
logical shortcut passed through to |
Further arguments are passed to the bandwidth-selection counterpart, prediction/evaluation route, or plot route as appropriate.
... |
additional arguments supplied to |
Documentation guide: see np.kernels for kernels,
np.options for global options, and
plot for plotting options.
If bws is not an explicit conbandwidth object,
npconmode computes conditional density bandwidths by forwarding the
call to npcdensbw. The resulting conbandwidth object is
stored unchanged in the returned object's bws component. Regression
type, local-polynomial degree, and NOMAD/search metadata are also mirrored on
the returned conmode object for convenient inspection and summary
reporting; the bandwidth object remains the canonical source.
For non-local-constant conditional-mode fits, npconmode constructs
the full set of fitted probabilities over the discrete response support and
projects them onto the probability simplex before selecting and reporting
the modal outcome. Thus the probabilities used for modal selection are
non-negative and sum to one over the discrete support. For binary outcomes
this is the complement contract
\Pr(Y=\ell_2\mid X=x)=1-\Pr(Y=\ell_1\mid X=x). Local-constant fits
are already proper by construction, so when proper is omitted it resolves to
FALSE for regtype="lc" and to TRUE otherwise.
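The projection step can be illustrated in base R. The sketch below assumes a Euclidean projection computed with the standard sort-and-threshold algorithm. The text above does not state which projection npconmode uses internally, and project_simplex is a hypothetical helper name, not part of np.

```r
# Hypothetical helper (not part of np): Euclidean projection of a numeric
# vector onto the probability simplex {p : p >= 0, sum(p) = 1}.
# ASSUMPTION: the projection actually used by npconmode may differ.
project_simplex <- function(v) {
  u <- sort(v, decreasing = TRUE)
  css <- cumsum(u)
  k <- seq_along(u)
  # Largest index whose sorted entry stays positive after the shift
  rho <- max(which(u - (css - 1) / k > 0))
  theta <- (css[rho] - 1) / rho
  pmax(v - theta, 0)
}

# Made-up raw "probabilities" with a negative entry and mass above one
raw <- c(0.7, 0.5, -0.2)
p <- project_simplex(raw)
p
sum(p)
```

For a binary response the projected pair is non-negative and sums to one, so the complement relation holds automatically.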
The predict method follows the usual S3
newdata convention. With no evaluation arguments,
predict(fit) extracts the stored modal class and
predict(fit, type="prob") extracts stored class probabilities
when the object was fitted with probabilities=TRUE. With
newdata, predict evaluates the conditional mode at the
supplied rows using the stored bandwidth object. Use
type="prob" to return the full matrix of fitted class
probabilities at the evaluation rows. Use se.fit=TRUE with
type="prob" to return a list containing the class-probability matrix
and the matching asymptotic standard-error matrix. Native evaluator
arguments exdat and eydat remain available for advanced
workflows and take precedence if supplied.
npconmode returns a conmode object with the following
components:
bws |
the |
conmode |
a vector of type |
condens |
a vector of numeric type containing the modal density estimates at each evaluation point |
conderr |
a vector of numeric type containing asymptotic
standard errors for the modal density estimates at each evaluation point.
If a row is materially properized by probability projection, this value is
set to |
xeval |
a data frame of evaluation points |
yeval |
a vector of type |
confusion.matrix |
the confusion matrix or |
CCR.overall |
the overall correct
classification ratio, or |
CCR.byoutcome |
a numeric vector containing the correct
classification ratio by outcome, or |
fit.mcfadden |
the McFadden-Puig-Kerschner performance measure
or |
probabilities |
if requested, a matrix of fitted probabilities over the discrete response support |
probability.levels |
if |
probability.errors |
if |
probability.repaired.rows |
if |
probability.gradients |
if |
probability.gradient.level, probability.gradient.names, probability.gradient.info |
metadata describing the response support,
conditioning variables, and interpretation of
|
proper.requested, proper.applied, proper.info |
metadata describing whether discrete probability properization was requested, whether any row was materially projected, and diagnostics for non-negativity and unit-mass checks |
regtype, degree, nomad, search.engine |
metadata mirrored from
|
degree.search, nomad.shortcut, nomad.time, powell.time |
detailed
search metadata mirrored from |
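As a rough guide to reading confusion.matrix, CCR.overall, and CCR.byoutcome, the following base R sketch computes analogous quantities from made-up actual and predicted classes. It assumes that rows index actual outcomes and that the by-outcome ratio is the share of each actual outcome that is classified correctly; the stored components may be laid out differently.

```r
# Made-up actual and predicted classes (illustration only)
actual    <- factor(c(0, 0, 0, 1, 1, 0, 1, 0), levels = 0:1)
predicted <- factor(c(0, 0, 1, 1, 0, 0, 1, 0), levels = 0:1)

# Rows are actual outcomes, columns are predicted outcomes
cm <- table(Actual = actual, Predicted = predicted)

# Overall correct classification ratio: share of cases on the diagonal
ccr.overall <- sum(diag(cm)) / sum(cm)

# ASSUMPTION: by-outcome ratio = correct predictions / cases of that outcome
ccr.byoutcome <- diag(cm) / rowSums(cm)

cm
ccr.overall
ccr.byoutcome
```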
The function predict may be used to extract conditional mode
class predictions, while fitted extracts the conditional
density estimates at the conditional mode from the resulting object. The
function gradients extracts
class-probability gradients/effects when gradients=TRUE. Also,
summary and plot support conmode
objects. For plot.conmode, first fit with
probabilities=TRUE so the plot can remain object-fed. The default
plot displays the fitted probability for the base/reference response level
levels(y)[1]; use plot(fit, level=...) to select another
outcome and plot(fit, gradients=TRUE) to display stored
class-probability effects. The default view="sample" draws stored
object-fed probability/effect payloads at the fitted evaluation points. Use
view="fixed" and neval to draw object-fed one-dimensional
slices over each conditioning variable, with other variables held at their
median/mode values. Use perspective=TRUE to draw a base-graphics
probability surface for one selected response level over two continuous
conditioning variables, and renderer="rgl" for the corresponding
interactive surface. Use errors="asymptotic" for probability-level
standard errors and intervals in one-dimensional plots, or with
output="data" for surface interval payloads. Bootstrap intervals and
surface band rendering are not yet implemented for plot.conmode.
The conditional-mode target is
\arg\max_y \Pr(Y=y\mid X=x) for a discrete response. In practice
npconmode estimates the conditional probability of each
response support point using npcdensbw /
npcdens, optionally projects non-local-constant fitted
probabilities onto the probability simplex, and then reports the
modal support point.
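The final selection step can be sketched in base R as a row-wise arg max over a matrix of class probabilities. The matrix below is made up for illustration, and the tie rule (first level wins) is an assumption rather than documented np behaviour.

```r
# Toy matrix of fitted class probabilities: one row per evaluation point,
# one column per response level (values are made up for illustration).
P <- matrix(c(0.70, 0.30,
              0.45, 0.55,
              0.20, 0.80),
            ncol = 2, byrow = TRUE,
            dimnames = list(NULL, c("0", "1")))

# Modal class per row: the level with the largest fitted probability.
# ASSUMPTION: ties go to the first level; np's tie rule is not stated here.
modal <- factor(colnames(P)[max.col(P, ties.method = "first")],
                levels = colnames(P))

# Fitted probability at the mode, row by row
p.mode <- P[cbind(seq_len(nrow(P)), as.integer(modal))]

modal
p.mode
```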
Setting gradients=TRUE stores class-probability effects for one
response level. If level is omitted, the base/reference response level
levels(y)[1] is used. These effects are useful for asking how the
fitted probability of a selected class changes with each covariate; they are
not gradients of the arg max classification rule itself.
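To make the distinction concrete, the base R sketch below uses a known logistic probability as a stand-in for a fitted class probability (an assumption for illustration; no np fit is involved). It approximates the effect of x on each class probability by finite differences and contrasts it with the modal class, which is piecewise constant in x.

```r
# ASSUMPTION: a known logistic probability stands in for a fitted one
p1 <- function(x) plogis(1.5 * x)   # Pr(Y = 1 | X = x)
p0 <- function(x) 1 - p1(x)         # Pr(Y = 0 | X = x)

# Central finite-difference approximation to d Pr(Y = level | x) / dx
effect <- function(p, x, h = 1e-5) (p(x + h) - p(x - h)) / (2 * h)

x0 <- c(-0.5, 0, 0.5)
e1 <- effect(p1, x0)   # effects for level 1
e0 <- effect(p0, x0)   # effects for level 0 (mirror image in the binary case)

# The modal class changes only where the two probabilities cross
mode.class <- ifelse(p1(x0) > 0.5, 1, 0)

cbind(x = x0, effect.level1 = e1, effect.level0 = e0, mode = mode.class)
```

In the binary case the effects for the two levels are equal in magnitude and opposite in sign, which is why the choice of level matters when reading stored effects.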
For book-length background, see Racine (2019), Chapter 4, “Conditional Probability Density and Cumulative Distribution Functions,” especially the binary and multinomial choice material, and Li and Racine (2007), Chapter 5, “Conditional Density Estimation,” together with Chapter 4, “Kernel Estimation with Mixed Data.”
If you are using data of mixed types, then it is advisable to use the
data.frame function to construct your input data and not
cbind, since cbind will typically not work as
intended on mixed data types and will coerce the data to the same
type.
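A short base R illustration of the coercion problem, using made-up values:

```r
smoke <- factor(c("no", "yes", "no"))
age   <- c(19.5, 33, 20)

# cbind() builds a matrix, so the factor is reduced to its integer codes
m <- cbind(smoke = smoke, age = age)
is.numeric(m)
m[, "smoke"]

# data.frame() keeps each column's own type
d <- data.frame(smoke = smoke, age = age)
sapply(d, class)
```

With the matrix, the information that smoke is categorical is lost, so a kernel for continuous data would be applied to it.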
Tristen Hayfield tristen.hayfield@gmail.com, Jeffrey S. Racine racinej@mcmaster.ca
Aitchison, J. and C.G.G. Aitken (1976), “Multivariate binary discrimination by the kernel method,” Biometrika, 63, 413-420.
Hall, P. and J.S. Racine and Q. Li (2004), “Cross-validation and the estimation of conditional probability densities,” Journal of the American Statistical Association, 99, 1015-1026.
Li, Q. and J.S. Racine (2007), Nonparametric Econometrics: Theory and Practice, Princeton University Press.
McFadden, D. and C. Puig and D. Kerschner (1977), “Determinants of the long-run demand for electricity,” Proceedings of the American Statistical Association (Business and Economics Section), 109-117.
Pagan, A. and A. Ullah (1999), Nonparametric Econometrics, Cambridge University Press.
Racine, J.S. (2019), An Introduction to the Advanced Theory and Practice of Nonparametric Econometrics: A Replicable Approach Using R, Cambridge University Press.
Scott, D.W. (1992), Multivariate Density Estimation. Theory, Practice and Visualization, New York: Wiley.
Silverman, B.W. (1986), Density Estimation, London: Chapman and Hall.
Wang, M.C. and J. van Ryzin (1981), “A class of smooth estimators for discrete distributions,” Biometrika, 68, 301-309.
np.kernels, np.options,
plot, npcdensbw.
## Not run:
# EXAMPLE 1 (INTERFACE=FORMULA): For this example, we use the
# birthweight data taken from the MASS library, and compute a parametric
# logit model and a nonparametric conditional mode model. We then
# compare their confusion matrices and summary measures of
# classification ability.
library("MASS")
data("birthwt")
birthwt$low <- factor(birthwt$low)
birthwt$smoke <- factor(birthwt$smoke)
birthwt$race <- factor(birthwt$race)
birthwt$ht <- factor(birthwt$ht)
birthwt$ui <- factor(birthwt$ui)
birthwt$ftv <- ordered(birthwt$ftv)
with(birthwt, {
# Fit a parametric logit model with low (0/1) as the dependent
# variable and smoke, race, ht, ui, ftv, age, and lwt as the covariates
# From ?birthwt
# 'low' indicator of birth weight less than 2.5kg
# 'smoke' smoking status during pregnancy
# 'race' mother's race ('1' = white, '2' = black, '3' = other)
# 'ht' history of hypertension
# 'ui' presence of uterine irritability
# 'ftv' number of physician visits during the first trimester
# 'age' mother's age in years
# 'lwt' mother's weight in pounds at last menstrual period
model.logit <- glm(low~smoke+
race+
ht+
ui+
ftv+
age+
lwt,
family=binomial(link=logit))
# Generate the confusion matrix and correct classification ratio
cm <- table(low, ifelse(fitted(model.logit)>0.5, 1, 0))
ccr <- sum(diag(cm))/sum(cm)
# Now do the same with a nonparametric model. Note - this may take a
# few minutes depending on the speed of your computer...
bw <- npcdensbw(formula=low~smoke+
race+
ht+
ui+
ftv+
age+
lwt,
data=birthwt)
model.np <- npconmode(bws=bw)
# Compare confusion matrices from the logit and nonparametric model
# Logit
# (values are not auto-printed inside with(), so print explicitly)
print(cm)
print(ccr)
# Nonparametric
summary(model.np)
# Predict modal classes and fitted class probabilities at selected rows
new.birthwt <- birthwt[1:5, c("smoke", "race", "ht", "ui", "ftv", "age", "lwt")]
print(predict(model.np, newdata=new.birthwt))
predict(npconmode(bws=bw, probabilities=TRUE),
newdata=new.birthwt, type="prob")
})
# EXAMPLE 1 (INTERFACE=DATA FRAME): For this example, we use the
# birthweight data taken from the MASS library, and compute a parametric
# logit model and a nonparametric conditional mode model. We then
# compare their confusion matrices and summary measures of
# classification ability.
library("MASS")
data("birthwt")
with(birthwt, {
# Fit a parametric logit model with low (0/1) as the dependent
# variable and smoke, race, ht, ui, ftv, age, and lwt as the covariates
# From ?birthwt
# 'low' indicator of birth weight less than 2.5kg
# 'smoke' smoking status during pregnancy
# 'race' mother's race ('1' = white, '2' = black, '3' = other)
# 'ht' history of hypertension
# 'ui' presence of uterine irritability
# 'ftv' number of physician visits during the first trimester
# 'age' mother's age in years
# 'lwt' mother's weight in pounds at last menstrual period
model.logit <- glm(low~factor(smoke)+
factor(race)+
factor(ht)+
factor(ui)+
ordered(ftv)+
age+
lwt,
family=binomial(link=logit))
# Generate the confusion matrix and correct classification ratio
cm <- table(low, ifelse(fitted(model.logit)>0.5, 1, 0))
ccr <- sum(diag(cm))/sum(cm)
# Now do the same with a nonparametric model...
X <- data.frame(factor(smoke),
factor(race),
factor(ht),
factor(ui),
ordered(ftv),
age,
lwt)
y <- factor(low)
# Note - this may take a few minutes depending on the speed of your
# computer...
bw <- npcdensbw(xdat=X, ydat=y)
model.np <- npconmode(bws=bw)
# Compare confusion matrices from the logit and nonparametric model
# Logit
# (values are not auto-printed inside with(), so print explicitly)
print(cm)
print(ccr)
# Nonparametric
summary(model.np)
})
# EXAMPLE 2 (CLASS PROBABILITY EFFECTS): compute and plot
# class-probability gradients/effects for one selected response level.
# This example uses a small artificial sample so that it runs quickly.
set.seed(42)
n <- 100
x <- seq(-1, 1, length.out=n)
y <- factor(rbinom(n, 1, plogis(1.5*x)), levels=0:1)
model.effects <- npconmode(y~x,
regtype="ll",
bwmethod="cv.ls",
nmulti=1,
probabilities=TRUE,
gradients=TRUE)
gradients(model.effects)
plot(model.effects)
plot(model.effects, view="fixed", neval=25)
plot(model.effects, gradients=TRUE)
plot(model.effects, gradients=TRUE, view="fixed", neval=25)
## A two-continuous-predictor fit can be displayed as a probability surface.
## The default level is the base/reference response level.
z <- runif(n, -1, 1)
y2 <- factor(rbinom(n, 1, plogis(1.5*x - z)), levels=0:1)
model.surface <- npconmode(y2~x+z,
regtype="ll",
bwmethod="cv.ls",
nmulti=1,
probabilities=TRUE)
plot(model.surface, perspective=TRUE, neval=15)
## End(Not run)