np.copula: Kernel Copula Estimation with Mixed Data Types

Description

npcopula estimates a mixed-data kernel copula distribution or copula density. It can be called with an existing unconditional distribution or density bandwidth object, or with a one-sided formula, in which case the appropriate bandwidth object is selected internally.

Usage

npcopula(bws, ...)

## S3 method for class 'formula'
npcopula(bws,
         data = NULL,
         u = NULL,
         target = c("distribution", "density"),
         evaluation = c("grid", "sample"),
         neval = 30,
         n.quasi.inv = 1000,
         er.quasi.inv = 1,
         ...)

## Default S3 method:
npcopula(bws,
         data,
         u = NULL,
         target = NULL,
         evaluation = c("sample", "grid"),
         neval = 30,
         n.quasi.inv = 1000,
         er.quasi.inv = 1,
         ...)

## S3 method for class 'npcopula'
predict(object,
        newdata = NULL,
        u = NULL,
        se.fit = FALSE,
        output = c("vector", "object", "data"),
        ...)

## S3 method for class 'npcopula'
plot(x,
     perspective = TRUE,
     view = c("rotate", "fixed", "surface", "contour", "image",
              "empirical", "all"),
     renderer = c("base", "rgl"),
     errors = c("none", "bootstrap", "asymptotic"),
     band = c("pointwise", "pmzsd", "bonferroni",
              "simultaneous", "all"),
     alpha = 0.05,
     bootstrap = c("inid", "fixed", "geom"),
     B = 1999,
     center = c("estimate", "bias-corrected"),
     boot_control = np_boot_control(),
     output = c("plot", "data", "plot-data", "both"),
     legend = TRUE,
     theta = 0.0,
     phi = 20.0,
     xlab = "u1",
     ylab = "u2",
     zlab = NULL,
     main = NULL,
     col = NULL,
     border = "black",
     zlim = NULL,
     ...)

Arguments

Data, Bandwidth Inputs And Formula Interface

These arguments identify the bandwidth specification, source data, and formula route.

bws

bandwidth specification or one-sided formula. A bandwidth specification is either an unconditional distribution bandwidth object returned by npudistbw (for a copula distribution) or an unconditional density bandwidth object returned by npudensbw (for a copula density). If bws is a formula such as ~ x + y, npcopula first calls npudistbw when target="distribution" or npudensbw when target="density"; additional bandwidth-selection arguments in ... are forwarded to that bandwidth selector.

data

data frame containing the variables used to construct bws. When bws is a formula, data is passed to the bandwidth selector. Copulas are defined here only for numeric and ordered variables; unordered factors are rejected.

Copula Target And Evaluation Grid

These arguments control whether a copula distribution or density is estimated and where it is evaluated.

er.quasi.inv

fraction passed to extendrange when constructing the marginal quasi-inverse grid used for supplied probability values u. See Details.

evaluation

evaluation route used when bws is a formula. The default "grid" constructs a plot-ready two-dimensional probability grid when u is omitted. Use "sample" to evaluate at the sample realizations.

n.quasi.inv

number of grid points used to compute each marginal quasi-inverse when u is supplied or automatically generated.

newdata

optional prediction data for predict.npcopula. This is a compatibility alias for u: it should contain marginal probability values in [0,1], either in columns named as the original variables or in columns named u1, u2, .... If both u and newdata are supplied, u takes precedence.

neval

number of probability values per margin in the automatically generated two-dimensional grid used when bws is a formula, evaluation="grid", and u is omitted.

target

target used when bws is a formula. The default "distribution" estimates a copula distribution using npudistbw; "density" estimates a copula density using npudensbw. When bws is already a bandwidth object, the target is inferred from that object and a conflicting explicit target is rejected.

u

optional matrix or data frame of marginal probability values in [0,1]. Each column corresponds to one variable in the copula. If supplied, npcopula evaluates on the Cartesian product expand.grid(u). For two-dimensional displays, the clearest spelling is often data.frame(u1 = u1.seq, u2 = u2.seq); names matching the original variables are also accepted for compatibility. If omitted with a formula route and evaluation="grid", a two-dimensional grid is generated automatically.

object

an object of class "npcopula" returned by npcopula.

se.fit

logical value. If TRUE, predict.npcopula returns a list with fitted values and the stored standard-error slot.

Plot Display Controls

These arguments control how a two-dimensional grid copula object is displayed.

x

an object of class "npcopula" returned by npcopula.

border

border color for surface facets and interval wireframes when view="surface".

col

optional surface or image colors. If omitted, plot.npcopula uses the viridis hcl.colors palette shared by the package's other surface plot methods.

legend

logical or legend-control value used for band="all" interval overlays. Use FALSE, NULL, or NA to suppress the interval legend.

main, xlab, ylab, zlab

plot titles and axis labels.

output

return mode. For plot.npcopula, "plot" draws the plot, "data" returns the plotted data with interval columns when requested, and "plot-data" draws and returns the plotted data. "both" is accepted as an alias for "plot-data". For predict.npcopula, "vector" returns fitted copula values, "object" returns the evaluated "npcopula" object, and "data" returns as.data.frame() on that object.

perspective

logical value. If TRUE, draw a surface display; otherwise use view to choose "contour" or "image". The "empirical" and "all" views ignore this argument.

phi, theta

viewing angles passed to persp or to the shared rgl surface renderer. The defaults match the package-wide surface-plot defaults used by the other perspective plot methods. As with those methods, the exact default pair theta = 0 and phi = 20 is remapped internally for renderer="rgl" to account for the different viewing-angle convention used by rgl; explicitly supplied non-default angles are passed through.

renderer

plotting renderer for surface displays. "base" uses persp. "rgl" uses the shared interactive rgl surface renderer when the suggested package rgl is installed.

view

display type for grid output. The default "rotate" draws a rotating base persp surface using the same frame step and delay as the other package perspective plots. Use "fixed" for a single fixed surface, "surface" as a backward-compatible fixed-surface alias, or "contour" and "image" with perspective=FALSE. Use "empirical" to plot the empirical copula coordinates. Use "all" for a base-graphics four-panel display containing the copula contour, copula surface, empirical copula coordinates, and a copula-density surface. The "all" view is not currently supported with renderer="rgl".

zlim

optional z-axis limits for surface displays.

Plot Interval Controls

These arguments add asymptotic or bootstrap intervals to two-dimensional surface plots.

alpha

nominal size of the asymptotic or bootstrap intervals when errors!="none"; the intervals have nominal coverage 1 - alpha.

B

number of bootstrap replications when errors="bootstrap".

band

interval type for plotted surfaces. Supported values are "pointwise", "pmzsd", "bonferroni", "simultaneous", and "all". The "all" option overlays pointwise, simultaneous, and Bonferroni wireframes where available.

bootstrap

bootstrap resampling method used when errors="bootstrap"; supported values are "inid", "fixed", and "geom". Wild bootstrap is not defined for copula surfaces.

boot_control

optional np_boot_control object. For copula surfaces the block length is used by the "fixed" and "geom" block bootstrap routes.

center

centering convention for bootstrap intervals, either "estimate" or "bias-corrected".

errors

interval route for plot.npcopula: "none", "asymptotic", or "bootstrap". Intervals are available for two-dimensional grid evaluation output and are drawn as transparent wireframes over the copula surface.

Additional Arguments

Further arguments are passed to the bandwidth-selection counterpart, prediction/evaluation route, or graphics renderer as appropriate.

...

additional arguments supplied to npudistbw or npudensbw when npcopula computes bandwidths internally, or arguments needed to interpret a numeric bws vector. This is where bandwidth-selection controls such as bwmethod, bwtype, and bwscaling, kernel/support controls such as ckertype, ckerorder, and ckerbound, categorical kernel controls such as ukertype and okertype, and search controls such as nmulti and scale.factor.search.lower are supplied. In predict.npcopula, additional arguments are passed to npcopula for evaluation with the stored bandwidth object and training data. In plot.npcopula, additional arguments are passed to the selected graphics routine, such as persp, contour, image, or the shared rgl renderer.

Details

Documentation guide: see np.kernels for kernels, np.options for global options, and plot for plotting options.

npcopula computes the nonparametric copula distribution or copula density using marginal quasi-inversion. For the distribution target, Sklar's theorem gives

C(u_1,\ldots,u_d) = H(F_1^{-1}(u_1),\ldots,F_d^{-1}(u_d)),

where H is the joint distribution function and F_j^{-1} is the quasi-inverse of the marginal distribution function F_j. For the density target, the estimated copula density is

c(u) = \frac{f(x_u)} {\prod_{j=1}^d f_j(x_{u,j})}, \quad x_{u,j}=F_j^{-1}(u_j),

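where f is the joint density, f_j is the jth marginal density, and x_u=(x_{u,1},\ldots,x_{u,d}). Both the joint density in the numerator and the marginal densities in the denominator are estimated using the selected mixed-data kernel bandwidths.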
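The ratio formula can be checked in a case where every ingredient is known exactly. The base-R sketch below does not call npcopula: it assumes a bivariate normal H with correlation rho and standard normal margins, so the marginal quasi-inverse is simply qnorm(), and compares the ratio f(x_u)/\prod_j f_j(x_{u,j}) with the closed-form Gaussian copula density. The same setup with independent margins illustrates Sklar's theorem for the distribution target, where H(F_1^{-1}(u_1), F_2^{-1}(u_2)) reduces to u_1 u_2. The helper names dbvnorm, copula_density_ratio, gauss_copula_density, and H_indep are hypothetical, chosen for illustration.

```r
## Known-density check of c(u) = f(x_u) / prod_j f_j(x_{u,j}).
## Exact bivariate-normal densities stand in for kernel estimates.
rho <- 0.5

## Joint density of a standard bivariate normal with correlation rho.
dbvnorm <- function(x1, x2, rho) {
  q <- (x1^2 - 2 * rho * x1 * x2 + x2^2) / (1 - rho^2)
  exp(-q / 2) / (2 * pi * sqrt(1 - rho^2))
}

## Ratio form: map u to x_u with the marginal inverse qnorm(), then divide
## the joint density by the product of the marginal densities.
copula_density_ratio <- function(u1, u2, rho) {
  x1 <- qnorm(u1)
  x2 <- qnorm(u2)
  dbvnorm(x1, x2, rho) / (dnorm(x1) * dnorm(x2))
}

## Closed-form Gaussian copula density, for comparison.
gauss_copula_density <- function(u1, u2, rho) {
  z1 <- qnorm(u1)
  z2 <- qnorm(u2)
  exp(-(rho^2 * (z1^2 + z2^2) - 2 * rho * z1 * z2) / (2 * (1 - rho^2))) /
    sqrt(1 - rho^2)
}

u <- c(0.1, 0.5, 0.9)
cbind(ratio  = copula_density_ratio(u, rev(u), rho),
      closed = gauss_copula_density(u, rev(u), rho))

## Distribution target under independence: with
## H(x1, x2) = pnorm(x1) * pnorm(x2), Sklar gives C(u1, u2) = u1 * u2.
H_indep <- function(x1, x2) pnorm(x1) * pnorm(x2)
C_indep <- H_indep(qnorm(u), qnorm(rev(u)))
```

In npcopula the exact densities and qnorm() are replaced by kernel estimates and a numerical marginal quasi-inverse, so the two columns above would agree only approximately.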

If u is provided, npcopula evaluates on expand.grid(u). As the dimension increases, this can become unwieldy, because a grid with m points in each of d margins has m^d rows. For this reason, the formula route generates a probability grid automatically only for two-dimensional copulas. For higher-dimensional copulas, supply u explicitly or use evaluation="sample".
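To see how quickly expand.grid(u) grows, the base-R sketch below counts rows for a 30-point grid per margin (the value of n.eval used in the examples, an illustrative choice) in two and three dimensions. It exercises only expand.grid and does not call npcopula.

```r
## Row count of the Cartesian product built by expand.grid(u): m^d.
m <- 30
u2d <- data.frame(u1 = seq(0, 1, length.out = m),
                  u2 = seq(0, 1, length.out = m))
nrow(expand.grid(u2d))  # 900 = 30^2

## A third margin multiplies the row count by m again.
u3d <- cbind(u2d, u3 = seq(0, 1, length.out = m))
nrow(expand.grid(u3d))  # 27000 = 30^3
```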

The ‘quasi-inverse’ is computed using Definition 2.3.6 of Nelsen (2006). An equi-quantile grid of length n.quasi.inv/2 spanning the data range is combined with an equi-spaced grid over the data range extended by the fraction er.quasi.inv (see extendrange); the sorted union of the two grids is used for marginal inversion. If requested probability values lie outside the range attainable by the estimated marginal distribution, they are reset to the nearest attainable endpoint. Inspect the returned u columns when endpoint behavior matters.
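The grid construction and inversion can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's internal code: the helper names quasi_inverse_grid and quasi_inverse are hypothetical, the empirical CDF ecdf(x) stands in for the kernel estimate of F_j, both component grids are assumed to have n.quasi.inv/2 points, and the inversion is a grid approximation to the lower generalized inverse inf{x : F(x) >= u}, one standard choice of quasi-inverse.

```r
## Hypothetical helpers sketching the described construction; this is
## not npcopula's internal code.
quasi_inverse_grid <- function(x, n.quasi.inv = 1000, er.quasi.inv = 1) {
  half <- floor(n.quasi.inv / 2)
  ## Equi-quantile grid spanning the observed data range.
  q.grid <- quantile(x, probs = seq(0, 1, length.out = half), names = FALSE)
  ## Equi-spaced grid over the range widened by extendrange().
  r <- grDevices::extendrange(x, f = er.quasi.inv)
  s.grid <- seq(r[1], r[2], length.out = half)
  ## The sorted union of both grids is used for inversion.
  sort(unique(c(q.grid, s.grid)))
}

## Grid approximation to the lower generalized inverse
## inf{x : F(x) >= u}. Requested u outside the range of F attained on
## the grid is first reset to the nearest attainable endpoint.
quasi_inverse <- function(u, grid, Fhat) {
  Fg <- Fhat(grid)
  u <- pmin(pmax(u, min(Fg)), max(Fg))
  vapply(u, function(p) grid[which(Fg >= p)[1]], numeric(1))
}

set.seed(1)
x <- rnorm(200)
grid <- quasi_inverse_grid(x)
Fhat <- ecdf(x)  # empirical CDF stands in for the kernel marginal CDF
quasi_inverse(c(0.25, 0.5, 1), grid, Fhat)
```

With er.quasi.inv = 1 the equi-spaced grid extends one full data range beyond each end, which gives the inversion room when a smooth kernel CDF approaches 0 or 1 only outside the observed data.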

The plot.npcopula method supports base persp, contour, and image displays for two-dimensional grid output. Surface plots use the package-wide viridis default palette, detailed perspective ticks, and the same default viewing angles and base-graphics rotation cadence as the other surface plot methods. renderer="rgl" requests the shared interactive rgl surface renderer, using the same default-angle remapping used by the other package surface plots. For mixed ordered margins, grid displays are drawn against the requested probability grid, while the returned u columns retain the attainable marginal probability values produced by quasi-inversion. The "empirical" view plots empirical copula coordinates, and "all" gives a base-graphics four-panel diagnostic display.
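The exact scaling used by the "empirical" view is not specified here. One common convention for empirical copula coordinates (pseudo-observations) maps each margin through its own empirical CDF, so each coordinate equals rank/n; a rescaled variant uses rank/(n + 1) to keep points off the unit-square boundary. The base-R sketch below computes both for simulated continuous data; the data and object names are illustrative assumptions.

```r
## Pseudo-observations: map each margin through its own empirical CDF.
set.seed(42)
n <- 500
x <- rnorm(n)
y <- 0.8 * x + sqrt(1 - 0.8^2) * rnorm(n)
pseudo <- data.frame(u1 = ecdf(x)(x), u2 = ecdf(y)(y))   # rank / n

## Rescaled variant that keeps every point strictly inside (0, 1).
pseudo.r <- data.frame(u1 = rank(x) / (n + 1), u2 = rank(y) / (n + 1))

## Strong positive dependence shows up as points near the diagonal.
cor(pseudo$u1, pseudo$u2)
```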

For grid surfaces, plot.npcopula can add asymptotic or bootstrap intervals. Distribution-copula asymptotic intervals use the joint distribution standard error evaluated at the marginal quasi-inverse grid. Density-copula asymptotic intervals use the plug-in delta-method denominator correction corresponding to c(u)=f(x_u)/\prod_j f_j(x_{u,j}). Bootstrap intervals resample rows and recompute the plotted copula surface on the same probability grid; band="all" overlays transparent pointwise, simultaneous, and Bonferroni wireframes.
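The pointwise, Bonferroni, and simultaneous bands can be sketched from a matrix of bootstrap replicates, one row per replicate and one column per grid point. The sketch assumes percentile intervals for the pointwise and Bonferroni bands and a sup-t construction for the simultaneous band; plot.npcopula may use different constructions, and the simulated replicate matrix stands in for recomputed copula surfaces. Nothing here calls npcopula, and the helper band_quantiles is hypothetical.

```r
## Simulated stand-in for B bootstrap copula surfaces on m grid points.
set.seed(7)
B <- 399
m <- 25
est <- seq(0.1, 0.9, length.out = m)   # hypothetical surface values
boot <- matrix(rnorm(B * m, mean = rep(est, each = B), sd = 0.02), B, m)
alpha <- 0.05

## Percentile interval at each grid point for a given level.
band_quantiles <- function(boot, level) {
  t(apply(boot, 2, quantile, probs = c(level / 2, 1 - level / 2),
          names = FALSE))
}

pointwise  <- band_quantiles(boot, alpha)      # level alpha at each point
bonferroni <- band_quantiles(boot, alpha / m)  # alpha split across m points

## Sup-t simultaneous band: standardize each point by its bootstrap sd
## and use the 1 - alpha quantile of the maximum absolute deviation.
sdv <- apply(boot, 2, sd)
maxdev <- apply(abs(sweep(boot, 2, est)) / rep(sdv, each = B), 1, max)
crit <- quantile(maxdev, 1 - alpha, names = FALSE)
simultaneous <- cbind(est - crit * sdv, est + crit * sdv)
```

The Bonferroni band is wider than the pointwise band because each of the m points is assigned level alpha/m, which bounds the joint non-coverage by alpha at the cost of being conservative.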

Value

npcopula returns an object of class "npcopula". The main components are:

copula

estimated copula distribution value or copula density value.

u1, u2, ...

marginal probability coordinates associated with the sample realizations or evaluation grid.

x, y, ...

marginal quasi-inverse coordinates corresponding to the requested probability grid when grid evaluation is used.

bws

selected unconditional distribution or density bandwidth object.

eval

data frame containing the copula values, probability coordinates, and quasi-inverse coordinates. as.data.frame(object) returns this component for data-frame workflows.

copulaerr

asymptotic or bootstrap standard-error slot. The fitted object stores NA unless an interval-producing plotting route constructs evaluation-specific intervals.

The source data, target, evaluation route, grid dimensions, and timing metadata are retained as list components. The functions fitted, predict, se, summary, as.data.frame, and plot support "npcopula" objects.

Book And Method Pointers

The copula distribution target is C(u)=H(F_1^{-1}(u_1),\ldots,F_d^{-1}(u_d)); the copula density target is c(u)=f(x_u)/\prod_j f_j(x_{u,j}). The mixed-data kernel implementation follows Racine (2015), with quasi-inversion in the sense of Nelsen (2006). For the underlying mixed-data density and distribution estimators, see Li and Racine (2007), Chapter 1 Density Estimation, Chapter 3 Kernel Estimation with Mixed Data, and Racine (2019), Chapter 2 Continuous Density and Cumulative Distribution Functions.

Usage Issues

Use a data.frame rather than cbind for mixed data so that ordered variables remain ordered. Unordered factors are not valid for copula estimation in this implementation.
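The difference is easy to see in base R: cbind() replaces factors by their internal integer codes, whereas data.frame() keeps the ordered class. The toy vectors below are illustrative.

```r
## Toy data: a numeric margin and an ordered factor margin.
x <- c(0.3, -1.2, 0.8, 1.5)
y <- ordered(c("low", "high", "mid", "high"),
             levels = c("low", "mid", "high"))

## cbind() builds a numeric matrix; the factor becomes its integer codes.
m <- cbind(x, y)
is.ordered(m[, "y"])  # FALSE
m[, "y"]              # 1 3 2 3

## data.frame() keeps y as an ordered factor.
d <- data.frame(x = x, y = y)
is.ordered(d$y)       # TRUE
```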

Author(s)

Jeffrey S. Racine racinej@mcmaster.ca

References

Li, Q. and J.S. Racine (2007), Nonparametric Econometrics: Theory and Practice, Princeton University Press.

Nelsen, R.B. (2006), An Introduction to Copulas, Second Edition, Springer.

Racine, J.S. (2015), “Mixed Data Kernel Copulas”, Empirical Economics, 48, 37–59.

Racine, J.S. (2019), An Introduction to the Advanced Theory and Practice of Nonparametric Econometrics: A Replicable Approach Using R, Cambridge University Press.

See Also

npudistbw, npudist, npudensbw, npudens, np.kernels, np.options, plot

Examples

## Not run: 
library("MASS")

## Example 1: bivariate mixed data, continuous x and ordered y.

set.seed(42)
n <- 1000
n.eval <- 30
rho <- 0.99
mu <- c(0, 0)
Sigma <- matrix(c(1, rho, rho, 1), 2, 2)
xy <- mvrnorm(n = n, mu = mu, Sigma = Sigma)
mydat <- data.frame(
  x = xy[, 1],
  y = ordered(as.integer(cut(xy[, 2],
    quantile(xy[, 2], seq(0, 1, by = .1)),
    include.lowest = TRUE)) - 1)
)

grid.seq <- seq(0, 1, length.out = n.eval)
grid.dat <- data.frame(u1 = grid.seq, u2 = grid.seq)

## Estimate the copula distribution from an npudistbw() object.

bw.cdf <- npudistbw(~ x + y, data = mydat, nmulti = 1)
copula <- npcopula(bws = bw.cdf, data = mydat, u = grid.dat)
summary(copula)

## Native plotting replaces the older manual contour(), persp(), and
## empirical scatterplot calls.

plot(copula, perspective = FALSE, view = "contour")
plot(copula, perspective = FALSE, view = "image")
plot(copula, view = "fixed", zlim = c(0, 1))
if (requireNamespace("rgl", quietly = TRUE))
  plot(copula, view = "fixed", renderer = "rgl", zlim = c(0, 1))
plot(copula)

## Plot empirical copula coordinates from the retained sample data.

plot(copula, view = "empirical")

## Or request the four-panel base-graphics diagnostic display.

plot(copula, view = "all")

## Estimate and plot the copula density from an npudensbw() object.

bw.pdf <- npudensbw(~ x + y, data = mydat, nmulti = 1)
copula.dens <- npcopula(bws = bw.pdf, data = mydat, u = grid.dat)
summary(copula.dens)
plot(copula.dens, view = "fixed")
if (requireNamespace("rgl", quietly = TRUE))
  plot(copula.dens, view = "fixed", renderer = "rgl")
plot(copula.dens)

## Intervals are available for two-dimensional grid surfaces.

plot(copula, errors = "asymptotic", band = "pointwise")
plot(copula, errors = "bootstrap", bootstrap = "inid", B = 399,
     band = "pointwise")

## Prediction evaluates the retained bandwidth object on a supplied
## probability grid.

predict(copula, u = data.frame(x = c(0.25, 0.75),
                               y = c(0.25, 0.75)))
predict(copula, newdata = data.frame(u1 = c(0.25, 0.75),
                                     u2 = c(0.25, 0.75)))

## Example 2: bivariate continuous data.

set.seed(42)
n <- 1000
n.eval <- 30
rho <- 0.99
mu <- c(0, 0)
Sigma <- matrix(c(1, rho, rho, 1), 2, 2)
xy <- mvrnorm(n = n, mu = mu, Sigma = Sigma)
mydat <- data.frame(x = xy[, 1], y = xy[, 2])

grid.seq <- seq(0, 1, length.out = n.eval)
grid.dat <- data.frame(u1 = grid.seq, u2 = grid.seq)

bw.cdf <- npudistbw(~ x + y, data = mydat, nmulti = 1)
copula <- npcopula(bws = bw.cdf, data = mydat, u = grid.dat)
summary(copula)

plot(copula, perspective = FALSE, view = "contour")
plot(copula, perspective = FALSE, view = "image")
plot(copula, view = "fixed", zlim = c(0, 1))
if (requireNamespace("rgl", quietly = TRUE))
  plot(copula, view = "fixed", renderer = "rgl", zlim = c(0, 1))
plot(copula)
plot(copula, view = "empirical")
plot(copula, view = "all")

bw.pdf <- npudensbw(~ x + y, data = mydat, nmulti = 1)
copula.dens <- npcopula(bws = bw.pdf, data = mydat, u = grid.dat)
summary(copula.dens)
plot(copula.dens, view = "fixed", zlim = c(0, 40))
if (requireNamespace("rgl", quietly = TRUE))
  plot(copula.dens, view = "fixed", renderer = "rgl",
       zlim = c(0, 40))
plot(copula.dens, zlim = c(0, 40))

## The formula interface is a shorter route when bandwidths do not need
## to be reused explicitly.

copula.short <- npcopula(~ x + y, data = mydat, neval = n.eval,
                         nmulti = 1)
plot(copula.short, view = "all")

## End(Not run) 

np documentation built on May 16, 2026, 1:07 a.m.