scdensity: Shape-constrained kernel density estimation.


Description

scdensity computes kernel density estimates that satisfy specified shape restrictions. It is used in the same way as density, and takes most of that function's arguments. Its default behavior is to compute a unimodal estimate. Use argument constraint to choose different shape constraints, method to choose a different estimation method, and opts to specify method- and constraint-specific options. The result is a list of S3 class scdensity, which may be inspected via print, summary, and plot methods.

Usage

scdensity(x, bw = "nrd0", constraint = c("unimodal", "monotoneRightTail",
  "monotoneLeftTail", "twoInflections", "twoInflections+", "boundedLeft",
  "boundedRight", "symmetric", "bimodal"), method = c("adjustedKDE",
  "weightedKDE", "greedySharpenedKDE"), opts = NULL, adjust = 1, n = 512,
  na.rm = FALSE)

Arguments

x

A vector of data from which the estimate is to be computed.

bw

The bandwidth. It is specified as either a numerical value or as one of the character strings "nrd0", "nrd", "ucv", "bcv", or "SJ", exactly as in density.

constraint

A vector of strings giving the operative shape constraints. Each element must partially match a distinct alternative among "unimodal", "monotoneRightTail", "monotoneLeftTail", "twoInflections", "twoInflections+", "boundedLeft", "boundedRight", "symmetric", and "bimodal".

method

A string giving the method of enforcing shape constraints. It must partially match one of "adjustedKDE", "weightedKDE", or "greedySharpenedKDE".

opts

A list giving options specific to the chosen constraints and/or method. E.g. use opts = list(modeLocation = 0) to force the mode to be at zero when the constraint is unimodal. See below for lists of available options.

adjust

A scaling factor for the bandwidth, just as in density.

n

The number of points returned in the density estimate. Same as in density.

na.rm

Logical indicating whether or not to remove missing values from x. Same as in density.
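Because bw, adjust, and n are documented as behaving exactly as in density, their effect can be previewed with base R alone. The sketch below uses only the stats package and simulated data; it assumes, based on the descriptions above, that scdensity resolves these arguments the same way, which has not been checked against the package source.

```r
set.seed(1)
x <- rnorm(100)

# The five named bandwidth selectors accepted by density().
# bw.ucv() and bw.bcv() may warn when the minimum lies at a search boundary.
bws <- c(nrd0 = bw.nrd0(x),
         nrd  = bw.nrd(x),
         ucv  = suppressWarnings(bw.ucv(x)),
         bcv  = suppressWarnings(bw.bcv(x)),
         SJ   = bw.SJ(x))
print(bws)

# adjust multiplies the selected bandwidth; n sets the number of grid points.
d <- density(x, bw = "SJ", adjust = 0.5, n = 512)
all.equal(d$bw, 0.5 * bw.SJ(x))
length(d$x)
```

Passing bw = "SJ" together with adjust = 0.5 is therefore equivalent to passing the numeric bandwidth 0.5 * bw.SJ(x), which is the form used in the last example on this page.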

Details

All density estimates in this package use the Gaussian kernel. It is the only common kernel function with three continuous derivatives everywhere. The adjustedKDE and weightedKDE methods require continuous derivatives to ensure numerical stability.
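To make the role of the Gaussian kernel concrete, here is a minimal base R sketch of an unconstrained Gaussian KDE and its first derivative. The function names kde and dkde are hypothetical and are not part of the package; the code assumes the standard textbook definition of the estimator and is not taken from the scdensity source.

```r
set.seed(1)
x <- rnorm(50)
h <- bw.nrd0(x)

# Gaussian KDE: the average of normal densities centred at the data points.
kde <- function(v) {
  vapply(v, function(v_j) mean(dnorm(v_j, mean = x, sd = h)), numeric(1))
}

# First derivative of the KDE, available in closed form for the Gaussian kernel.
dkde <- function(v) {
  vapply(v, function(v_j) mean(-(v_j - x) / h^2 * dnorm(v_j, mean = x, sd = h)),
         numeric(1))
}

v <- seq(min(x) - 3 * h, max(x) + 3 * h, length.out = 512)

# density() uses a binned approximation, so agreement is close but not exact.
d <- density(x, bw = h, n = 512, from = min(v), to = max(v))
kde_gap <- max(abs(kde(d$x) - d$y))

# Check the analytic derivative against a central finite difference.
eps <- 1e-5
deriv_gap <- max(abs((kde(v + eps) - kde(v - eps)) / (2 * eps) - dkde(v)))
```

Both gaps should be small. Higher derivatives can be written down in the same way, which is what makes derivative-based shape constraints straightforward to evaluate with this kernel.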

The default estimation method, adjustedKDE, can handle all of the available constraints. The weightedKDE method can handle every constraint except symmetric, while the greedySharpenedKDE method can handle only unimodal, monotoneRightTail, monotoneLeftTail, boundedLeft, and boundedRight. The opts list can also be used to supply method-specific control parameters. See the "Method details" section for more.
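The compatibility rules in the paragraph above can be written as a small lookup. This is an illustrative sketch only: the names supported and method_supports are hypothetical, and the use of match.arg for partial matching is an assumption about how the function resolves its arguments, not something confirmed from the package source.

```r
all_constraints <- c("unimodal", "monotoneRightTail", "monotoneLeftTail",
                     "twoInflections", "twoInflections+", "boundedLeft",
                     "boundedRight", "symmetric", "bimodal")

# Constraints each method can handle, as stated in the text above.
supported <- list(
  adjustedKDE        = all_constraints,
  weightedKDE        = setdiff(all_constraints, "symmetric"),
  greedySharpenedKDE = c("unimodal", "monotoneRightTail", "monotoneLeftTail",
                         "boundedLeft", "boundedRight")
)

# TRUE if every requested constraint is available for the requested method.
# Both arguments may be abbreviated, as long as the abbreviation is unique.
method_supports <- function(method, constraint) {
  method     <- match.arg(method, names(supported))
  constraint <- match.arg(constraint, all_constraints, several.ok = TRUE)
  all(constraint %in% supported[[method]])
}

method_supports("greedy", c("monotoneL", "monotoneR"))  # TRUE
method_supports("weighted", "sym")                      # FALSE
method_supports("adjusted", "twoInflections+")          # TRUE
```

Note that an abbreviation such as "monotone" or "twoInf" is ambiguous and would not match a single alternative.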

Each constraint has a corresponding control parameter that can be supplied as an element of opts. The control parameters are described in the following table. See the "Constraint details" section for definitions of each constraint.

[Table: control parameters for each constraint (image not reproduced here)]

More than one shape constraint can be specified simultaneously. Certain combinations of constraints (e.g., unimodal and monotoneRightTail) are redundant, and will cause a warning. Other combinations (e.g., unimodal and bimodal) are incompatible and will cause an error. The figure below summarizes the valid constraint combinations.

[Figure: valid constraint combinations (image not reproduced here)]

Value

A list with the following elements:

constraint

The constraint(s) used for estimation. Might differ from the constraints supplied to the function if they included redundant constraints.

method

The estimation method used.

f0

A function. Use f0(v) to evaluate the unconstrained KDE at the points in v.

fhat

A function. Use fhat(v) to evaluate the constrained KDE at the points in v.

data

The data used to generate the estimate.

bw

The bandwidth used.

extra

A list holding additional outputs that are specific to the chosen method. See the "Method details" section.

x

A vector of abscissa values for plotting the estimate. Same as in density.

y

A vector of ordinate values for plotting the estimate. Same as in density.

n

The sample size, not including missing values. Note that this n is unrelated to the argument n.

data.name

Deparsed name of the x argument, used in plotting.

call

The call to the function.

has.na

Always FALSE. Included for consistency with density.

Constraint details

All of the constraints other than symmetric are restrictions on the sign of the estimate, or its derivatives, over certain intervals. The boundaries of the intervals may be called important points. If method="greedySharpenedKDE", the important points are determined implicitly during estimation. For the other methods, the locations of the important points may be supplied in opts; in most cases they are optional. If they are not provided, estimation will be run iteratively inside a search routine (SequentialLineMin) to find good values, and these values will be returned in the extra list.
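As an illustration of a sign restriction, a unimodal curve is nondecreasing up to its mode and nonincreasing after it, so the mode is the single important point. The base R sketch below checks this numerically on a grid by counting rise-then-fall sign changes in the first differences. The helper count_modes is hypothetical and is not part of the package.

```r
# Count interior local maxima of a curve sampled on an increasing grid.
count_modes <- function(y) {
  s <- sign(diff(y))
  s <- s[s != 0]                      # ignore exactly flat steps
  sum(s[-length(s)] > 0 & s[-1] < 0)  # a rise followed by a fall
}

grid <- seq(-6, 6, length.out = 601)
one_bump  <- dnorm(grid)
two_bumps <- 0.5 * dnorm(grid, mean = -2) + 0.5 * dnorm(grid, mean = 2)

count_modes(one_bump)   # 1
count_modes(two_bumps)  # 2
```

The same check can be applied to the x and y elements of a fitted object to confirm that a returned estimate has the expected number of modes on its plotting grid.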

Here is a list of the constraints with their definitions and any relevant comments about their usage.

Method details

The adjustedKDE and weightedKDE methods are implemented using a common framework where the standard KDE is first approximated by a binning step, after which the constrained estimate is obtained. The greedySharpenedKDE method uses a different approach.

adjustedKDE and weightedKDE

The adjustedKDE method is based on the method of Wolters and Braun (2017). The method uses the usual unconstrained kernel density estimate as a pilot estimate, and adjusts the shape of this estimate by adding a function to it. The function is selected to minimally change the shape of the pilot estimate while ensuring the constraints are satisfied. Any of the constraints can be used with this method.

The weightedKDE method is based on the method of Hall and Huang (2002). The method uses a weighted kernel density estimator, with the weights minimally perturbed such that the constraint is satisfied. Any of the constraints except symmetric may be used with this method.
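A weighted kernel density estimator replaces the equal weights 1/n of the standard KDE with nonnegative weights that sum to one. The base R sketch below shows only the estimator itself, with an arbitrary weight vector chosen for illustration; it does not attempt the constrained optimization that selects the weights, and the function weighted_kde is hypothetical rather than part of the package.

```r
set.seed(42)
x <- rnorm(40)
h <- bw.nrd0(x)

# Weighted Gaussian KDE; p defaults to uniform weights (the standard KDE).
weighted_kde <- function(v, x, h, p = rep(1 / length(x), length(x))) {
  stopifnot(length(p) == length(x), all(p >= 0), abs(sum(p) - 1) < 1e-8)
  vapply(v, function(v_j) sum(p * dnorm(v_j, mean = x, sd = h)), numeric(1))
}

v <- seq(min(x) - 5 * h, max(x) + 5 * h, length.out = 401)

f_uniform <- weighted_kde(v, x, h)

# Arbitrary illustrative weights that favour small data values.
p <- rank(-x)
p <- p / sum(p)
f_tilted <- weighted_kde(v, x, h, p)

# Trapezoid rule: the weighted estimate is still a density.
area <- sum((f_tilted[-1] + f_tilted[-length(f_tilted)]) / 2 * diff(v))
```

With uniform weights the result coincides with the ordinary Gaussian KDE, so the distance of the weights from uniform is a natural measure of how far the estimate has been perturbed.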

For either of these methods, the following optional arguments can be provided as elements of opts:

When either of these methods is used, the output list extra contains elements giving the locations of the important points used in the final estimate (e.g., modeLocation if the estimate is unimodal or bimodal). Additionally, it contains the following elements:

greedySharpenedKDE

The greedySharpenedKDE method is described in Wolters (2012a, 2012b). It uses a data sharpening (shifting the data points) approach. Starting from an initial solution that satisfies the constraints, a greedy algorithm (implemented in the function improve) is used to move the points as close as possible to the observed data while maintaining feasibility.
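The idea of data sharpening can be sketched with a toy loop in base R. This is a deliberately simplified illustration under stated assumptions, not the algorithm in the package's improve function: it starts with every point at the sample median (which gives a single Gaussian bump, so the unimodal constraint holds), then repeatedly tries to move each point toward its observed value, keeping a move only if the estimate stays unimodal on a grid. All function names and the step sizes are hypothetical.

```r
# KDE of the points pts with bandwidth h, evaluated on grid.
kde_on_grid <- function(pts, h, grid) {
  colMeans(dnorm(outer(pts, grid, "-"), sd = h))  # rows: points, cols: grid
}

count_modes <- function(y) {
  s <- sign(diff(y))
  s <- s[s != 0]
  sum(s[-length(s)] > 0 & s[-1] < 0)
}

set.seed(1)
x <- sort(c(rnorm(15, mean = -1.5), rnorm(15, mean = 1.5)))  # two clusters
h <- 0.4
grid <- seq(min(x) - 4 * h, max(x) + 4 * h, length.out = 401)
is_unimodal <- function(pts) count_modes(kde_on_grid(pts, h, grid)) == 1

y <- rep(median(x), length(x))        # feasible starting solution
start_dist <- sum(abs(y - x))

for (pass in 1:10) {
  moved <- FALSE
  for (i in seq_along(x)) {
    for (step in c(1, 0.5, 0.25)) {   # try the largest move first
      cand <- y
      cand[i] <- y[i] + step * (x[i] - y[i])
      if (cand[i] != y[i] && is_unimodal(cand)) {
        y <- cand
        moved <- TRUE
        break
      }
    }
  }
  if (!moved) break
}

final_dist <- sum(abs(y - x))
```

After the loop, y holds sharpened data whose KDE is unimodal on the grid, and final_dist is smaller than start_dist because points have been moved back toward the observations wherever feasibility allowed.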

The following optional arguments can be provided as elements of opts:

When this method is used, the output list extra contains the following elements:

References

Hall and Huang (2002), Unimodal Density Estimation Using Kernel Methods, Statistica Sinica, 12, 965-990.

Wolters and Braun (2017), Enforcing Shape Constraints on a Probability Density Estimate Using an Additive Adjustment Curve, Communications in Statistics - Simulation and Computation, available online.

Wolters (2012a), A Greedy Algorithm for Unimodal Kernel Density Estimation by Data Sharpening, Journal of Statistical Software, 46(6), 1–26.

Wolters (2012b), Methods for Shape-Constrained Kernel Density Estimation. Ph.D. Thesis, University of Western Ontario.

See Also

plot.scdensity plot method, print.scdensity print method, and summary.scdensity summary method.

Examples

library(scdensity)

# The default settings give a unimodal estimate using the adjustment curve method.
x <- rlnorm(30)
scKDE <- scdensity(x)
scKDE
summary(scKDE)
plot(scKDE, detail=2)
plot(scKDE, detail=4)

# Constrain the first and fourth quartiles to be monotone, using greedy sharpening method.
x <- rt(50, df=3)
scKDE <- scdensity(x, bw="SJ", adjust=0.5, constraint=c("monotoneL", "monotoneR"),
                   opts=list(verbose=TRUE, leftTail=25, rightTail=75), method="greedy")
plot(scKDE)

# Compare unimodal, twoInflections, and twoInflections+ constraints
x <- rnorm(100)
h <- 0.5 * bw.SJ(x)
fhat1 <- scdensity(x, bw=h, constraint="unimodal")
fhat2 <- scdensity(x, bw=h, constraint="twoInflections")
fhat3 <- scdensity(x, bw=h, constraint="twoInflections+")
plot(density(x, bw=h))
lines(fhat1$x, fhat1$y, col="red")
lines(fhat2$x, fhat2$y, col="blue")
lines(fhat3$x, fhat3$y, col="green", lwd=2)

scdensity documentation built on May 1, 2019, 10:26 p.m.