estimate_density_sorted: Estimate the density (and its derivatives), for sorted input...

View source: R/estimate_density.R

Description

Estimate the density and its first and second derivatives as input for efficient estimators of the treatment effect. The function implements fast adaptive kernel density estimation by calling a C++ subroutine. For fast computation, the input must be sorted.

Usage

estimate_density_sorted(
  x0,
  dat,
  estDerivs = TRUE,
  kernel = "triweight",
  sd_dat = "norm90",
  adapt = TRUE,
  fdat = NULL
)

Arguments

x0

numeric vector, points at which the density should be evaluated (must be sorted in ascending order).

dat

numeric vector, random sample from the distribution for which the density should be estimated.

estDerivs

logical, should the first and second derivatives be estimated? (default = TRUE) The first and second derivatives of the density are needed for efficient estimators of the treatment effect. Adaptive estimates of the density, however, require an initial estimate of the density but not of its derivatives, so it is convenient to skip the derivatives when computing that initial estimate.

kernel

string, indicates which kernel should be used. Currently only "triweight" is implemented.
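
For reference, the triweight kernel is K(u) = 35/32 * (1 - u^2)^3 for |u| <= 1 and zero otherwise. The following is a minimal plain-R sketch of a fixed-bandwidth estimate with this kernel. The helper names triweight and kde_triweight are hypothetical and not part of the package, the bandwidth h = 1 is chosen arbitrarily for illustration, and the package's C++ subroutine may work differently.

```r
# Triweight kernel: K(u) = 35/32 * (1 - u^2)^3 for |u| <= 1, zero otherwise.
triweight <- function(u) {
  ifelse(abs(u) <= 1, 35 / 32 * (1 - u^2)^3, 0)
}

# Fixed-bandwidth kernel density estimate at the points x0.
# Hypothetical helper for illustration; h is a bandwidth chosen by the caller.
kde_triweight <- function(x0, dat, h) {
  vapply(x0, function(x) mean(triweight((x - dat) / h)) / h, numeric(1))
}

set.seed(1)
dat <- sort(rnorm(1000))
x0 <- seq(-3, 3, by = 0.1)
fhat <- kde_triweight(x0, dat, h = 1)
```

This sketch loops over every evaluation point and touches every observation, so it does not benefit from sorted input the way the package function is described to.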

sd_dat

string or positive numeric, how should the standard deviation be calculated for the Silverman rule-of-thumb bandwidth? Can be "sd" (the usual standard deviation) or "norm90" (the default). "norm90" takes the difference between the 0.95 and 0.05 quantiles of the data and divides it by the corresponding quantile range of the normal distribution. This estimates the standard deviation if the data is normally distributed, and tends to be substantially less affected by outliers even if the data is not normal. Alternatively, can be a positive number giving the standard deviation to be plugged into Silverman's rule-of-thumb bandwidth, which allows manual over- or undersmoothing.
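
The "norm90" idea can be sketched in a few lines of base R. The helper name norm90_sd is hypothetical, and the package may compute the sample quantiles differently (the sketch assumes R's default quantile type).

```r
# Robust scale estimate in the spirit of "norm90" (hypothetical helper):
# spread between the 5% and 95% sample quantiles, divided by the same
# spread for a standard normal distribution (about 3.29).
norm90_sd <- function(x) {
  q <- quantile(x, probs = c(0.05, 0.95), names = FALSE)
  (q[2] - q[1]) / (qnorm(0.95) - qnorm(0.05))
}

set.seed(1)
x <- rnorm(10000)
x_out <- c(x, 1000)  # same sample plus a single gross outlier

# Compare the usual standard deviation with the quantile-based estimate
c(sd = sd(x), norm90 = norm90_sd(x))
c(sd = sd(x_out), norm90 = norm90_sd(x_out))
```

A single extreme observation inflates sd() considerably, while the quantile-based estimate barely moves because the 5% and 95% quantiles are almost unchanged.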

adapt

logical, should adaptive estimation be used? (default = TRUE) Intuitively, adaptive estimation recognizes that there is less data in the tails of the distribution, so a larger bandwidth may be helpful in low-density regions than in high-density regions. However, it is computationally more expensive.
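
One common way to make a kernel estimate adaptive is to scale the bandwidth at each observation by the inverse square root of a pilot density estimate, normalized by the geometric mean of the pilot. The sketch below illustrates that scheme in plain R. It is an assumption for illustration only: the helper kde_adaptive is hypothetical, the global bandwidth h = 1 is arbitrary, and the package's actual bandwidth rule may differ.

```r
# Triweight kernel, zero outside [-1, 1]
triweight <- function(u) ifelse(abs(u) <= 1, 35 / 32 * (1 - u^2)^3, 0)

# Sample-point adaptive estimate (hypothetical helper, not the package's code).
# fdat: pilot density evaluated at each point of dat; h: global bandwidth.
kde_adaptive <- function(x0, dat, h, fdat) {
  g <- exp(mean(log(fdat)))   # geometric mean of the pilot density
  h_i <- h * sqrt(g / fdat)   # larger bandwidth where the pilot density is small
  vapply(x0, function(x) mean(triweight((x - dat) / h_i) / h_i), numeric(1))
}

set.seed(1)
dat <- sort(rnorm(1000))
h <- 1

# Pilot: fixed-bandwidth estimate at the data points themselves
pilot <- vapply(dat, function(x) mean(triweight((x - dat) / h)) / h, numeric(1))

# Adaptive estimate evaluated at the data points
fhat <- kde_adaptive(dat, dat, h, fdat = pilot)
```

The pilot step is why an initial density estimate without derivatives is useful: only the density values at the data points enter the local bandwidths.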

fdat

numeric vector of the same length as dat, the density evaluated at the points in dat. Used to calculate the adaptive bandwidth if adapt is TRUE. If adapt is TRUE and fdat is not supplied, the function calls itself to estimate the density.

Value

a matrix with length(x0) rows and 3 columns if estDerivs is TRUE, otherwise 1 column. The first column contains the estimated density at each point in x0. If estDerivs is TRUE, the second and third columns contain the estimated first and second derivatives of the density (not the log density) at each point in x0, respectively.

Examples

# draw a random sample from the standard normal distribution
# and sort it, since the function requires sorted input
X <- sort(rnorm(n = 1000))
# estimate the density only (skip the derivatives)
fX <- estimate_density_sorted(x0 = X, dat = X, estDerivs = FALSE)
# plot the familiar bell curve; fX is a one-column matrix
plot(X, fX[, 1])

michaelpollmann/parTreat documentation built on Dec. 21, 2021, 5:58 p.m.