ParetoDensityEstimation: Pareto Density Estimation V3


ParetoDensityEstimation    R Documentation

Pareto Density Estimation V3

Description

This function estimates the Pareto density for the distribution of one variable. By default, the function internally estimates the appropriate number and position of kernels needed to estimate the density properly. However, the user can set the kernels manually. In this case, the density is estimated only around these values, even if data exist outside the range of the kernels or the internally estimated paretoRadius does not cover all data points between neighboring kernels. See the examples for details.

Usage

ParetoDensityEstimation(Data, paretoRadius, kernels = NULL,
  MinAnzKernels = 100,PlotIt=FALSE,Silent=FALSE)

Arguments

Data

[1:n] numeric vector of data.

paretoRadius

Optional, scalar numeric value, see ParetoRadius. If not given, it is estimated internally. Please do not set it manually.

kernels

Optional, [1:m] numeric vector of data values at which the Pareto density is measured. If NULL (the default, see Usage), the kernels are computed internally.

MinAnzKernels

Optional, minimal number of kernels, default: MinAnzKernels = 100.

PlotIt

Optional, if TRUE: draws a raw base R plot of the density estimation for debugging purposes. For regular use, please prefer the ggplot2 interface via PDEplot or MDplot.

Silent

Optional, if TRUE: disables all warnings

Details

Pareto Density Estimation (PDE) is a method for estimating probability density functions using hyperspheres. The Pareto radius of the hyperspheres is derived from the optimization of information for minimal set size. Pareto density is shown to be the best estimate for clusters of Gaussian structure. The method is shown to be robust when clusters overlap and when the variances differ across clusters. It is therefore the best density estimation for judging Gaussian mixtures in the data, see [Ultsch 2003].
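The idea of counting data points inside hyperspheres can be illustrated with a minimal base R sketch. This is a simplified illustration only, not the package implementation: the radius is fixed by hand instead of being derived from the information-optimal Pareto radius, and the names `radius`, `centers`, `counts` and `density` are hypothetical.

```r
# Simplified illustration of density estimation by counting points in
# one-dimensional hyperspheres (intervals). NOT the package implementation.
set.seed(1)
x <- c(rnorm(1000), rnorm(2000) + 2)

radius  <- 0.4                                    # hypothetical radius, fixed by hand
centers <- seq(min(x), max(x), length.out = 100)  # hypothetical kernel positions

# Number of data points within `radius` of each center
counts <- vapply(centers, function(k) sum(abs(x - k) <= radius), integer(1))

# Normalize with the trapezoidal rule so that the curve integrates to 1
area    <- sum(diff(centers) * (head(counts, -1) + tail(counts, -1)) / 2)
density <- counts / area

plot(centers, density, type = "l", xlab = "Data", ylab = "Density (sketch)")
```

In ParetoDensityEstimation itself, the radius and the kernels are estimated internally unless they are supplied by the user.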

If the input argument kernels is set manually, the output elements paretoDensity_internal and kernels_internal provide the internally estimated density and kernels. Otherwise these elements are NULL. The function issues a message if the range of the kernels and the range of the data do not overlap completely.
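The following sketch shows how the additional output elements could be inspected when kernels are set by the user. It assumes that the DataVisualizations package is installed (the block is skipped otherwise); the variable names `x`, `my_kernels` and `res` are hypothetical.

```r
# Sketch only: requires the DataVisualizations package; skipped otherwise.
res <- NULL
if (requireNamespace("DataVisualizations", quietly = TRUE)) {
  set.seed(42)
  x <- c(rnorm(1000), rnorm(2000) + 2)

  # User-defined kernels that do not cover the whole data range
  my_kernels <- seq(from = -1, to = 1, by = 0.05)
  res <- DataVisualizations::ParetoDensityEstimation(x, kernels = my_kernels)

  # Density at the user-defined kernels
  str(res$kernels)
  str(res$paretoDensity)

  # Internally estimated kernels and density, documented as being
  # provided when the kernels are set by the user
  str(res$kernels_internal)
  str(res$paretoDensity_internal)
}
```

Comparing `res$kernels_internal` with `my_kernels` is one way to see which part of the data range the manual kernels leave out.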

Typically, it is not advisable to set paretoRadius manually. However, in specific cases the function ParetoRadius is called before this function. In such cases, the previously estimated paretoRadius can be passed as the input argument.
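A minimal sketch of this workflow is given below. It assumes that the DataVisualizations package is installed and that ParetoRadius accepts the data vector as its first argument and returns a scalar radius; the variable names `x`, `radius` and `pde` are hypothetical.

```r
# Sketch only: requires the DataVisualizations package; skipped otherwise.
pde <- NULL
if (requireNamespace("DataVisualizations", quietly = TRUE)) {
  set.seed(42)
  x <- c(rnorm(1000), rnorm(2000) + 2)

  # Assumption: ParetoRadius() takes the data vector and returns a scalar radius
  radius <- DataVisualizations::ParetoRadius(x)

  # Reuse the previously estimated radius instead of estimating it again
  pde <- DataVisualizations::ParetoDensityEstimation(x, paretoRadius = radius)
  print(pde$paretoRadius)
}
```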

Value

List with

kernels

[1:m] numeric vector of data values at which the Pareto density is measured.

paretoDensity

[1:m] numeric vector containing the density determined using paretoRadius.

paretoRadius

numeric value defining the radius.

kernels_internal

Either NULL or internally estimated [1:p] numeric vector of kernels if input argument kernels was set by the user

paretoDensity_internal

Either NULL or internally estimated density if input argument kernels was set by the user

Note

This is the second version of the function previously available in AdaptGauss.

Author(s)

Michael Thrun

References

Ultsch, A.: Pareto density estimation: A density estimation for knowledge discovery, in Baier, D.; Wernecke, K. D. (Eds.), Innovations in classification, data science, and information systems, Proc. GfKl 2003, pp. 91-100, Springer, Berlin, 2005.

See Also

ParetoRadius

PDEplot

MDplot

Examples

   
   # Requires the DataVisualizations package
   library(DataVisualizations)

   # Kernels are estimated internally
   data <- c(rnorm(1000), rnorm(2000) + 2, rnorm(1000) * 2 - 1)
   pdeVal <- ParetoDensityEstimation(data)
   plot(pdeVal$kernels, pdeVal$paretoDensity, type = 'l', xaxs = 'i',
        yaxs = 'i', xlab = 'Data', ylab = 'PDE')

   # Data exist outside the range of the kernels
   kernels <- seq(from = -3, to = 3, by = 0.01)
   pdeVal <- ParetoDensityEstimation(data, kernels = kernels)
   plot(pdeVal$kernels, pdeVal$paretoDensity, type = 'l', xaxs = 'i',
        yaxs = 'i', xlab = 'Data', ylab = 'PDE')

   # Data exist in between the kernels and are not measured
   pdeVal$paretoRadius  # 0.42 per the original note; varies with the random sample
   kernels <- seq(from = -8, to = 8, by = 1)
   pdeVal <- ParetoDensityEstimation(data, kernels = kernels)
   plot(pdeVal$kernels, pdeVal$paretoDensity, type = 'l', xaxs = 'i',
        yaxs = 'i', xlab = 'Data', ylab = 'PDE')
   
   

DataVisualizations documentation built on Oct. 10, 2023, 9:06 a.m.