HRR_pt_est: Hazard Rate Ratio Point Estimations
In zhicongz/AnomDetct: Anomaly Detection via Scan Statistics


This function returns point estimates of the probability density function and the hazard rate ratio function.

HRR_pt_est(pt_int, cdf_sample, kernel = "gaussian", hazard_bandwidth = NULL, knn = NULL)

`pt_int`	a vector of points at which the estimates are evaluated.
`cdf_sample`	a sorted vector of observations on (0,1) from which the density is estimated.
`kernel`	a character string giving the smoothing kernel to be used. This must partially match one of "`gaussian`", "`rectangular`", "`triangular`" or "`knn`", with default "`gaussian`".
`hazard_bandwidth`	the smoothing bandwidth to be used.
`knn`	the number of neighbouring points used in smoothing when kernel = "`knn`".

Hazard rate ratio function is defined as:

HRR = HR(unif)/HR(est)

Here HR(unif) is the hazard rate function of the uniform distribution on (0,1), and HR(est) is the hazard rate function of the estimated density.

HR(est) = f(est)/(1-F(est))

f(est) is the estimated probability density function and F(est) is the estimated cumulative distribution function.
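Because the uniform distribution on (0,1) has f(x) = 1 and F(x) = x, HR(unif) = 1/(1 - x), and the ratio simplifies to HRR = (1 - F(est)) / ((1 - x) f(est)). The sketch below evaluates the definition directly from point estimates of f and F. The helper `hrr_from_estimates()` is hypothetical and is not part of AnomDetct.

```r
# Hazard rate ratio from point estimates of f and F at points x in (0,1).
# hrr_from_estimates() is a hypothetical helper, not an AnomDetct function.
hrr_from_estimates <- function(x, f_est, F_est) {
  hr_unif <- 1 / (1 - x)           # uniform(0,1): f = 1, F = x
  hr_est  <- f_est / (1 - F_est)   # HR(est) = f(est) / (1 - F(est))
  hr_unif / hr_est                 # HRR = HR(unif) / HR(est)
}

# If the estimates match the uniform distribution, HRR is 1 everywhere.
x <- c(0.1, 0.5, 0.9)
hrr_from_estimates(x, f_est = rep(1, 3), F_est = x)
```

Values of HRR above 1 mean the estimated hazard is lower than the uniform hazard at that point, and values below 1 mean it is higher.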

f(est) and F(est) come from a local quadratic polynomial fit to the empirical cumulative distribution function: F(est) is the constant term and f(est) is the coefficient of the linear term.
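A minimal sketch of a local quadratic fit at a single point t0, assuming triangular kernel weights, no boundary reflection, and the empirical CDF evaluated at the sorted observations as the response. It uses weighted least squares via `stats::lm.wfit`; the package itself may build and solve the system differently, and `local_quad_cdf()` is a hypothetical name.

```r
# Local quadratic fit to the empirical CDF at one point t0 (illustrative only).
# Assumptions: triangular weights, no reflection, ECDF value i/n at the i-th
# sorted observation. local_quad_cdf() is hypothetical, not from AnomDetct.
local_quad_cdf <- function(t0, cdf_sample, h) {
  n    <- length(cdf_sample)
  y    <- seq_len(n) / n                  # ECDF at the sorted observations
  u    <- (cdf_sample - t0) / h
  w    <- pmax(1 - abs(u), 0)             # triangular kernel weights
  keep <- w > 0                           # only points inside the window
  d    <- cdf_sample[keep] - t0
  fit  <- lm.wfit(cbind(1, d, d^2), y[keep], w[keep])
  b    <- fit$coefficients
  # constant term estimates F(t0), linear term estimates f(t0)
  c(F_est = unname(b[1]), f_est = unname(b[2]))
}

set.seed(1)
s <- sort(runif(5000))
local_quad_cdf(0.5, s, h = 0.1)   # close to F = 0.5, f = 1 for uniform data
```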

When kernel is "rectangular" or "triangular" and hazard_bandwidth is too small, too few observations may fall inside the kernel window to fit a quadratic polynomial. In this case, if the design matrix has rank 2, the function fits a linear polynomial instead. If the rank is 1, f(est) is the proportion of points falling in the corresponding bin and F(est) is the mean of the points in that bin. If the rank is 0, f(est) = 0 and F(est) is missing.
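The rank check can be illustrated with base R's `qr()`. This sketch assumes a rectangular window and only reports which branch would be taken; `choose_degree()` is a hypothetical helper, and the package's own implementation may differ.

```r
# Which fallback applies at t0, based on the rank of the quadratic design matrix.
# choose_degree() is hypothetical; it assumes a rectangular window |x - t0| < h.
choose_degree <- function(t0, cdf_sample, h) {
  d <- cdf_sample[abs(cdf_sample - t0) < h] - t0   # centred points in window
  X <- cbind(1, d, d^2)                              # columns: 1, d, d^2
  r <- if (length(d) == 0) 0L else qr(X)$rank        # no points -> rank 0
  switch(as.character(r),
         "3" = "quadratic fit",
         "2" = "linear fit",
         "1" = "bin proportion / bin mean",
         "0" = "f = 0, F missing")
}

s <- c(0.10, 0.12, 0.50, 0.52, 0.54)
choose_degree(0.52, s, 0.05)   # three distinct points in the window
choose_degree(0.11, s, 0.05)   # two distinct points
choose_degree(0.90, s, 0.05)   # empty window
```

Two distinct points give rank 2 and one point gives rank 1, since the constant column is never zero.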

The domain of cdf_sample is the bounded interval (0,1). To reduce bias near the boundary points, reflection is used: all observations are reflected about 0 and about 1, and the local quadratic polynomial density estimation is carried out on the extended cdf_sample.
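Reflecting a point x about 0 gives -x, and reflecting it about 1 gives 2 - x. A minimal sketch of the extended sample, using a hypothetical helper `reflect_sample()`:

```r
# Extend a sample on (0,1) by reflecting it about 0 and about 1.
# reflect_sample() is a hypothetical helper that only illustrates the idea.
reflect_sample <- function(cdf_sample) {
  x <- sort(cdf_sample)
  c(-rev(x), x, 2 - rev(x))   # mirror about 0, original, mirror about 1
}

reflect_sample(c(0.2, 0.7))
# returns c(-0.7, -0.2, 0.2, 0.7, 1.3, 1.8)
```

The extended sample is three times as long, stays sorted, and lies on (-1, 2), so a kernel window centred near 0 or 1 is no longer truncated.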

HRR_pt_est works by solving systems of linear equations. With the "gaussian" kernel, the design matrix always uses all the observations, including those that are far from the evaluation point and contribute negligibly. Solving such large linear systems is computationally expensive, so the "gaussian" kernel is not recommended for efficiency reasons when length(cdf_sample) is large. The function HRR_sbsp_est is designed to address this problem.
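The efficiency argument comes down to how many observations receive nonzero weight at an evaluation point. The sketch below counts them for a compact window (as with the rectangular and triangular kernels) and for a gaussian kernel; the sample, t0 and h are arbitrary illustrative choices.

```r
# Count observations with nonzero kernel weight at one evaluation point t0.
# Compact kernels ignore points with |x - t0| >= h; the gaussian density is
# positive for every observation here, so all of them enter the system.
set.seed(1)
s  <- sort(rbeta(10000, 2, 5))
t0 <- 0.3
h  <- 0.05
n_compact  <- sum(abs(s - t0) < h)           # rectangular/triangular window
n_gaussian <- sum(dnorm((s - t0) / h) > 0)   # gaussian weights
c(compact = n_compact, gaussian = n_gaussian)
```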

This function returns a list with components:

`fhat`	A function that linearly interpolates the smoothed probability density estimates at the points pt_int.
`HRR`	A function that linearly interpolates the smoothed hazard rate ratio estimates at the points pt_int.
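Interpolating functions of this kind can be built with `stats::approxfun`. The sketch below assumes that is how fhat and HRR behave; the estimate values are made-up placeholders, not output of HRR_pt_est.

```r
# Turn point estimates at pt_int into a linearly interpolating function.
# fhat_val holds placeholder numbers chosen for illustration only.
pt_int   <- seq(0, 1, 0.25)
fhat_val <- c(0, 1.5, 1.2, 0.4, 0)
fhat <- approxfun(pt_int, fhat_val)   # piecewise-linear interpolation
fhat(0.125)                           # halfway between 0 and 1.5
fhat(c(0.25, 0.625))                  # exact at nodes, linear in between
```

With approxfun's default rule = 1, evaluating outside [0, 1] returns NA.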

If the Matrix package is installed, the function solve is used to solve the linear systems; otherwise, qr.solve is used.

Zhicong Zhao

See HRR_sbsp_est for applying this function via subsampling.

set.seed(1)  # make the simulated sample reproducible
temp <- HRR_pt_est(pt_int = seq(0, 1, 0.1),
                   cdf_sample = sort(rbeta(10000, 2, 5)),
                   kernel = "triangular",
                   hazard_bandwidth = 0.1)

## plot ##
# temp$fhat is a function, so plot() draws it over [0, 1]
plot(temp$fhat, col = "blue", xlab = NA, ylab = NA)
# overlay the true Beta(2, 5) density for comparison
curve(dbeta(x, 2, 5), add = TRUE, col = "red")
legend("top", legend = c("estimated density", "population density"),
       lty = 1, col = c("blue", "red"))