DE.FEM: Nonparametric density estimation with differential...

View source: R/DE.R

DE.FEMR Documentation

Nonparametric density estimation with differential regularization

Description

This function implements a nonparametric density estimation method with differential regularization (given by the square root of the L2 norm of the laplacian of the density function), when points are located over a planar mesh. The computation relies only on the C++ implementation of the algorithm.

Usage

DE.FEM(data, FEMbasis, lambda, fvec=NULL, heatStep=0.1, heatIter=500, 
       stepProposals=NULL,tol1=1e-4, tol2=0, print=FALSE, nfolds=NULL,
       nsimulations=500, step_method="Fixed_Step", direction_method="BFGS",
       preprocess_method="NoCrossValidation", search = "tree")

Arguments

data

A matrix of dimensions #observations-by-ndim. Data are locations: each row corresponds to one point, the first column corresponds to the x-coordinates, the second column corresponds to the y-coordinates and, if ndim=3, the third column corresponds to the z-coordinates.

FEMbasis

A FEMbasis object describing the Finite Element basis, as created by create.FEM.basis.

lambda

A scalar or vector of smoothing parameters. If it is a vector, the optimal smoothing parameter is choosen with a k-fold cross-validation procedure based on the L2 norm.

fvec

A vector of length #nodes of the mesh. It corresponds to the node values of the initial density function. If this is NULL the initial density is estimated thanks to a discretized heat diffusion process that starts from the empirical density of the data. Default is NULL. N.B. This vector cannot be the constant vector of zeros since the algortihm works with the log(f).

heatStep

A real specifying the time step for the discretized heat diffusionn process.

heatIter

An integer specifying the number of iterations to perform the discretized heat diffusion process.

stepProposals

A scalar or a vector containing the step parameters useful for the descent algorithm. If there is a vector of parameters, the biggest one such that the functional decreases at each iteration is choosen. If it is NULL the following vector c(0.1, 0.01, 0.001, 0.0001, 0.00001, 0.000001, 1e-7, 1e-8, 1e-9) is proposed. Default is NULL. N.B. If the program does not receive a right parameter, it aborts the R session. Try a smaller parameter.

tol1

A scalar specifying the tolerance to use for the termination criterion based on the percentage difference between two consecutive iterations of the minimization algorithm of the loss function, the log-likelihood and the penalization. Default is 1e-5.

tol2

A scalar specifying the tolerance to use for the termination criterion based on the norm of the gradient of the functional to be minimized (the true minimum is such that this norm is zero). The default does not use this criterion. Default is 0.

print

A boolean that is TRUE if the user wants the value of the functional, of the loglikelihood and of the penalization term printed on console at each iteration of the descent algorithm. Default is FALSE. N.B. We suggest to let it FALSE if preprocess_method is 'RightCV' or 'SimplifiedCV'.

nfolds

An integer specifying the number of folds used in cross validation techinque to find the best lambda parameter. If there is only one lambda it can be NULL. Default is NULL.

nsimulations

An integer specifying the number of iterations used in the optimization algorithms. Default value is 500.

step_method

String. This parameter specifies which step method use in the descent algorithm. If it is Fixed_Step, the step is constant during all the algorithm and it is choosen according to stepProposals; if it is Backtracking_Method, the step is computed at each iteration according to the backtracking method; finally if it is Wolfe_Method, the step is computed at each iteration according to the Wolfe method. Default is Fixed_Step.

direction_method

String. This parameter specifies which descent direction use in the descent algorithm. If it is Gradient, the direction is the one given by the gradient descent method (the opposite to the gradient of the functional); if instead it is BFGS the direction is the one given by the BFGS method (Broyden Fletcher Goldfarb and Shanno, a Quasi-Newton method). Default is BFGS. Other possible choices: Conjugate Gradient direction with Fletcher-Reeves formula (ConjugateGradientFR), Conjugate Gradient direction with Polak-RibiƩre-Polyak formula (ConjugateGradientPRP), Conjugate Gradient direction with Hestenes-Stiefel formula (ConjugateGradientHS), Conjugate Gradient direction with Dai-Yuan formula (ConjugateGradientDY), Conjugate Gradient direction with Conjugate-Descent formula (ConjugateGradientCD), Conjugate Gradient direction with Liu-Storey formula (ConjugateGradientLS), L-BFGS direction with 5 correction vectors (L-BFGS5), L-BFGS direction with 10 correction vectors (L-BFGS10).

preprocess_method

String. This parameter specifies the k fold cross validation technique to use, if there is more than one smoothing parameter lambda (otherwise it should be NULL). If it is RightCV the usual k fold cross validation method is performed. If it is SimplifiedCV a simplified version is performed. In the latter case the number of smoothing parameters lambda must be equal to the number of folds nfolds. Default is NULL.

search

a flag to decide the search algorithm type (tree or naive or walking search algorithm).

Value

A list with the following variables:

FEMbasis

Given FEMbasis with tree informations.

g

A vector of length #nodes that represents the value of the g-function estimated for each node of the mesh. The density is the exponential of this function.

f_init

A #nodes-by-#lambda parameters matrix. Each column contains the node values of the initial density used for the lambda given by the column.

lambda

A scalar representing the optimal smoothing parameter selected via k fold cross validation, if in the input there is a vector of parameters; the scalar given in input otherwise.

data

A matrix of dimensions #observations-by-ndim containing the data used in the algorithm. They are the same given in input if the domain is 2D pr 3D; they are the original data projected on the mesh if the domain is 2.5D.

CV_err

A vector of length nfolds containing the cross validation errors obtained in each fold, if preprocess_method is either RightCV or SimplifiedCV.

References

  • Ferraccioli, F., Arnone, E., Finos, L., Ramsay, J. O., Sangalli, L. M. (2021). Nonparametric density estimation over complicated domains. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 83(2), 346-368.

  • Arnone, E., Ferraccioli, F., Pigolotti, C., Sangalli, L.M. (2021), A roughness penalty approach to estimate densities over two-dimensional manifolds, Computational Statistics and Data Analysis, to appear.

Examples

library(fdaPDE)

## Create a 2D mesh over a squared domain
Xbound <- seq(-3, 3, length.out = 10)
Ybound <- seq(-3, 3, length.out = 10)
grid_XY <- expand.grid(Xbound, Ybound)
Bounds <- grid_XY[(grid_XY$Var1 %in% c(-3, 3)) | (grid_XY$Var2 %in% c(-3, 3)), ]
mesh <- create.mesh.2D(nodes = Bounds, order = 1)
mesh <- refine.mesh.2D(mesh, maximum_area = 0.2)
FEMbasis <- create.FEM.basis(mesh)

## Generate data
n <- 50
set.seed(10)
data_x <- rnorm(n)
data_y <- rnorm(n)
data <- cbind(data_x, data_y)

plot(mesh)
points(data, col="red", pch=19, cex=0.5)

## Density Estimation
lambda = 0.1
sol <- DE.FEM(data = data, FEMbasis = FEMbasis, lambda = lambda, fvec=NULL, heatStep=0.1,
                  heatIter=500, stepProposals=NULL, tol1=1e-4, tol2=0, print=FALSE, 
                  nfolds=NULL, nsimulations=300,step_method = "Fixed_Step", 
                  direction_method = "BFGS",preprocess_method="NoCrossValidation")

## Visualization 
n = 100
X <- seq(-3, 3, length.out = n)
Y<- seq(-3, 3, length.out = n)
grid <- expand.grid(X, Y)

evaluation <- eval.FEM(FEM(FEMbasis, coeff = sol$g), locations = grid)
evaluation <- exp(evaluation)
eval <- matrix(evaluation, n, n)

image2D(x = X, y = Y, z = eval, col = heat.colors(100), xlab = "x", ylab = "y", 
        contour = list(drawlabels = FALSE), main = "Estimated density")

fdaPDE documentation built on March 7, 2023, 5:28 p.m.