DE.FEM.time: Nonparametric spatio-temporal density estimation with...

View source: R/DE_time.R

DE.FEM.timeR Documentation

Nonparametric spatio-temporal density estimation with differential regularization

Description

This function implements a nonparametric spatio-temporal density estimation method with differential regularization (given by the sum of the square of the L2 norm of the laplacian of the density function and the square of the L2 norm of the second- order time-derivative), when points are located over a planar mesh. The computation relies only on the C++ implementation of the algorithm.

Usage

DE.FEM.time(data, data_time, FEMbasis, mesh_time, lambda, lambda_time, fvec=NULL,
                   heatStep=0.1, heatIter=10, stepProposals=NULL, tol1=1e-4, tol2=0, 
                   print=FALSE, nfolds=NULL, nsimulations=500, step_method="Fixed_Step", 
                   direction_method="BFGS", preprocess_method="NoCrossValidation",
                   search="tree", isTimeDiscrete=FALSE, flagMass=FALSE, flagLumped=FALSE)

Arguments

data

A matrix of dimensions #observations-by-ndim. Data are locations: each row corresponds to one point, the first column corresponds to the x-coordinates, the second column corresponds to the y-coordinates and, if ndim=3, the third column corresponds to the z-coordinates.

data_time

A vector of length #observations. The i-th datum is the time instant during which the i-th location is observed (according to the order in which locations are provided in data).

FEMbasis

A FEMbasis object describing the Finite Element basis, as created by create.FEM.basis.

mesh_time

A vector containing the b-splines knots for separable smoothing. It is the time mesh of the considered time domain (interval). Its nodes are in increasing order.

lambda

A scalar or vector of smoothing parameters in space. If it is a vector, the optimal smoothing parameter in space is chosen, together with the optimal smoothing parameter in time, with a k-fold cross-validation procedure based on the L2 norm.

lambda_time

A scalar or vector of smoothing parameters in time. If it is a vector, the optimal smoothing parameter in time is chosen, together with the optimal smoothing parameter in space, with a k-fold cross-validation procedure based on the L2 norm.

fvec

A vector of length #nodes of the spatial mesh times #B-spline temporal functional basis. It corresponds to the node values of the initial density function. If this is NULL the initial density is estimated thanks to a discretized heat diffusion process that starts from the empirical density of the data. Default is NULL. N.B. This vector cannot be the constant vector of zeros since the algorithm works with the log(f).

heatStep

A real specifying the time step for the discretized heat diffusion process.

heatIter

An integer specifying the number of iterations to perform the discretized heat diffusion process.

stepProposals

A scalar or a vector containing the step parameters useful for the descent algorithm. If there is a vector of parameters, the biggest one such that the functional decreases at each iteration is chosen. If it is NULL the following vector c(0.1, 0.01, 0.001, 0.0001, 0.00001, 0.000001, 1e-7, 1e-8, 1e-9) is proposed. Default is NULL. N.B. If the program does not receive a right parameter, it aborts the R session. Try a smaller parameter.

tol1

A scalar specifying the tolerance to use for the termination criterion based on the percentage difference between two consecutive iterations of the minimization algorithm of the loss function, the log-likelihood and the penalizations. Default is 1e-5.

tol2

A scalar specifying the tolerance to use for the termination criterion based on the norm of the gradient of the functional to be minimized (the true minimum is such that this norm is zero). The default version does not use this criterion. Default is 0.

print

A boolean that is TRUE if the user wants the value of the functional, of the loglikelihood and of the penalization terms printed on console at each iteration of the descent algorithm (plus some other information/warnings). Default is FALSE. N.B. We suggest to let it FALSE if preprocess_method is 'RightCV' or 'SimplifiedCV'.

nfolds

An integer specifying the number of folds used in cross validation technique to find the best pair of (lambda, lambda_time) smoothing parameters. If there is only one pair of (lambda, lambda_time) it can be NULL. Default is NULL.

nsimulations

An integer specifying the number of iterations used in the optimization algorithms. Default value is 500.

step_method

A string specifying which step method to use in the descent algorithm. If it is Fixed_Step, the step is constant during the algorithm and it is chosen according to stepProposals; if it is Backtracking_Method, the step is computed at each iteration according to the backtracking method; finally if it is Wolfe_Method, the step is computed at each iteration according to the Wolfe method. Default is Fixed_Step.

direction_method

A string specifying which descent direction to use in the descent algorithm. If it is Gradient, the direction is the one given by the gradient descent method (the opposite to the gradient of the functional); if instead it is BFGS the direction is the one given by the BFGS method (Broyden-Fletcher-Goldfarb-Shanno, a Quasi-Newton method). Default is BFGS. Other possible choices: Conjugate Gradient direction with Fletcher-Reeves formula (ConjugateGradientFR), Conjugate Gradient direction with Polak-RibiƩre-Polyak formula (ConjugateGradientPRP), Conjugate Gradient direction with Hestenes-Stiefel formula (ConjugateGradientHS), Conjugate Gradient direction with Dai-Yuan formula (ConjugateGradientDY), Conjugate Gradient direction with Conjugate-Descent formula (ConjugateGradientCD), Conjugate Gradient direction with Liu-Storey formula (ConjugateGradientLS), L-BFGS direction with 5 correction vectors (L-BFGS5), L-BFGS direction with 10 correction vectors (L-BFGS10).

preprocess_method

A string specifying the k fold cross validation technique to use, if there is more than one pair of smoothing parameters in space and in time (lambda, lambda_time); otherwise it should be NULL. If it is RightCV the usual k fold cross validation method is performed. If it is SimplifiedCV a simplified version is performed. In the latter case the number of possible pairs of smoothing parameters in space and in time (lambda, lambda_time) must be equal to the number of folds nfolds. Default is NULL.

search

A flag to decide the search algorithm type (tree or naive or walking search algorithm).

isTimeDiscrete

A boolean specifying the time data type: TRUE for discrete (with many duplicates) time data; FALSE for continuous time data. Default is FALSE.

flagMass

A boolean specifying whether to consider full mass matrices (TRUE) or identity mass matrices (FALSE) for the computation of space and time penalty matrices. Default is FALSE.

flagLumped

A boolean specifying whether to perform mass lumping. This numerical technique presents computational advantages during the procedure involving a mass matrix inversion for the computation of the space penalty matrix. Default is FALSE. N.B. We suggest to put it TRUE in case of a large spatial domain or in case of a dense/refined spatial mesh.

Value

A list with the following variables:

FEMbasis

Given FEMbasis with tree information.

g

A vector of length #nodes times #B-splines that represents the value of the g-function estimated for each node of the spatial mesh and at each time instant of the time mesh. The density is the exponential of this function.

f_init

A #nodes-by-#lambdax#lambda_time parameters matrix. Each column contains the node values of the initial density used for the pair (lambda, lambda_time) given by the column.

lambda

A scalar representing the optimal smoothing parameter in space selected, together with lambda_time, via k fold cross validation, if in the input there is a vector of parameters (in space and/or in time); the scalar given in input otherwise.

lambda_time

A scalar representing the optimal smoothing parameter in time selected, together with lambda, via k fold cross validation, if in the input there is a vector of parameters (in space and/or in time); the scalar given in input otherwise.

data

A matrix of dimensions #observations-by-ndim containing the spatial data used in the algorithm. They are the same given in input if the domain is 2D pr 3D; they are the original data projected on the mesh if the domain is 2.5D. Data lying outside the spatial domain, defined through its mesh, are not considered.

data_time

A vector of length #observations containing the time data used in the algorithm. Data lying outside the temporal domain, defined through its mesh, are not considered.

CV_err

A vector of length nfolds containing the cross validation errors obtained in each fold, if preprocess_method is either RightCV or SimplifiedCV.

Examples

library(fdaPDE)

## Create a 2D mesh over a squared domain
Xbound <- seq(-3, 3, length.out = 10)
Ybound <- seq(-3, 3, length.out = 10)
grid_XY <- expand.grid(Xbound, Ybound)
Bounds <- grid_XY[(grid_XY$Var1 %in% c(-3, 3)) | (grid_XY$Var2 %in% c(-3, 3)), ]
mesh <- create.mesh.2D(nodes = Bounds, order = 1)
mesh <- refine.mesh.2D(mesh, maximum_area = 0.25, minimum_angle = 20)
FEMbasis <- create.FEM.basis(mesh)

## Create a 1D time mesh over a (non-negative) interval
mesh_time <- seq(0, 1, length.out=3)

## Generate data
n <- 50
set.seed(10)
x <- rnorm(n,0,2)
y <- rnorm(n,0,2)
locations <- cbind(x,y)
times <- runif(n,0,1)
data <- cbind(locations, times)

t <- 0.5 # time instant in which to evaluate the solution

plot(mesh)
sample <- data[abs(data[,3]-t)<0.05,1:2]
points(sample, col="red", pch=19, cex=1, main=paste('Sample | ', t-0.05,' < t < ', t+0.05))

## Density Estimation
lambda <- 0.1
lambda_time <- 0.001
sol <- DE.FEM.time(data = locations, data_time = times, FEMbasis = FEMbasis, 
                   mesh_time = mesh_time, lambda = lambda, lambda_time = lambda_time,
                   fvec=NULL, heatStep=0.1, heatIter=10, stepProposals=NULL, tol1=1e-4, tol2=0,
                   print=FALSE, nfolds=NULL, nsimulations=300, step_method="Fixed_Step",
                   direction_method="BFGS", preprocess_method="NoCrossValidation",
                   search="tree", isTimeDiscrete=0, flagMass=0, flagLumped=0)

## Visualization
n = 100
X <- seq(-3, 3, length.out = n)
Y <- seq(-3, 3, length.out = n)
grid <- expand.grid(X, Y)

FEMfunction = FEM.time(sol$g, mesh_time, FEMbasis, FLAG_PARABOLIC = FALSE)
evaluation <- eval.FEM.time(FEM.time = FEMfunction, locations = grid, time.instants = t)
image2D(x = X, y = Y, z = matrix(exp(evaluation), n, n), col = heat.colors(100),
        xlab = "x", ylab = "y", contour = list(drawlabels = FALSE),
        main = paste("Estimated density at t = ", t), zlim=c(0,0.2), asp = 1)


fdaPDE documentation built on March 7, 2023, 5:28 p.m.