smooth.FEM.time: Space-time regression with differential regularization

View source: R/smoothing_time.R

smooth.FEM.timeR Documentation

Space-time regression with differential regularization

Description

Space-time regression with differential regularization. Space-varying covariates can be included in the model. The technique accurately handle data distributed over irregularly shaped domains. Moreover, various conditions can be imposed at the domain boundaries.

Usage

smooth.FEM.time(locations = NULL, time_locations = NULL, observations, FEMbasis, 
time_mesh=NULL, covariates = NULL, PDE_parameters = NULL,  BC = NULL,
incidence_matrix = NULL, areal.data.avg = TRUE,
FLAG_MASS = FALSE, FLAG_PARABOLIC = FALSE, FLAG_ITERATIVE = FALSE,
threshold = 10^(-4), max.steps = 50, IC = NULL,
search = "tree", bary.locations = NULL,
family = "gaussian", mu0 = NULL, scale.param = NULL,
threshold.FPIRLS = 0.0002020, max.steps.FPIRLS = 15,
lambda.selection.criterion = "grid", DOF.evaluation = NULL, 
lambda.selection.lossfunction = NULL, lambdaS = NULL, lambdaT = NULL, 
DOF.stochastic.realizations = 100, DOF.stochastic.seed = 0, 
DOF.matrix = NULL, GCV.inflation.factor = 1, lambda.optimization.tolerance = 0.05,
inference.data.object=NULL)

Arguments

locations

A matrix where each row specifies the spatial coordinates x and y (and z if ndim=3) of the corresponding observations in the vector observations. This parameter can be NULL. In this case, if also the incidence matrix is NULL the spatial coordinates are assumed to coincide with the nodes of the mesh.

time_locations

A vector containing the times of the corresponding observations in the vector observations. This parameter can be NULL. In this case the temporal locations are assumed to coincide with the nodes of the time_mesh.

observations

A matrix of #locations x #time_locations with the observed data values over the spatio-temporal domain. The spatial locations of the observations can be specified with the locations argument.

FEMbasis

A FEMbasis object describing the Finite Element basis, as created by create.FEM.basis.

time_mesh

A vector specifying the time mesh.

covariates

A #observations-by-#covariates matrix where each row represents the covariates associated with the corresponding observed data value in observations.

PDE_parameters

A list specifying the parameters of the PDE in the regularizing term. Default is NULL, i.e. regularization is by means of the Laplacian (stationary, isotropic case). If the PDE is elliptic it must contain: K, a 2-by-2 matrix of diffusion coefficients. This induces an anisotropic smoothing with a preferential direction that corresponds to the first eigenvector of the diffusion matrix K; b, a vector of length 2 of advection coefficients. This induces a smoothing only in the direction specified by the vector b; c, a scalar reaction coefficient. c induces a shrinkage of the surface to zero If the PDE is space-varying it must contain: K, a function that for each spatial location in the spatial domain (indicated by the vector of the 2 spatial coordinates) returns a 2-by-2 matrix of diffusion coefficients. This induces an anisotropic smoothing with a local preferential direction that corresponds to the first eigenvector of the diffusion matrix K.The function must support recycling for efficiency reasons, thus if the input parameter is a #point-by-2 matrix, the output should be an array with dimensions 2-by-2-by-#points.b, a function that for each spatial location in the spatial domain returns a vector of length 2 of transport coefficients. This induces a local smoothing only in the direction specified by the vector b. The function must support recycling for efficiency reasons, thus if the input parameter is a #point-by-2 matrix, the output should be a matrix with dimensions 2-by-#points; c, a function that for each spatial location in the spatial domain returns a scalar reaction coefficient. c induces a shrinkage of the surface to zero. The function must support recycling for efficiency reasons, thus if the input parameter is a #point-by-2 matrix, the output should be a vector with length #points; u, a function that for each spatial location in the spatial domain returns a scalar reaction coefficient. u induces a reaction effect. The function must support recycling for efficiency reasons, thus if the input parameter is a #point-by-2 matrix, the output should be a vector with length #points. For 2.5D and 3D only the Laplacian is available (PDE_parameters=NULL)

BC

A list with two vectors: BC_indices, a vector with the indices in nodes of boundary nodes where a Dirichlet Boundary Condition should be applied; BC_values, a vector with the values that the spatial field must take at the nodes indicated in BC_indices.

incidence_matrix

A #regions-by-#triangles/tetrahedrons matrix where the element (i,j) equals 1 if the j-th triangle/tetrahedron is in the i-th region and 0 otherwise. This is only for areal data. In case of pointwise data, this parameter is set to NULL.

areal.data.avg

Boolean. It involves the computation of Areal Data. If TRUE the areal data are averaged, otherwise not.

FLAG_MASS

Boolean. This parameter is considerd only for separable problems i.e. when FLAG_PARABOLIC==FALSE. If TRUE the mass matrix in space and in time are used, if FALSE they are substituted with proper identity matrices.

FLAG_PARABOLIC

Boolean. If TRUE the parabolic problem problem is selected, if FALSE the separable one.

FLAG_ITERATIVE

Boolean. If TRUE the iterative method is selected, if FALSE the monolithic one.

threshold

This parameter is used for arresting algorithm iterations. Algorithm stops when two successive iterations lead to improvement in penalized log-likelihood smaller than threshold. Default value threshold = 10^(-4).

max.steps

This parameter is used to limit the maximum number of iteration. Default value max.steps=50.

IC

Initial condition needed in case of parabolic problem i.e. when FLAG_PARABOLIC==TRUE. If FLAG_PARABOLIC==FALSE this parameter is ignored. If FLAG_PARABOLIC=TRUE and IC=NULL it is necessary to provide also data at the initial time. IC will be estimated from them.

search

a flag to decide the search algorithm type (tree or naive or walking search algorithm).

bary.locations

A list with three vectors: locations, location points which are same as the given locations options. (checks whether both locations are the same); element ids, a vector of element id of the points from the mesh where they are located; barycenters, a vector of barycenter of points from the located element.

family

This parameter specify the distibution within exponential family used for GLM model. The following distribution are implemented: "binomial", "exponential", "gamma", "poisson", "gaussian", "invgaussian". The default link function for binomial is logit if you want either probit or clogloc set family = "probit", family = "cloglog".

mu0

This parameter is a vector that set the starting point for FPIRLS algorithm. It represent an initial guess of the location parameter. Default is set to observation for non binary distribution while equal to 0.5(observations + 0.5) for binary data.

scale.param

Dispersion parameter of the chosen distribution. This is only required for "gamma", "gaussian", "invgaussian". User may specify the parameter as a positive real number. If the parameter is not supplied, it is estimated from data according to Wilhelm Sangalli 2016.

threshold.FPIRLS

This parameter is used for arresting algorithm iterations. Algorithm stops when two successive iterations lead to improvement in penalized log-likelihood smaller than threshold.FPIRLS. Default value threshold.FPIRLS = 0.0002020.

max.steps.FPIRLS

This parameter is used to limit the maximum number of iteration. Default value max.steps.FPIRLS=15.

lambda.selection.criterion

This parameter is used to select the optimization method related to smoothing parameter lambda. The following methods are implemented: 'grid', further optimization methods are yet to come. The 'grid' is a pure evaluation method, therefore a vector of lambda testing penalizations must be provided. Default value lambda.selection.criterion='grid'

DOF.evaluation

This parameter is used to identify if and how degrees of freedom computation has to be performed. The following possibilities are allowed: NULL, 'exact' and 'stochastic' In the former case no degree of freedom is computed, while the other two methods enable computation. Stochastic computation of DOFs may be slightly less accurate than its deterministic counterpart, but is highly suggested for meshes of more than 5000 nodes, being fairly less time consuming. Default value DOF.evaluation=NULL

lambda.selection.lossfunction

This parameter is used to understand if some loss function has to be evaluated. The following possibilities are allowed: NULL and 'GCV' (generalized cross validation) The former case is that of lambda.selection.criterion='grid' pure evaluation, while the second can be employed for optimization methods. Default value lambda.selection.lossfunction=NULL

lambdaS

A scalar or vector of spatial smoothing parameters.

lambdaT

A scalar or vector of temporal smoothing parameters.

DOF.stochastic.realizations

This parameter is considered only when DOF.evaluation = 'stochastic'. It is a positive integer that represents the number of uniform random variables used in stochastic GCV computation. Default value DOF.stochastic.realizations=100.

DOF.stochastic.seed

This parameter is considered only when DOF.evaluation = 'stochastic'. It is a positive integer that represents user defined seed employed in stochastic GCV computation. Default value DOF.stochastic.seed=0.

DOF.matrix

Matrix of degrees of freedom. This parameter can be used if the DOF.matrix corresponding to lambdaS and lambdaT is available from precedent computation. This allows to save time since the computation of the DOFs is the most expensive part of GCV.

GCV.inflation.factor

Tuning parameter used for the estimation of GCV. Default value GCV.inflation.factor = 1.0. It is advised to set it grather than 1 to avoid overfitting.

lambda.optimization.tolerance

Tolerance parameter, a double between 0 and 1 that fixes how much precision is required by the optimization method: the smaller the parameter, the higher the accuracy. Used only if lambda.selection.criterion='newton' or lambda.selection.criterion='newton_fd', thus ot implemented yet. Default value lambda.optimization.tolerance=0.05.

inference.data.object

An inferenceDataObject that stores all the information regarding inference over the linear parameters of the model. This parameter needs to be consistent with covariates, otherwise will be discarded. If set and well defined, the function will have in output the inference results. It is suggested to create this object via inferenceDataObjectBuilder function, so that the object is guaranteed to be well defined.

Value

A list with the following variables:

  • fit.FEM.timeA FEM.time object that represents the fitted spatio-temporal field.

  • PDEmisfit.FEM.timeA FEM.time object that represents the misfit of the penalized PDE.

  • betaIf covariates is not NULL, a matrix with number of rows equal to the number of covariates and numer of columns equal to length of lambda. The jth column represents the vector of regression coefficients when the smoothing parameter is equal to lambda[j].

  • edfIf GCV is TRUE, a scalar or matrix with the trace of the smoothing matrix for each combination of the smoothing parameters specified in lambdaS and lambdaT.

  • stderrIf GCV is TRUE, a scalar or matrix with the estimate of the standard deviation of the error for each combination of the smoothing parameters specified in lambdaS and lambdaT.

  • GCVIf GCV is TRUE, a scalar or matrix with the value of the GCV criterion for each combination of the smoothing parameters specified in lambdaS and lambdaT.

  • bestlambdaIf GCV is TRUE, a 2-elements vector with the indices of smoothing parameters returnig the lowest GCV

  • ICestimatedIf FLAG_PARABOLIC is TRUE and IC is NULL, a list containing a FEM object with the initial conditions, the value of the smoothing parameter lambda returning the lowest GCV and, in presence of covariates, the estimated beta coefficients

  • bary.locationsA barycenter information of the given locations if the locations are not mesh nodes.

  • inferenceA list set only if a well defined [inferenceDataObject] is passed as parameter to the function; contains all inference outputs required:

    p_valueslist of lists set only if at least one p-value is required; contains the p-values divided by implementation:

    waldlist containing all the Wald p-values required, in the same order of the type list in inference.data.object. If one-at-the-time tests are required, the corresponding item is a vector of p values ordered as the rows of coeff matrix in inference.data.object.

    speckmanlist containing all the Speckman p-values required, in the same order of the type list in inference.data.object. If one-at-the-time tests are required, the corresponding item is a vector of p values ordered as the rows of coeff matrix in inference.data.object.

    eigen_sign_fliplist containing all the Eigen-Sign-Flip p-values required, in the same order of the type list in inference.data.object. If one-at-the-time tests are required, the corresponding item is a vector of p values ordered as the rows of coeff matrix in inference.data.object.

    CIlist of lists set only if at least one confidence interval is required; contains the confidence intervals divided by implementation:

    waldlist containing all the Wald confidence intervals required, in the same order of the type list in inference.data.object. Each item is a matrix with 3 columns and p rows, p being the number of rows of coeff matrix in inference.data.object; each row is the CI for the corresponding row of coeff matrix.

    speckmanlist containing all the Speckman confidence intervals required, in the same order of the type list in inference.data.object. Each item is a matrix with 3 columns and p rows, p being the number of rows of coeff matrix in inference.data.object; each row is the CI for the corresponding row of coeff matrix.

References

#' @references Arnone, E., Azzimonti, L., Nobile, F., & Sangalli, L. M. (2019). Modeling spatially dependent functional data via regression with differential regularization. Journal of Multivariate Analysis, 170, 275-295. Bernardi, M. S., Sangalli, L. M., Mazza, G., & Ramsay, J. O. (2017). A penalized regression model for spatial functional data with application to the analysis of the production of waste in Venice province. Stochastic Environmental Research and Risk Assessment, 31(1), 23-38. Federico Ferraccioli, Laura M. Sangalli & Livio Finos (2021). Some first inferential tools for spatial regression with differential regularization. Journal of Multivariate Analysis, to appear

Examples

library(fdaPDE)

data(horseshoe2D)
boundary_nodes = horseshoe2D$boundary_nodes
boundary_segments = horseshoe2D$boundary_segments
locations = horseshoe2D$locations
time_locations = seq(0,1,length.out = 5)

mesh = create.mesh.2D(nodes = rbind(boundary_nodes, locations), segments = boundary_segments)

space_time_locations = cbind(rep(time_locations,each=nrow(mesh$nodes)),
                             rep(mesh$nodes[,1],5),rep(mesh$nodes[,2],5))

FEMbasis = create.FEM.basis(mesh)
lambdaS = 10^-1
lambdaT = 10^-1
data = fs.test(space_time_locations[,2], 
               space_time_locations[,3])*cos(pi*space_time_locations[,1]) +
       rnorm(nrow(space_time_locations), sd = 0.5)
data = matrix(data, nrow = nrow(mesh$nodes), ncol = length(time_locations), byrow = TRUE)

solution = smooth.FEM.time(observations = data, time_locations = time_locations,
                           FEMbasis = FEMbasis, lambdaS = lambdaS, lambdaT = lambdaT)
plot(solution$fit.FEM)

fdaPDE documentation built on March 7, 2023, 5:28 p.m.