# z_A5._Estimate_the_model_: Spatial smoothing with unknown changes in the pattern of... In scpm: An R Package for Spatial Smoothing

## Description

Fit a spatial semiparametric model based on splines including unknown changes in the pattern of response.

## Usage

 1 2 3 4 scp(formula, data, initial = NULL, contrasts = NULL, model = "exponential", fix.nugget = FALSE, fix.kappa = FALSE, nugget.tol = 1e-15, angle = 0, ratio = 1, use.reml = FALSE, use.profile = TRUE, chMaxiter = 20, control = list()) 

## Arguments

 formula formula. An expression to specify the model to fit. See 2. Mean model. data sss class object. A dataset object generated by any of the commands as.sss or create.sss. initial named list. The starting values for the covariance parameters of the model. If initial=NULL then it is used an internal grid search to define the starting values. contrasts character. A contrast method for factor covariates. Default to contr.treatment. model character. Name of the semivariogram model to estimate for the spatial dependence. See Semivariogram Model. fix.nugget logical or numeric. If FALSE the nugget τ^2 is estimated. If fix.nugget is a numeric value then the nugget τ^2 is set to the value defined for fix.nugget. fix.kappa logical or numeric vector. If FALSE the parameters κ are estimated. If fix.kappa is a numeric vector then κ is set to the values of the vector defined for fix.kappa. nugget.tol numeric. Threshold for microscale spatial variations to define the nugget effect. Default to 1.0e-15. Do not modify unless know what is being doing. angle numeric. Angle for geometric anisotropy. Note that this overwrites any specification for aniso.angle in s2D. ratio numeric. Ratio between [0,1] for geometric anisotropy. Note that this overwrites any specification for aniso.ratio in s2D. use.reml logical. For using REML estimation set to TRUE, for ML estimation set to FALSE (default). use.profile logical. For profiling set to TRUE (default). chMaxiter integer. Maximum number of iterations for the loop estimating changes in the pattern of response. control named list. Options to control the optimization. See argument control in command optim.

## 1. Semiparametric model

Assume that the response variable admit the trend surface model

Y(s) = a^T b + g(s) + ε(s)

where a is a known vector of covariates and b their coefficients; g(s) is a deterministic bivariate spline and ε(s) is a Gaussian spatial process (GSP) with mean zero and covariance depending only on the distance h and given by Cov(ε(s+h),ε(s)). This model is also called a trend surface model. Given n observed locations s_1,…,s_n\in S\subset\Re^2 in a two-dimensional space, then the model is

Y = Ab + g + ε

where Y = (Y(s_1), …, Y(s_n))^T, A is the known matrix of covariates, g = (g(s_1),…,g(s_n))^T and ε = (ε(s_1),…,ε(s_n))^T. The covariance matrix is given by Cov(ε,ε) = Σ = σ^2 R with R a valid correlation matrix. Thus Y\sim N_n(μ,Σ) where μ = Ab + g and the likelihood function is L(b,g,σ^2,θ;Y) = (2π)^{-n/2}|Σ|^{-1/2}\exp\{-\frac{1}{2}(Y-μ)^TΣ^{-1}(Y-μ)\} with θ the parameters that define the correlation matrix R.

## 2. Mean model

It can be defined by the commands:

linear

that defines the covariates in the matrix A. Note that more than one linear command can be defined. See linear.

cp

defines changes in the pattern of response by including the covariates (z_d - ψ^{(0)}_d)\times 1\{z_d>ψ^{(0)}_d\} and -1\{z_d>ψ^{(0)}_d\} for d=1,…,G into the matrix A. Note that more than one cp command can be defined. See cp.

s2D

define the bivariate spline g. Note that only one s2D command can be defined. See s2D.

## 3. Covariance model and nugget effect

Given a distance h define u = ||T^{1/2}_{\texttt{\tiny angle, ratio}}h|| = (h^T T_{\texttt{\tiny angle,ratio}} h)^{1/2}\in\Re where T_{\texttt{\tiny angle, ratio}} is a rotation matrix for geometric anisotropy. The errors are given by the process ε(s) = η(s) + ξ(s) where ξ is a GSP with mean zero and covariance

\begin{array}{ll}Cov(ξ(s),ξ(s+h)) & = C_ξ(u;φ,κ) \\ & = σ^2_0ρ_ξ(u;φ,κ)\end{array}

with ρ_ξ(u;φ,κ) the correlation function; and η is a nugget effect with covariance

\begin{array}{ll}Cov(η(s),η(s+h)) & = C_η(u;τ^2,\texttt{tol.nugget}) \\ & = τ^2ρ_η(u;\texttt{tol.nugget})\end{array}

with correlation function ρ_η(u;\texttt{tol.nugget}) = 1\{u<\texttt{tol.nugget}\}. Therefore the covariance of the process ε is given by

\begin{array}{ll} Cov(ε(s),ε(s+h)) & = C_ε(u;σ^2,θ,\texttt{tol.nugget}) \\ & = σ^2 ρ_ε(u;θ,\texttt{tol.nugget})\end{array}

with correlation function given by

ρ_ε(u;θ,\texttt{tol.nugget}) = (1-ρ_*)ρ_η(u;\texttt{tol.nugget}) + ρ_*ρ_ξ(u;φ,κ)

where θ = (ρ_*,φ,κ)^T are the parameters with ρ_* = σ^2_0/σ^2, σ^2 = τ^2 + σ^2_0, and tol.nugget is the argument that controls the largest distance at which micro-scale variations can affect the observed outcome. By default tol.nugget is set to 1.0e-15. The parameters φ,κ define the correlation function of the process ξ with φ usually called the range parameter and κ depending on the model selected. The semivariogram can be expressed as

γ_ε(u;σ^2,θ,\texttt{tol.nugget}) = σ^2(1-ρ_ε(u;θ,\texttt{tol.nugget}))

where τ^2 is the nugget effect, σ^2 is the sill, and σ^2_0 is the partial sill. Note that when angle = 0 and ratio = 1 the matrix T_{\texttt{\tiny angle,ratio}} is an identity matrix and u = h so the correlation ρ_ε(u;θ,\texttt{tol.nugget}) is isotropic. Use different values for angle and ratio to define a geometric anisotropic correlation function. Then the covariance matrix Σ = σ^2 R where R is the correlation matrix originated from ρ_ε(u;θ,\texttt{tol.nugget}). It is possible to define the argument model=name where name is one of the following: matern, powered.exponential, spherical, wave, exponential, gaussian, cubic, circular, gencauchy, cauchy, RMmatern, RMwhittle, RMgneiting, and RMnugget. For .semiVar one of matern, gaussian, exponential, power, cubic, penta.spherical, spherical, wave, sin.hole, pure.nugget and identity. By default the covariance model is set to exponential with angle=0 and ratio=1.

## 4. Penalized maximum likelihood estimation

Estimation can be performed by maximisation with respect to b,g,σ^2,θ, and α of the penalized log likelihood

\ell_p(b,g,σ^2,θ,α) = \log(L(b,g,σ^2,θ;Y)) - \frac{1}{2σ^2} J_α(g)

where J_α(g) = g^T Q_α g is the penalty and Q_α is the roughness matrix.

## 5. Penalties

Depending on the type of spline assumed for g the penalty is defined differently depending on the roughness matrix Q_α which is given by:

Tensor product spline.

Given τ_{1,1},…,τ_{1,K_1} and τ_{2,1},…,τ_{2,K_2} the design points in each coordinate then

Q_α = α_1 I_{K_2}\otimes Q_1 + α_2 Q_2\otimes I_{K_1}

where Q_1, Q_2 are unidimensional roughness matrices from the design points in each coordinate and α_1, α_2 are smoothing parameters in each direction.

Thin plate spline.

Given the n locations, Q_α = α E where α is the smoothing parameter and the n\times n matrix E has elements E_{i,j} = \vartheta(||s_i - s_j||) for i,j=1,…,n where

\vartheta(u) = ≤ft\{\begin{array}{ll}\frac{1}{16π}\times u^2\log(u^2) & \textrm{ , $u>0$} \\ 0 & \textrm{ , otherwise.}\end{array}\right.

## 6. Mixed model representation

The spline can be written as g = Xβ + Zr with β and r the coefficients and X and Z design matrices conveniently defined. Then for the observed responses the model can be expressed as a the mixed model

Y = Ab + Xβ + Zr + ε

where r \sim Normal(0,I_V) with V the number of columns in Z. Then, Y\sim N_n(μ_m, Σ) where μ_m = Ab + Xβ and Σ = σ^2 R; and Y|r \sim N_n(μ,V) where μ = Ab + Xβ + Zr and V = ZZ^T + Σ. Let us denote \vartheta = (b,β,σ^2,θ,α)^T, then the conditional log-likelihood of the model is given by

\ell(\vartheta|r) \propto -\frac{1}{2}≤ft\{\log|Σ|+(Y-μ)^TΣ^{-1}(Y-μ)\right\}

and the marginal log-likelihood is given by

\ell(\vartheta) \propto -\frac{1}{2}≤ft\{\log|V|+(Y-Ab-Xβ)^T V^{-1}(Y-Ab-Xβ)\right\}.

## Author(s)

Mario A. Martinez Araya, r@marioma.me

scpm documentation built on Feb. 17, 2020, 5:08 p.m.