sim.Krig: Unconditional and conditional simulation of a spatial process
In fields: Tools for Spatial Data

sim.spatialProcess

R Documentation

Unconditional and conditional simulation of a spatial process

Description

Generates exact (or approximate) random draws from the unconditional or conditional distribution of a spatial process given specific observations. Draws from the conditional distribution, known as conditional simulation in geostatistics, is a useful way to characterize the uncertainty in the predicted process from data. Note that exact simulation is limted by the number of locations but there are approximate strategies to handle simulation for large grids of locations.

Usage


simSpatialData(object,  M = 1, verbose = FALSE)
sim.spatialProcess(object, xp,  M = 1, verbose = FALSE, ...)
sim.Krig(object, xp, M = 1, verbose = FALSE, ...)

simLocal.spatialProcess(
mKrigObject, gridList = NULL, M = 1, nx = 80, ny = 80,
                 delta = NULL, fast = FALSE, NNSize = 4, NNSizePredict
                 = 4, giveWarnings = FALSE, verbose = FALSE, ...)

Arguments

`delta`	If the covariance has compact support the simulation method can take advantage of this. This is the amount of buffer added for the simulation domain in the circulant embedding method. A minimum size would be `aRange` for the Wendland but a multiple of this maybe needed to obtain a positive definite circulant covariance function.
`fast`	If TRUE will use approximate, fast spatial prediction on grids.
`giveWarnings`	If true will warn when more than one observation is in a grid box. This is instead of giving an error and stopping.
`gridList`	A grid list that is the grid for predictions. If NULL (default) a grid will be creatd based on the location ranges and `nx` and `ny`.
`mKrigObject`	An mKrig Object (or spatialProcess object)
`M`	Number of draws from conditional distribution.
`NNSize`	Degree of nearest neighbors to use. In the 2D case, the default `NNSize=2` uses 16 points in a 4X4 grid for prediction of the off grid point. This is for both the rapid prediction and the local simulation. ( See `predictSurface.mKrig` with `fast= TRUE`. )
`NNSizePredict`	Same definition of nearest neighbor size but applied to the rapid prediction approximation for the covariance. (See `approximateCovariance2D`.)
`nx`	Number of grid points in prediction locations for x coordinate.
`ny`	Number of grid points in prediction locations for y coordinate.
`object`	The spatial fit object.
`xp`	Same as predictionPoints above.
`verbose`	If true prints out intermediate information for debugging.
`...`	Any other arguments to be passed to the predict function. Usually this is the `Z` or `drop.Z` argument when there are additional covariates in the fixed part of the model. (See example below.)

Details

These functions generate samples from an unconditional or conditional multivariate (spatial) distribution, or an approximate one. The unconditional simulation function, simSpatialData, is a handy way to generate synthetic observations from a fitted model. Typically one would use these for a parametric bootstrap. The functions that simulate conditional distributions are much more involved in their coding. They are useful for describing the uncertainty in predictions using the estimated spatial process under Gaussian assumptions. An important assumption throughout these functions is that all covariance parameters are fixed at their estimated or prescribed values from the passed object. Although these functions might be coded up easily by the users these versions have the advantage that they take the mKrig, spatialProcess or Krig objects as a way to specify the model in an unambiguous way.

Given a spatial process h(x)= P(x) + g(x) observed at

Y.k = Z(x.k)d + P(x.k) + g(x.k) + e.k

where P(x) is a low order, fixed polynomial and g(x) a Gaussian spatial process and Z(x.k) is a vector of covariates that are also indexed by space (such as elevation). Z(x.k)d is a linear combination of the covariates with the parameter vector d being a component of the fixed part of the model and estimated in the usual way by generalized least squares.

With Y= Y.1, ..., Y.N, the goal is to sample the conditional distribution of the process.

[h(x) | Y ] or the full prediction Z(x)d + h(x)

For fixed a covariance this is just a multivariate normal sampling problem. sim.spatialProcess samples this conditional process at the points xp and is exact for fixed covariance parameters.

The outline of the conditional simulation algorithm is given below and has the advantage that is only depends on the unconditional simulation of the spatial process and being able to make a spatial prediction - the conditional mean of the process given the observations and covariance function.

Conditional Simulation Algorithm:

0) Find the spatial prediction at the unobserved locations based on the actual data. Call this h.hat(x) and this is also the conditional mean.

1) Generate a realization that includes both prediction and observation locations from the unconditional spatial process and from this process simluate synthetic observations.

2) Use the spatial prediction model ( using the true covariance) to estimate the spatial process at unobserved locations.

3) Find the difference between the simulated process and its prediction based on synthetic observations. Call this e(x).

4) h.hat(x) + e(x) is a draw from [h(x) | Y ].

The approximations for this simulation come in at step 1). Here the field at the observation locations is approximated using a local conditional simulation from the nearest grid points.

NOTE: A fixed part in the model is handled easily by simply making the prediction from the synthetic observations that have mean zero but include estimation of the fixed part as part of the prediction. Because the regression estimates are unbaised, this gives a valid draw from the correct mulitvariate distribution even though the synthetic observations do not include a fixed part.

sim.spatialProcess Follows this algorithm exactly. For the case of an addtional covariate this of course needs to be included. For a model with covariates use drop.Z=TRUE for the function to ignore prediction using the covariate and generate conditional samples for just the spatial process and any low order polynomial. Finally, it should be noted that this function will also work with an mKrig object because the essential prediction information in the mKrig and spatialProcess objects are the same. The naming is through convenience.

sim.Krig Also follows this algorithm exactly but for the older Krig object. Note that the inclusion of drop.Z=TRUE or FALSE will determine whether the conditional simulation includes the covariates Z or not. (See example below.)

simLocal.spatialProcess This function is designed for conditional simulation for a large prediction grid and for a large number of observation s. The approximation will be accurate for fine grids that separate clusters of observation locations. E.g. multiple observations in a single gridbox are treated exactly. If observation location are separated by a grid box then due to the screening effect the approximation error will be negliable, especially if a nugget component (tau ) is present.

A 1D version of this function will not be much more efficient than an exact computation so it has not been included.

See the utility function offGridWeights2D for the function that creates weights used to generate the (approximate) conditional sample. The functions checkPredictGrid, makePredictionGridList and are utilities to setup the grids when not specified.

Value

sim.Krig and sim.spatialProcess a matrix with rows indexed by the locations in xp and columns being the M independent draws.

simLocal.spatialProcess a list with components x, y and z being the simulations at the prediction grid. The x and y are the typical image format specifying a regular grid. If nx and ny are the lengths of the grid values in X and Y then z is a array with dimension c(nx, ny, M). For the 1D case ny is set to 1. The component hHat is the conditional mean ( as an nx X ny matrix) and the remaining arguments are various timing results for parts of the computation.

Author(s)

Doug Nychka

Examples

## 10 member ensemble for the O3 data

## Not run: 
data( "ozone2")
fitObject<- spatialProcess( ozone2$lon.lat, ozone2$y[16,],
                              smoothness=.5)

nx<- 35
ny<- 25

xGridList<- fields.x.to.grid( fitObject$x, nx=nx, ny=ny)

xGrid<- make.surface.grid( xGridList)

allTime0<- system.time(
 look0<- sim.spatialProcess(fitObject, xp=xGrid, M=5)
)
print( allTime0)

# for M=5 this does not make much sense ... however here are the 
# Monte Carlo based prediction standard deviations. 

predictSE<- apply( look0, 1, sd)

# compare to  predictSE(fitObject, xp=xGrid)

## Local simulation and same grid size for prediction 
## this runs much faster compared to the exact method above 
## as nx, ny are increased  e.g. nx= 128, ny=128 has dramatic difference 

allTime1<- system.time(
  look<- simLocal.spatialProcess(fitObject, M=5,nx=nx, ny=ny,
                     
                     NNSize=5)
) 
print( allTime1)
print( look$timing)

allTime2<- system.time(
  look<- simLocal.spatialProcess(fitObject, M=5,nx=nx, ny=ny,
                     NNSize=5, 
                     NNSizePredict=6,
                     fast=TRUE)
) 
print( allTime2)
print( look$timing)

## End(Not run)

## Not run: 
## A simple example for setting up a bootstrap 
## M below should be
## set to a much larger sample size, however, ( e.g. M <- 200) for better
## statistics

data( ozone2)
obj<- spatialProcess( ozone2$lon.lat,ozone2$y[16,] )
aHat<- obj$summary["aRange"]
lambdaHat<- obj$summary["lambda"]

######### boot strap 
# create M independent copies of the observation vector
# here we just grab the model information from the 
# spatialProcess object above.
#  
# However, one could just create the list 
#  obj<- list( x= ozone2$lon.lat,
#       cov.function.name="stationary.cov",
#     summary= c( tau= 9.47, sigma2= 499.79, aRange= .700),
#    cov.args= list( Covariance="Matern", smoothness=1.0),
#     weights= rep( 1, nrow(ozone2$lon.lat) )
# )
# Here summary component  has the parameters
# tau, sigma2 and aRange
# and cov.args component has the remaining ones.

set.seed(223)
M<- 25
ySynthetic<- simSpatialData( obj, M)

bootSummary<- NULL

for(  k in 1:M){
cat( k, " ")
# here the MLEs are found using the easy top level level wrapper
# see mKrigMLEJoint for a more efficient strategy
  newSummary<- spatialProcess(obj$x,ySynthetic[,k],
                    cov.params.start= list(
			                   aRange = aHat,
			                  lambda = lambdaHat)
                               )$summary
  bootSummary<- rbind( bootSummary, newSummary)
  }
cat( fill= TRUE)
# the results and 95
  stats( bootSummary )

  obj$summary
  tmpBoot<- bootSummary[,c("lambda", "aRange") ]
  confidenceInterval <- apply(tmpBoot, 2,
                               quantile, probs=c(0.025,0.975) )
# compare to estimates used as the "true" parameters			       
  obj$summary[2:5] 
  print( t(confidenceInterval) )
# compare to confidence interval using large sample theory  
  print( obj$CITable)

## End(Not run)

## Not run: 
# conditional simulation with covariates
# colorado climate example
# note how elevation enters either as a vector or image/grid

  data(COmonthlyMet)
  fit1E<- spatialProcess(CO.loc,CO.tmin.MAM.climate, Z=CO.elev   )
# conditional simulation application for filling missing data
  good<- !is.na(CO.tmin.MAM.climate ) 
  infill<- sim.spatialProcess( fit1E, xp=CO.loc[!good,], 
                Z= CO.elev[!good], M= 10)
# conditional simulation on a grid  with covariates.                
# get an elevation grid  ... NGRID<- 50 gives a nicer image but takes longer 
 NGRID <- 25  
 # get elevations on a grid  
   COGrid<- list( x=seq( -109.75, -100.5, ,NGRID), y= seq(36, 41.75,,NGRID) )
   COGridPoints<- make.surface.grid( COGrid)
 # elevations are a bilinear interpolation from the 4km
 # Rocky Mountain elevation fields data set.   
   data( RMelevation)
   # COElevTmp<- interp.surface.grid(RMelevation,COGrid )
   #  check that predicted surface is OK
   #look<- predictSurface( fit1E, gridList=COGrid, Z=COElevTmp )
# elevations as a vector 
  COElevGrid<- interp.surface( RMelevation, COGridPoints )
# NOTE call to sim.spatialProcess treats the grid points as just a matrix
# of locations the plot has to "reshape" these into a grid 
# to use with image.plot 
   SEout<- sim.spatialProcess( fit1E, xp=COGridPoints,  Z= COElevGrid, M= 30)
# for just the smooth surface in lon/lat
#  SEout<- sim.spatialProcess( fit1E, xp=COGridPoints, 
#                            drop.Z=TRUE, M= 30)
# of course,  M should be larger to reduce Monte Carlo error.      
   surSE<- apply( SEout, 1, sd )
   image.plot( as.surface( COGridPoints, surSE)) 
   points( fit1E$x, col="magenta", pch=16) 
   

## End(Not run)

### Approximate conditional simulation 
## Not run: 

 # create larger lon/lat grid  
 # see NNSize and NNSizePredict for additional arguments to control the 
 # size of nearest neighborhoods used. 
 
   NGRID <-200  
   COGrid2<- list( x=seq( -109.75, -100.5, ,NGRID),
                    y= seq(36, 41.75,,NGRID) )
 # interpolation elevations to this grid. 
 
   COElevGrid2<- interp.surface.grid( RMelevation, COGrid2 )
   
   #  check that predicted surface is OK
   # look<- predictSurface.mKrig( fit1E, gridList=COGrid2, ZGrid=COElevGrid2, fast=TRUE )
   
   set.seed(222)
  system.time( 
    SEout0<- simLocal.spatialProcess( fit1E,COGrid2, 
    ZGrid= COElevGrid2$z,
    M= 10)
    )
    
# now with rapid, approximate prediction 
    set.seed(222)
    system.time( 
    SEout1<- simLocal.spatialProcess( fit1E,COGrid2 , 
    ZGrid= COElevGrid2$z,
    M= 50, fast = TRUE)
    )
    
   look <- apply( SEout1$z,c(1,2), sd)
  imagePlot(SEout1$x, SEout1$y, look, col=viridis(256) )
  points( fit1E$x, pch=16, cex=.5, col="magenta")
  title("Monte Carlo prediction SE") 
  

## End(Not run)

#### example using Krig object and exact conditional simulation.
## Not run: 
data( ozone2)
set.seed( 399)
# fit to day 16 from Midwest ozone data set.
  out<- Krig( ozone2$lon.lat, ozone2$y[16,], Covariance="Matern", 
            aRange=1.0,smoothness=1.0, na.rm=TRUE)

# NOTE aRange =1.0 is not the best choice but 

# the six missing data locations
 xp<-  ozone2$lon.lat[ is.na(ozone2$y[16,]),]

# 30 draws from process at xp given the data 

 sim.out<- sim.Krig( out,xp, M=30)


## End(Not run)

fields documentation built on Sept. 9, 2025, 5:39 p.m.