stdf: STDF: Spatio-temporal data fusion model
In guiludwig/stdf: Spatio-Temporal Data Fusion Model

Description Usage Arguments Value Author(s) References See Also Examples

STDF: Spatio-temporal data fusion model

This function is an wrapper to fitting a Spatio-Temporal Data Fusion (STDF) model to spatio-temporal data.

stdf(training.set, subtfpca = NULL, ssensors = 6, L = 2,
  spline.df = NULL, zeta = 0, fpca.df = 20, homogeneous = FALSE,
  matern.nu = 0.5, method = c("L-BFGS-B", "Nelder-Mead"), verbose = TRUE,
  ...)

`training.set`	A numeric matrix with the response variable in the first column, time values in the second column and x, y spatial coordinates in the third column. Corresponds to the training data.
`subtfpca`	Logical vector with the same length as the number of rows in training.set; tells stdf which lines should be included in the calculation of the temporal dependency component. Defaults to NULL, which includes all lines.
`ssensors`	Integer corresponding to how many static sensors in the dataset.
`L`	Number of eigenfunctions in spatio-temporal covariance, defaults to 2.
`spline.df`	Number of spline basis for the deterministic spline component (see `bs` function). Defaults to NULL, which uses 3 spline basis.
`zeta`	Smoothness parameter for the fpca function. Defaults to 0 (no smoothing). See `tfpca` function.
`fpca.df`	Number of spline basis for the stochastic spline component (see `tfpca` function). Defaults to 20.
`homogeneous`	Whether the variance of static and roving sensors is assumed to be the same or not. Defaults to FALSE.
`method`	Either "L-BFGS-B" or "Nelder-Mead", which are passed to the optimization function `constrOptim`.
`verbose`	Whether `stdf` prints a message when the optimization step is done. Defaults to `TRUE`.
`...`	further arguments passed to or from other methods.

stdf returns an object of class "stdf" containing at least the following components

`beta.est`	Coefficient estimates
`MSPEKrig`	Mean squared Kriging Prediction error
`spatCov`	Theta parameters for stochastic term
`sigma2s`	Static sensor variance estimate
`sigma2r`	Roving sensor variance estimate

Guilherme Ludwig and Tingjin Chu

Chu, T., Zhu, J. and Wang, H. (2014) On Semiparametric Inference of Geostatistical Models via Local Karhunen-Loeve Expansion. Journal of the Royal Statistical Society, 76, 817-832.

Ludwig, G., Chu, T., Zhu, J., Wang, H. and Koehler, K. (2015) Static and Roving Sensor Data Fusion for Spatio-Temporal Mapping with Application to Occupational Exposure Assessment, to appear.

tfpca

# sensor simulation example, placeholder for unit tests but also shows
# options available for optimization:
set.seed(1)
static <- cbind(Y = rep(0, 200), 
                t = rep(1:50,4), 
                sx = rep(c(2,2,4,4), each = 50), 
                sy = rep(c(2,4,2,4), each = 50))
roving <- cbind(Y = rep(0, 50), 
                t = 1:50, 
                sx = seq(2,4, length=50), 
                sy = seq(2,4, length=50))
prediction <- cbind(Y = rep(0, 50), t = 1:50, sx = 3, sy = 3)
complete <- rbind(static, roving, prediction)
D <- matrix(0, 300, 300)
for(i in 1:300) D[i,] <- sqrt((complete[i,3] - complete[,3])^2 + 
                              (complete[i,4] - complete[,4])^2)
S <- matrix(0, 300, 300)
Phi <- cbind(sin(2*pi*complete[,2]/50), 
             cos(2*pi*complete[,2]/50), 
             2*sin(4*pi*complete[,2]/50))
sigmaR <- 2
sigmaS <- 1
theta <- 1:3
for(L in 1:3) S <- S + exp(-D/theta[L])*outer(Phi[,L], Phi[,L])
diag(S) <- diag(S) + sigmaS
diag(S)[201:250] <- diag(S)[201:250] - sigmaS + sigmaR
complete[,1] <- 2 + 2*complete[,3] + 2*complete[,4] + 
                2*cos(4*pi*complete[,2]/50) + 
                t(chol(S)) %*% rnorm(300)
static[,1] <- complete[1:200,1]
roving[,1] <- complete[201:250,1]
prediction[,1] <- complete[251:300,1]

results <- stdf(static, ssensors = 4, L = 3)
resultsNM <- stdf(static, ssensors = 4, L = 3, 
                  method = "Nelder-Mead")
# Compare results of using "L-BFGS-B" and "Nelder-Mead";
# Should be almost the same, bar numberical accuracy issues
results
resultsNM
# How to predict, and some numerical accuracy difference
test.A <- predict(results, prediction)
test.B <- predict(resultsNM, prediction)
all.equal(test.A$fitted, test.B$fitted)
# Compare with fitting homogeneous case
resultsH <- stdf(static, ssensors = 4, L = 3, 
                 homogeneous = TRUE)
resultsHNM <- stdf(static, ssensors = 4, L = 3, 
                   homogeneous = TRUE, method = "Nelder-Mead")
                   
# Nelder-Mead is more time-consuming, for example in this case:
system.time({resultsPlusRoving <- stdf(rbind(static, roving), 
                          subtfpca = c(rep(TRUE,200), rep(FALSE,50)), 
                          ssensors = 4, L = 3)})
system.time({resultsPlusRovingNM <- stdf(rbind(static, roving), 
                            subtfpca = c(rep(TRUE,200), rep(FALSE,50)), 
                            ssensors = 4, L = 3, method = "Nelder-Mead")})
# Results should be almost the same, bar numerical accuracy issues
resultsPlusRoving
resultsPlusRovingNM
# Forces homogeneous model:                         
resultsPlusRovingH <- stdf(rbind(static, roving), 
                          subtfpca = c(rep(TRUE,200), rep(FALSE,50)), 
                          ssensors = 4, L = 3, homogeneous = TRUE)

# Example with dataset from fda package; this dataset is
# huge and takes a long time to run.
## Not run: 
library(fda)
n <- 35 # stations
Y <- as.numeric(CanadianWeather$dailyAv[ , ,"Temperature.C"])
XY <- cbind(Y, 
            rep(1:365, n),  
            rep(CanadianWeather$coordinates[, "N.latitude"], each = 365), 
            rep(CanadianWeather$coordinates[, "W.longitude"], each = 365))
fakePred <- XY[1:3,]
model <- stdf(XY, ssensors = n, homogeneous = TRUE,
              L = 2, zeta = 1e5)
# Creating a hazard map requires the ggplot2 package
# TODO

## End(Not run)