# spatpca: Regularized PCA for spatial data In SpatPCA: Regularized Principal Component Analysis for Spatial Data

## Description

Produce spatial dominant patterns and spatial predictions at the designated locations according to the specified tuning parameters or the selected tuning parameters by the M-fold cross-validation.

## Usage

  1 2 3 4 5 6 7 8 9 10 11 12 13 14 spatpca( x, Y, M = 5, K = NULL, is_K_selected = ifelse(is.null(K), TRUE, FALSE), tau1 = NULL, tau2 = NULL, gamma = NULL, is_Y_detrended = FALSE, maxit = 100, thr = 1e-04, num_cores = NULL ) 

## Arguments

 x Location matrix (p \times d). Each row is a location. d is the dimension of locations Y Data matrix (n \times p) stores the values at p locations with sample size n. M Optional number of folds for cross validation; default is 5. K Optional user-supplied number of eigenfunctions; default is NULL. If K is NULL or is_K_selected is TRUE, K is selected automatically. is_K_selected If TRUE, K is selected automatically; otherwise, is_K_selected is set to be user-supplied K. Default depends on user-supplied K. tau1 Optional user-supplied numeric vector of a nonnegative smoothness parameter sequence. If NULL, 10 tau1 values in a range are used. tau2 Optional user-supplied numeric vector of a nonnegative sparseness parameter sequence. If NULL, none of tau2 is used. gamma Optional user-supplied numeric vector of a nonnegative tuning parameter sequence. If NULL, 10 values in a range are used. is_Y_detrended If TRUE, center the columns of Y. Default is FALSE. maxit Maximum number of iterations. Default value is 100. thr Threshold for convergence. Default value is 10^{-4}. num_cores Number of cores used to parallel computing. Default value is NULL (See RcppParallel::defaultNumThreads())

## Details

An ADMM form of the proposed objective function is written as

\min_{\mathbf{Φ}} \|\mathbf{Y}-\mathbf{Y}\mathbf{Φ}\mathbf{Φ}'\|^2_F +τ_1\mbox{tr}(\mathbf{Φ}^T\mathbf{Ω}\mathbf{Φ})+τ_2∑_{k=1}^K∑_{j=1}^p |φ_{jk}|,

\mbox{subject to $\mathbf{Φ}^T\mathbf{Φ}=\mathbf{I}_K$,} where \mathbf{Y} is a data matrix, {\mathbf{Ω}} is a smoothness matrix, and \mathbf{Φ}=\{φ_{jk}\}.

## Value

A list of objects including

 eigenfn Estimated eigenfunctions at the new locations, x_new. selected_K Selected K based on CV. Execute the algorithm when is_K_selected is TRUE. selected_tau1 Selected tau1. selected_tau2 Selected tau2. selected_gamma Selected gamma. cv_score_tau1 cv scores for tau1. cv_score_tau2 cv scores for tau2. cv_score_gamma cv scores for gamma. tau1 Sequence of tau1-values used in the process. tau2 Sequence of tau2-values used in the process. gamma Sequence of gamma-values used in the process. detrended_Y If is_Y_detrended is TRUE, detrended_Y means Y is detrended; else, detrended_Y is equal to Y. scaled_x Input location matrix. Only scale when it is one-dimensional

## Author(s)

Wen-Ting Wang and Hsin-Cheng Huang

## References

Wang, W.-T. and Huang, H.-C. (2017). Regularized principal component analysis for spatial data. Journal of Computational and Graphical Statistics 26 14-25.

  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 # The following examples only use two threads for parallel computing. ## 1D: regular locations x_1D <- as.matrix(seq(-5, 5, length = 50)) Phi_1D <- exp(-x_1D^2) / norm(exp(-x_1D^2), "F") set.seed(1234) Y_1D <- rnorm(n = 100, sd = 3) %*% t(Phi_1D) + matrix(rnorm(n = 100 * 50), 100, 50) cv_1D <- spatpca(x = x_1D, Y = Y_1D, num_cores = 2) plot(x_1D, cv_1D$eigenfn[, 1], type = "l", main = "1st eigenfunction") lines(x_1D, svd(Y_1D)$v[, 1], col = "red") legend("topleft", c("SpatPCA", "PCA"), lty = 1:1, col = 1:2) ## 2D: Daily 8-hour ozone averages for sites in the Midwest (USA) library(fields) library(pracma) library(maps) data(ozone2) x <- ozone2$lon.lat Y <- ozone2$y date <- as.Date(ozone2$date, format = "%y%m%d") rmna <- !colSums(is.na(Y)) YY <- matrix(Y[, rmna], nrow = nrow(Y)) YY <- detrend(YY, "linear") xx <- x[rmna, ] cv <- spatpca(x = xx, Y = YY) quilt.plot(xx, cv$eigenfn[, 1]) map("state", xlim = range(xx[, 1]), ylim = range(xx[, 2]), add = TRUE) map.text("state", xlim = range(xx[, 1]), ylim = range(xx[, 2]), cex = 2, add = TRUE) plot(date, YY %*% cv$eigenfn[, 1], type = "l", ylab = "1st Principal Component") ### new loactions new_p <- 200 x_lon <- seq(min(xx[, 1]), max(xx[, 1]), length = new_p) x_lat <- seq(min(xx[, 2]), max(xx[, 2]), length = new_p) xx_new <- as.matrix(expand.grid(x = x_lon, y = x_lat)) eof <- spatpca(x = xx, Y = YY, K = cv$selected_K, tau1 = cv$selected_tau1, tau2 = cv$selected_tau2) predicted_eof <- predictEigenfunction(eof, xx_new) quilt.plot(xx_new, predicted_eof[,1], nx = new_p, ny = new_p, xlab = "lon.", ylab = "lat.") map("state", xlim = range(x_lon), ylim = range(x_lat), add = TRUE) map.text("state", xlim = range(x_lon), ylim = range(x_lat), cex = 2, add = TRUE) ## 3D: regular locations p <- 10 x <- y <- z <- as.matrix(seq(-5, 5, length = p)) d <- expand.grid(x, y, z) Phi_3D <- rowSums(exp(-d^2)) / norm(as.matrix(rowSums(exp(-d^2))), "F") Y_3D <- rnorm(n = 100, sd = 3) %*% t(Phi_3D) + matrix(rnorm(n = 100 * p^3), 100, p^3) cv_3D <- spatpca(x = d, Y = Y_3D, tau2 = seq(0, 1000, length = 10)) library(plot3D) library(RColorBrewer) cols <- colorRampPalette(brewer.pal(9, "Blues"))(p) isosurf3D(x, y, z, colvar = array(cv_3D$eigenfn[, 1], c(p, p, p)), level= seq(min(cv_3D$eigenfn[, 1]), max(cv_3D\$eigenfn[, 1]), length = p), ticktype = "detailed", colkey = list(side = 1), col = cols)