fitGLS: Fit a PARTS GLS model.
In morrowcj/remotePARTS: Spatiotemporal Autoregression Analyses for Large Data Sets

fitGLS

R Documentation

Fit a PARTS GLS model.

Description

Fit a PARTS GLS model.

Usage

fitGLS(
  formula,
  data,
  V,
  nugget = 0,
  formula0 = NULL,
  save.xx = FALSE,
  save.invchol = FALSE,
  logLik.only = FALSE,
  no.F = FALSE,
  coords,
  distm_FUN,
  covar_FUN,
  covar.pars,
  invCholV,
  ncores = NA,
  suppress_compare_warning = FALSE,
  ...
)

Arguments

`formula`	a model formula
`data`	an optional data frame environment in which to search for variables given by `formula`
`V`	a covariance matrix, which must be positive definitive. This argument is optional if `coords`, `distm_FUN`, `covar_FUN`, and `covar.pars` are given instead.
`nugget`	an optional numeric nugget, must be positive
`formula0`	an optional formula for the null model to be compared with `formula` by an F-test
`save.xx`	logical: should information needed for cross-partition comparisons be returned?
`save.invchol`	logical: should the inverse of the Cholesky matrix be returned?
`logLik.only`	logical: should calculations stop after calculating parital log-likelihood?
`no.F`	logical: should F-test calculations be made?
`coords`	optional coordinate matrix for calculating `V` internally
`distm_FUN`	optional function for calculating a distance matrix from `coords`, when calculating `V` internally
`covar_FUN`	optional distance-based covariance function for calculating `V` internally
`covar.pars`	an optional named list of parameters passed to `covar_FUN` when calculating `V` internally
`invCholV`	optional pre-calculated inverse cholesky matrix to use in place of `V`
`ncores`	an optional integer indicating how many CPU threads to use for matrix calculations.
`suppress_compare_warning`	an optional variable to suppress warning that arises from identical `formula` and `formula0`.
`...`	additional arguments passed to `optimize_nugget`, which are only used if if `nugget = NA`

Details

conduct generalized least-squares regression of spatiotemporal trends

fitGLS fits a GLS model, using terms specified in formula. In the PARTS method, generally the left side of formula should be pixel-level trend estimates and the right side should be spatial predictors. The errors of the GLS are correlated according to covariance matrix V.

If nugget = NA, an ML nugget is estimated from the data using the optimize_nugget() function. Arguments additional arguments (...) are passed to optimize_nugget in this case. V must be provided for nugget optimization.

If formula0 is not specified, the default is to fit an intercept-only null model.

save.xx is included to allow for manually conducting a partitioned GLS analyses. Because most users will not need this feature, opting instead to use fitGLS_parition(), save.xx = FALSE by default.

Similarly, save.invchol is included to allow for recycling of the inverse cholesky matrix. Often, inverting the large cholesky matrix (i.e., invert_chol(V)) is the slowest part of GLS. This argument exists to allow users to recycle this process, though no remotePARTS function currently exists that can use invert_chol(V) to fit the GLS.

logLik.only = TRUE will return only the partial log-likelihood, which can minimized to obtain the maximum likelihood for a given set of data.

If no.F = TRUE, then the model given by formula is not compared to the model given by formula0.

If V is not provided, it can be fit internally by specifying all of coords, distm_FUN, covar_FUN, and covar.pars. The function given by distm_FUN will calculate a distance matrix from coords, which is then transformed into a distance-based covariance matrix with covar_FUN and parameters given by covar.pars.

This function uses C++ code that uses the Eigen matrix library (RcppEigen package) to fit models as efficiently as possible. As such, all available CPU cores are used for matrix calculations on systems with OpenMP support.

ncores is passed to the C++ code Eigen::setNpThreads() which sets the number of cores used for compatible Eigen matrix operations (when OpenMP is used).

Value

fitGLS returns a list object of class "remoteGLS", if logLik.only = FALSE. The list contains at least the following elements:

coefficients: coefficient estimates for predictor variables
SSE: sum of squares error
MSE: mean squared error
SE: standard errors
df_t: degrees of freedom for the t-test
logDetV: log-determinant of V
tstat: t-test statistic
pval_t: p-value of the t-statistic
logLik: the Log-likelihood of the model
nugget: the nugget used in fitting
covar_coef: the covariance matrix of the coefficients

If no.F = FALSE, the following elements, corresponding to the null model and F-test are also calculated:

coefficients0: coefficient estimates for the null model
SSE0: sum of squares error for the null model
MSE0: mean squared error for the null model
SE0: the standard errors for null coefficients
MSR: the regression mean square
df0: the null model F-test degrees of freedom
LL0: the log-likelihood of the null model
df_F: the F-test degrees of freedom, for the main model
Fstat: the F-statistic
pval_F: the F-test p-value
formula: the alternate formula used
formula0: the null formula used

An attribute called also set to "no.F" is set to the value of argument no.F, which signals to generic methods how to handle the output.

If save.invchol = TRUE, output also includes

invcholV: the inverse of the Cholesky decomposition of the covariance matrix obtained with invert_chol(V, nugget)

If save.xx = TRUE, output also includes the following elements

xx: the predictor variables X, from the right side of formula, transformed by the inverse cholesky matrix: xx = invcholV %*% X
xx0: the predictor variables X0, from the right side of formula0, transformed by the inverse cholesky matrix: xx0 = invcholV %*% X0

The primary use of xx and xx0 is for use with fitGLS_partition().

If logLik.only = TRUE, a single numeric output containing the log-likelihood is returned.

Examples


## read data
data(ndvi_AK10000)
df = ndvi_AK10000[seq_len(200), ] # first 200 rows

## fit covariance matrix
V = covar_exp(distm_scaled(cbind(df$lng, df$lat)), range = .01)

## run GLS
(GLS = fitGLS(CLS_coef ~ 0 + land, data = df, V = V))

## with F-test calculations to compare with the NULL model
(GLS.F = fitGLS(CLS_coef ~ 0 + land, data = df, V = V, no.F = FALSE))

## find ML nugget
fitGLS(CLS_coef ~ 0 + land, data = df, V = V, no.F = FALSE, nugget = NA)

## calculate V internally
coords = cbind(df$lng, df$lat)
fitGLS(CLS_coef ~ 0 + land, data = df, logLik.only = FALSE, coords = coords,
       distm_FUN = "distm_scaled", covar_FUN = "covar_exp", covar.pars = list(range = .01))

## use inverse cholesky
fitGLS(CLS_coef ~ 0 + land, data = df, invCholV = invert_chol(V))

## save inverse cholesky matrix
invchol = fitGLS(CLS_coef ~ 0 + land, data = df, V = V, save.invchol = TRUE)$invcholV

## re-use inverse cholesky instead of V
fitGLS(CLS_coef ~ 0 + land, data = df, invCholV = invchol)

## Log-likelihood (fast)
fitGLS(CLS_coef ~ 0 + land, data = df, V = V, logLik.only = TRUE)

morrowcj/remotePARTS documentation built on Sept. 17, 2023, 5:42 p.m.