R/meanfield.R

Defines functions pscl_apply pscl_analyze

Documented in pscl_analyze pscl_apply

#### Functions for producing the mean field.

### The functions in this file implement a simple pattern scaling algorithm for
### producing the mean field as a function of temperature.  You don't have to
### use this method for producing the mean field; you can substitute any
### algorithm for estimating the mean temperature response.
###
### Whatever you use, it is essential that the algorithm not \emph{overfit} the
### data.  The variability functions assume that the ESM data is a mixture of
### systematic variation and random variation, and that the mean field algorithm
### captures only the systematic variation.  If the mean field algorithm fits
### the noise in the data, then the variability functions are going to produce
### fields that look too much like the input ESM data.
###
### The pattern scaling used here doesn't have very many degrees of freedom, so
### there isn't much danger of overfitting, but if you use a more
### sophisticated model, then you really should train it on several ensemble
### members and hold back some data for cross-validation.
###
### Finally, there is the matter of interannual variability in the global mean.
### If you fit a mean field using the \emph{actual} global mean temperatures
### from the ESM, then the variability functions aren't going to produce a lot
### of variation in the global means.  That is, the independent runs will look
### very different spatially, but their time series of global means will all be
### very close to the same.
###
### There are (at least) two schools of thought on this.  One is that this
### phenomenon is an type of overfitting.  We know that different ESM runs under
### identical forcing scenarios will produce different time series of global
### mean temperature, so our input time series should reflect only the
### systematic change for the scenario, leaving the random change to be captured
### by the variability functions.
###
### The other school of thought holds that what we are trying to represent is
### the spatial behavior of the temperature field as a function of global mean
### temperature.  In this view, the ideal mean field algorithm would always
### reproduce exactly the global mean it was generated from, and the spatial
### variability would always have exactly zero global mean.  One would then
### produce interannual variability in the global mean through some separate
### generating process.
###
### Both of these approaches are supported by these tools.  The variation in the
### global mean is bundled entirely into the first principal component of the
### variability analysis.  The first school of thought views this as just
### another component that can be analyzed for temporal autocorrelation and
### randomized.  The second school of thought views this component as special,
### inasmuch as it does not capture the variability of the system; therefore,
### variability obtained from a separate analysis must be explicitly added to
### this component.

#' Run pattern scaling analysis on the input temperature fields
#'
#' Run pattern scaling on the input temperature fields.  This amounts to doing a
#' linear regression on each grid cell, with the global temperature as the sole
#' regressor.
#'
#' The return value is a list with the residuals and pattern coefficients as r
#' (resid), w (slope), and b (intercept).
#'
#' dim(r) == dim(tin) == ntime x nlat*nlon
#' dim(w) == dim(b) == nlat*nlon
#'
#' @param tin Matrix of temperature fields (ntime x ngrid)
#' @param tgavin Matrix of global mean temperature over time (ntime x 1)
#' @return List of r, w, and b matrices (see details)
#' @importFrom stats lm
#' @export
#' @keywords internal
pscl_analyze <- function(tin, tgavin)
{

    pscl <- lm(as.matrix(tin) ~ as.matrix(tgavin))

    list(r=as.matrix(pscl$residuals), b=as.matrix(pscl$coefficients[1,]),
         w=as.matrix(pscl$coefficients[2,]))
}


#' Generate a mean field from a pattern and an input temperature series
#'
#' Apply the w and b values to the input global mean temperatures.  Thus, for
#' each grid cell \eqn{i}, \deqn{T_i = w_i T_g + b_i}.  The residuals are
#' \emph{not} added back to the result.
#'
#' The input temperatures should be in the form of a column vector; i.e., a
#' matrix with dimension N x 1.  The output will be a matrix of dimension N x
#' ngrid, where each row is the flattened vector of grid cells for a single time
#' slice.
#'
#' For purposes of applying the pattern, any value of N will work; however, many
#' of the other functions in this package expect a matrix with dimension ntime x
#' ngrid, where ntime is the number of time slices in the ESM dataset originally
#' used in the analysis.  Therefore, for any output that is intended to be used
#' in these other functions, the input temperatures should be of length ntime.
#'
#' @param pscl A scaling pattern generated by \code{\link{pscl_analyze}}, or the
#' \code{pscl} member of a \code{\link[=train]{fldgen}} structure.
#' @param tgav A [N x 1] matrix of global mean temperatures.
#' @export
#' @keywords internal
pscl_apply<- function(pscl, tgav)
{
    tscl <- tgav %*% t(pscl$w)
    ntime <- nrow(tscl)

    ## Add the b values to each time slice.
    bmat <- matrix(rep(pscl$b, ntime), nrow=ntime, byrow=TRUE)

    tscl + bmat
}
JGCRI/fldgen documentation built on July 18, 2020, 1:42 p.m.