prinComp: Principal Component Analysis of Grids

Description

Performs a Principal Component Analysis of grids, multigrids or multimember multigrids. The core of this function is stats::prcomp, with several specific options to deal with climate data.

Usage

prinComp(
  grid,
  n.eofs = NULL,
  v.exp = NULL,
  which.combine = NULL,
  scaling = "gridbox",
  keep.orig = FALSE,
  rot = FALSE,
  quiet = FALSE,
  imputation = "mean"
)

Arguments

grid

A grid (gridded or station dataset), multigrid, multimember grid or multimember multigrid object

n.eofs

Integer vector. Number of EOFs to be retained. Defaults to NULL, meaning that either all EOFs are kept or that v.exp is used as the criterion to determine their number. See v.exp and Details.

v.exp

Maximum fraction of explained variance, in the range (0,1]. Used to determine the number of EOFs to be retained, as an alternative to n.eofs. Defaults to NULL. See Details.

which.combine

Optional. A character vector with the short names of the variables of the multigrid used to construct 'combined' PCs (use the getVarNames helper if not sure about variable names).

scaling

Method for scaling (and centering) the input raw data matrix. The choices described are "gridbox" (the default in the usage signature above) and "field"; the "field" option may not be available in the current version. See Details.

keep.orig

Logical flag indicating whether to return the standardized input data matrices used to perform the PCA (keep.orig = TRUE) or not (FALSE). Defaults to FALSE.

rot

Logical value indicating whether a VARIMAX rotation should be performed. Defaults to FALSE.

quiet

Logical. If TRUE, all messages are silenced (warnings are still shown). Defaults to FALSE.

imputation

Character string, either "mean" (the default) or "median". Missing data are replaced with the mean or the median when calculating the PCs. This approach is based on the literature.
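The imputation step can be illustrated with base R as follows. This is a minimal sketch on a toy matrix, not the internal code of prinComp: the helper impute.columns is hypothetical, and filling each column (grid box) with its own mean or median is an assumption made for the example.

```r
# Toy data matrix with missing values: 4 time steps (rows) x 2 grid boxes (columns)
x <- matrix(c(1, 2, NA, 4,
              10, NA, 30, 40), nrow = 4, ncol = 2)

# Hypothetical helper: replace the NAs of each column by a summary of that column
# (assumption: the statistic is computed column-wise, ignoring missing values)
impute.columns <- function(m, fun = mean) {
  apply(m, 2, function(col) {
    col[is.na(col)] <- fun(col, na.rm = TRUE)
    col
  })
}

x.mean   <- impute.columns(x, mean)    # NAs replaced by the column mean
x.median <- impute.columns(x, median)  # NAs replaced by the column median
```

The median is less sensitive to outliers than the mean, which may matter for strongly skewed variables.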

Details

Number of EOFs

n.eofs and v.exp are alternative ways of determining the number of EOFs (and hence of the corresponding PCs) to be retained. If both are NULL (the default), all EOFs are retained. If both are set to a non-NULL value, n.eofs prevails and v.exp is ignored, with a warning. When dealing with multigrids, n.eofs can be either a single value or a vector with one value per variable contained in the multigrid, plus one for the COMBINED field, if any. The same behaviour holds for v.exp.
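The two selection criteria can be sketched with base R as follows. This is a minimal illustration on synthetic data that calls stats::prcomp directly, not the internal code of prinComp. The matrix x is made up, and the rule of keeping the smallest number of EOFs whose cumulative explained variance reaches v.exp is an assumption made for the example.

```r
set.seed(1)
# Synthetic data matrix: 100 time steps (rows) x 6 grid boxes (columns)
x <- matrix(rnorm(600), nrow = 100, ncol = 6)
x[, 2] <- x[, 1] + 0.1 * x[, 2]   # introduce correlation between two grid boxes

pca <- prcomp(x, center = TRUE, scale. = TRUE)

# Cumulative fraction of explained variance, one value per PC
cum.vexp <- cumsum(pca$sdev^2) / sum(pca$sdev^2)

# Criterion 1: keep a fixed number of EOFs
n.eofs <- 3
eofs.fixed <- pca$rotation[, seq_len(n.eofs), drop = FALSE]

# Criterion 2 (assumed rule): smallest number of EOFs reaching v.exp
v.exp <- 0.90
n.vexp <- which(cum.vexp >= v.exp)[1]
eofs.vexp <- pca$rotation[, seq_len(n.vexp), drop = FALSE]
```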

Scaling and centering

In order to eliminate the effect of the varying scales of the different climate variables, the input data matrix is always scaled and centered; this step cannot be skipped. However, the mean and standard deviation can be computed either for each grid box individually ("gridbox") or for all grid boxes globally (i.e., at the field level, "field"). The field-level approach preserves the spatial structure of the original field and returns a single mean and sigma parameter for each variable. With the "gridbox" approach, which is the default in the usage signature above, a vector of length n, where n is the number of grid cells composing the grid, is returned for both the mean and sigma parameters (this is equivalent to applying the scale function to the input data matrix).

The method used is returned as a global attribute of the returned object ("scaled:method"), and the mu and sigma parameters are returned as attributes of the corresponding variables ("scaled:center" and "scaled:scale", respectively).
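The difference between the two approaches can be sketched with base R on a synthetic matrix. This is an illustration only, not the internal code of prinComp; the object names are made up for the example. The "scaled:center" and "scaled:scale" attributes shown are those set by the base scale function.

```r
set.seed(1)
# Synthetic field: 50 time steps (rows) x 4 grid boxes (columns)
x <- matrix(rnorm(200, mean = 10, sd = 2), nrow = 50, ncol = 4)

# "gridbox": one mean and one standard deviation per grid box (column)
x.gridbox     <- scale(x, center = TRUE, scale = TRUE)
mu.gridbox    <- attr(x.gridbox, "scaled:center")  # vector of length 4
sigma.gridbox <- attr(x.gridbox, "scaled:scale")   # vector of length 4

# "field": one single mean and standard deviation for the whole field
mu.field    <- mean(x)
sigma.field <- sd(as.vector(x))
x.field     <- (x - mu.field) / sigma.field
```

With grid-box scaling every column ends up with zero mean and unit variance, so differences in variability between grid boxes are removed; with field scaling those differences are kept.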

As with the n.eofs and v.exp arguments, it is possible to indicate a single approach for all variables within a multigrid (using a single value, as by default), or a specific approach for each variable separately (using a vector of the same length as the number of variables contained in the multigrid). However, the latter option is rarely used and is implemented only for maximum flexibility in the downscaling experimental setup.

Combined EOF analysis

When dealing with multigrid data, apart from the PCA performed on each variable individually, a combined analysis considering some or all of the variables together can be done. Its result is always returned in the last element of the output list, under the name "COMBINED". The variables used for the combination (if any) are controlled by the argument which.combine.
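The idea of a combined analysis can be sketched with base R. This is a hedged illustration, not the internal code of prinComp: the two synthetic variables are made up, and joining the standardized matrices column-wise before the PCA is an assumption about how the combination is built.

```r
set.seed(1)
n.time <- 60
# Two synthetic variables on the same 5-grid-box domain (rows = time steps)
ta  <- matrix(rnorm(n.time * 5, mean = 280, sd = 5), nrow = n.time)
psl <- matrix(rnorm(n.time * 5, mean = 101300, sd = 800), nrow = n.time)

# Standardize each variable so that their different units do not dominate
ta.std  <- scale(ta)
psl.std <- scale(psl)

# Assumed combination step: bind the standardized variables column-wise
combined <- cbind(ta.std, psl.std)

# PCA of the combined matrix; the data are already centered and scaled
pca.comb <- prcomp(combined, center = FALSE, scale. = FALSE)
dim(pca.comb$rotation)  # 10 x 10: one row per grid box and variable
dim(pca.comb$x)         # 60 x 10: one row per time step
```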

Value

A list of N elements for multigrids, where N is the number of input variables used, or N+1 if combined PCs are calculated, in which case they are placed last under the "COMBINED" name. For single grids (one variable only), a list of length 1 (without the combined element). For each element of the list, the following objects are returned. For multimember inputs they are wrapped in another list (one element per member); for non-multimember inputs they are returned directly:

  • PCs: A matrix of principal components, arranged in columns by decreasing importance order

  • EOFs: A matrix of EOFs, arranged in columns by decreasing importance order

  • orig: Either the original variable in the form of a 2D matrix (when keep.orig = TRUE), or NA when keep.orig = FALSE (the default). In both cases, the parameters used for input data standardization (mean and standard deviation) are returned as attributes of this component (see the examples).

The “order of importance” is given by the explained variance of each PC, as indicated in the attribute "explained_variance" as a cumulative vector. Additional information is returned via the remaining attributes (see details), including geo-referencing and time.
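The relationship between the PCs and EOFs matrices can be sketched with base R. This is a minimal illustration using stats::prcomp on synthetic data, not the output of prinComp itself; the object names and the truncation level k are made up for the example.

```r
set.seed(1)
# Synthetic data matrix: 50 time steps (rows) x 6 grid boxes (columns)
x     <- matrix(rnorm(300), nrow = 50, ncol = 6)
x.std <- scale(x)  # grid-box standardization

pca  <- prcomp(x.std, center = FALSE, scale. = FALSE)
PCs  <- pca$x         # 50 x 6, columns sorted by decreasing variance
EOFs <- pca$rotation  # 6 x 6, one column per EOF

# Using all EOFs, the standardized field is recovered exactly
x.full <- PCs %*% t(EOFs)

# Using only the first k EOFs gives a low-rank approximation
k <- 3
x.trunc <- PCs[, 1:k, drop = FALSE] %*% t(EOFs[, 1:k, drop = FALSE])

# Undo the standardization with the stored centering and scaling parameters
x.back <- sweep(x.full, 2, attr(x.std, "scaled:scale"), "*")
x.back <- sweep(x.back, 2, attr(x.std, "scaled:center"), "+")
```

This is why the standardization parameters are returned together with the PCs and EOFs: they are needed to map a reconstructed field back to the original units.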

Note

Performing PCA on multimember multigrids may become time-consuming and computationally expensive. It is therefore advisable to avoid this option for large datasets, and to iterate over single multimember grids instead.

Author(s)

J. Bedia, M. de Felice

References

Gutierrez, J.M., R. Ancell, A. S. Cofiño and C. Sordo (2004). Redes Probabilisticas y Neuronales en las Ciencias Atmosfericas. MIMAM, Spain. 279 pp. http://www.meteo.unican.es/en/books/dataMiningMeteo

See Also

Other pca: PC2grid(), grid2PCs(), gridFromPCA()

Examples


require(climate4R.datasets)
data("NCEP_Iberia_hus850", "NCEP_Iberia_psl", "NCEP_Iberia_ta850")
multigrid <- makeMultiGrid(NCEP_Iberia_hus850, NCEP_Iberia_psl, NCEP_Iberia_ta850)
# In this example, we retain the PCs explaining 95%, 90% and 90% of the variance of each variable, respectively
pca <- prinComp(multigrid, v.exp = c(.95,0.90,.90), keep.orig = FALSE)
# The output is a named list with the PCs and EOFs (plus additional attributes) for each variable
# within the input grid:
str(pca)
names(pca)
# Note that, apart from computing the principal components and EOFs for each grid, 
# it also returns, in the last element of the output list,
# the results of a PC analysis of the combined variables when 'which.combine' is activated:
pca <- prinComp(multigrid, v.exp = c(.99,.95,.90,.95),
                which.combine = c("ta@850", "psl"), keep.orig = FALSE)
str(pca)
# A special attribute indicates the variables used for combination
attributes(pca$COMBINED)
# The different attributes of the pca object provide information regarding the variables involved
# and the geo-referencing information
str(attributes(pca))
# In addition, for each variable (and their combination), the scaling and centering parameters 
# are also returned. There is one value of each parameter per grid point. For instance, 
# the parameters for the specific humidity field are:
attributes(pca[["hus@850"]][[1]]$orig)$`scaled:center`
attributes(pca[["hus@850"]][[1]]$orig)$`scaled:scale`
# In addition, the (cumulative) explained variance of each PC is also returned:
vexp <- attributes(pca$"hus@850"[[1]])$explained_variance
# The classical "scree plot":
barplot(1-vexp, names.arg = paste("PC",1:length(vexp)), las = 2, 
        ylab = "Fraction of unexplained variance")
# This is an example using a multimember object:
data("CFS_Iberia_hus850")
# In this case we retain the first 5 EOFs:
pca.mm <- prinComp(CFS_Iberia_hus850, n.eofs = 5) 
# Note that now the results of the PCA for the variable are a named list, with the results 
# for each member considered separately
str(pca.mm)
# The most complex situation comes from multimember multigrids:
data("CFS_Iberia_pr", "CFS_Iberia_tas")
# Now the multimember multigrid is constructed
mm.multigrid <- makeMultiGrid(CFS_Iberia_tas, CFS_Iberia_pr)
# Use different n.eofs for each variable:
pca.mm.mf <- prinComp(mm.multigrid, n.eofs = c(3,5))


SantanderMetGroup/transformeR documentation built on Oct. 28, 2023, 5:26 a.m.