prinComp (R Documentation)
Performs a Principal Component Analysis of grids, multigrids or multimember multigrids. The core of this function is stats::prcomp, with several specific options to deal with climate data.
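Because stats::prcomp works on a 2D matrix, a gridded field has to be flattened before the PCA. The following base-R sketch uses a hypothetical array with dimensions time x lat x lon (synthetic random data, not a climate4R object) and reshapes it into a time x grid-point matrix before calling prcomp. The actual internal reshaping performed by prinComp may differ.

```r
set.seed(1)
# Hypothetical field: 30 time steps on a 4 x 5 lat-lon grid (synthetic data)
field <- array(rnorm(30 * 4 * 5), dim = c(30, 4, 5))
# Flatten to a time x grid-point matrix: rows = time steps, columns = grid boxes
X <- matrix(field, nrow = dim(field)[1])
# Centre and scale each column, then run the PCA
pca <- stats::prcomp(X, center = TRUE, scale. = TRUE)
dim(X)             # 30 x 20
dim(pca$rotation)  # EOFs: one column per component (20 x 20)
dim(pca$x)         # PCs: one column per component (30 x 20)
```

Since R arrays are stored in column-major order, each column of X holds the full time series of one (lat, lon) grid box.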
prinComp(
grid,
n.eofs = NULL,
v.exp = NULL,
which.combine = NULL,
scaling = "gridbox",
keep.orig = FALSE,
rot = FALSE,
quiet = FALSE,
imputation = "mean"
)
grid: A grid (gridded or station dataset), multigrid, multimember grid or multimember multigrid object.
n.eofs: Integer vector. Number of EOFs to be retained. Defaults to NULL (see details).
v.exp: Maximum fraction of explained variance, in the range (0,1]. Used to determine the number of EOFs to be retained, as an alternative to n.eofs (see details).
which.combine: Optional. A character vector with the short names of the variables of the multigrid used to construct 'combined' PCs (use the …
scaling: Method for performing the scaling (and centering) of the input raw data matrix. Currently only the …
keep.orig: Logical flag indicating whether to return the input data (the standardized input data matrices) used to perform the PCA (…). Defaults to FALSE.
rot: Logical value indicating whether VARIMAX rotation should be performed. Default: FALSE.
quiet: TRUE to silence all messages (but not the warnings). Default: FALSE.
imputation: A character string, either "mean" (the default) or "median". Missing data are replaced with the mean or the median when calculating the PCs, an approach taken from the literature.
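The imputation argument can be illustrated with a small base-R helper. This is only a sketch: impute_columns is a hypothetical name, and it assumes that the mean (or median) is computed over time separately for each grid point, i.e. per column of a time x grid-point matrix. The page does not state over which dimension prinComp computes it.

```r
# Hypothetical helper: fill NAs column by column (one column per grid point)
impute_columns <- function(X, method = c("mean", "median")) {
  method <- match.arg(method)
  fill <- if (method == "mean") mean else stats::median
  for (j in seq_len(ncol(X))) {
    miss <- is.na(X[, j])
    if (any(miss)) X[miss, j] <- fill(X[, j], na.rm = TRUE)
  }
  X
}

X <- matrix(c(1, NA, 3, 100,
              10, 20, NA, 30), ncol = 2)
impute_columns(X, "mean")    # NA in column 1 -> 104/3, NA in column 2 -> 20
impute_columns(X, "median")  # NA in column 1 -> 3,     NA in column 2 -> 20
```

The first column shows why the choice matters: a single outlying value (100) pulls the mean far from the bulk of the data, while the median is unaffected.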
Number of EOFs
n.eofs and v.exp are alternative choices for determining the number of EOFs (and hence of the corresponding PCs) to be retained. If both are NULL (the default), all EOFs are retained. If both are given a value other than NULL, the n.eofs argument prevails and v.exp is ignored, with a warning.
When dealing with multigrids, the n.eofs argument can be either a single value or a vector of the same length as the number of variables contained in the multigrid plus (possibly) the COMBINED field, if any. The same behaviour holds for v.exp.
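The precedence rule between n.eofs and v.exp can be sketched as follows. The function choose_n_eofs is hypothetical, and the v.exp rule used here (keep the smallest number of EOFs whose cumulative explained variance reaches v.exp) is an assumption about how the threshold is applied, not taken from the prinComp source.

```r
# Hypothetical helper. 'explained' is a cumulative explained-variance vector
choose_n_eofs <- function(explained, n.eofs = NULL, v.exp = NULL) {
  if (!is.null(n.eofs) && !is.null(v.exp)) {
    warning("Both 'n.eofs' and 'v.exp' given: 'v.exp' will be ignored")
    v.exp <- NULL
  }
  if (!is.null(n.eofs)) return(min(n.eofs, length(explained)))
  if (!is.null(v.exp)) {
    # Smallest k with explained[k] >= v.exp (capped to guard against rounding)
    return(min(sum(explained < v.exp) + 1L, length(explained)))
  }
  length(explained)  # both NULL: keep all EOFs
}

cumvar <- c(0.50, 0.80, 0.95, 1.00)
choose_n_eofs(cumvar)                # 4
choose_n_eofs(cumvar, v.exp = 0.9)   # 3
choose_n_eofs(cumvar, n.eofs = 2)    # 2

# Multigrid case: one threshold per variable (hypothetical cumulative vectors)
ev <- list(hus = cumvar, psl = c(0.70, 0.92, 1.00))
mapply(function(e, v) choose_n_eofs(e, v.exp = v), ev, c(0.95, 0.90))  # hus 3, psl 2
```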
Scaling and centering
In order to eliminate the effect of the varying scales of the different climate variables, the input data matrix is always scaled and centered, and this step cannot be skipped. However, the mean and standard deviation can be computed either for each grid box individually ("gridbox") or for all grid boxes globally (i.e., at the field level, "field"). The latter is preferred in order to preserve the spatial structure of the original field, and returns one single mean and sigma parameter for each variable (note, however, that the usage above sets scaling = "gridbox" by default). If the "gridbox" approach is selected, a vector of length n, where n is the number of grid cells composing the grid, is returned for both the mean and sigma parameters (this is equivalent to applying the scale function to the input data matrix).
The method used is returned as a global attribute of the returned object ("scaled:method"), and the mu and sigma parameters are returned as attributes of the corresponding variables ("scaled:center" and "scaled:scale", respectively).
As in the case of the n.eofs and v.exp arguments, it is possible to indicate one single approach for all variables within multigrids (using one single value, as by default), or a specific approach for each variable separately (using a vector of the same length as the number of variables contained in the multigrid). However, the latter approach is rarely used and is only implemented for maximum flexibility in the downscaling experimental setup.
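The difference between the two scaling options can be sketched in base R. The matrix below is synthetic (time steps in rows, grid boxes in columns), and the field-level computation (a single mean and standard deviation over all values of the field) is an assumption about what "field" means here, not code from prinComp. Base R's scale() stores its parameters under the same "scaled:center" and "scaled:scale" attribute names mentioned above.

```r
set.seed(42)
# Synthetic field: 50 time steps x 4 grid boxes (e.g. temperatures in K)
X <- matrix(rnorm(50 * 4, mean = 280, sd = 5), nrow = 50)

# "gridbox": one mean and one standard deviation per grid box (column)
X.gridbox <- scale(X, center = TRUE, scale = TRUE)
length(attr(X.gridbox, "scaled:center"))  # 4

# "field" (assumed): a single mean and standard deviation for the whole field
mu <- mean(X)
sigma <- sd(as.vector(X))
X.field <- (X - mu) / sigma
attr(X.field, "scaled:center") <- mu
attr(X.field, "scaled:scale") <- sigma
length(attr(X.field, "scaled:center"))    # 1
```

With field-level scaling, differences in mean level between grid boxes survive the standardization, which is why it preserves the spatial structure of the original field; with grid-box scaling every column ends up with zero mean and unit variance.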
Combined EOF analysis
When dealing with multigrid data, apart from the PCA performed on each variable individually, a combined analysis considering some or all variables together can be done. This is always returned in the last element of the output list under the name "COMBINED". The variables used for combination (if any) are controlled by the argument which.combine.
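One common way to build combined EOFs is to standardize each variable separately, bind the resulting matrices column-wise and run a single PCA on the joint matrix. The sketch below assumes prinComp does something equivalent (the page does not say so), and its data and variable names are synthetic.

```r
set.seed(7)
nt <- 100  # time steps
# Two synthetic variables on the same 6 grid boxes, with very different units
ta  <- matrix(rnorm(nt * 6, mean = 285, sd = 3), nrow = nt)
psl <- matrix(rnorm(nt * 6, mean = 101300, sd = 500), nrow = nt)

# Standardize each variable separately so neither dominates because of its units
combined <- cbind(scale(ta), scale(psl))

# A single PCA on the joint matrix: each EOF now spans both variables
pca.comb <- prcomp(combined, center = FALSE, scale. = FALSE)
dim(pca.comb$rotation)  # 12 x 12: rows 1-6 are ta, rows 7-12 are psl
dim(pca.comb$x)         # 100 x 12
```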
A list of N elements for multigrids, where N is the number of input variables used, or N+1 if combined PCs are calculated, the last element being named "COMBINED".
In the case of single grids (1 variable only), a list of length 1 (without the combined element). For each element of the list, the following objects are returned, either in the form of another list (1 element for each member) for multimember inputs, or directly for non-multimember inputs:
PCs: A matrix of principal components, arranged in columns by decreasing order of importance.
EOFs: A matrix of EOFs, arranged in columns by decreasing order of importance.
orig: Either the original variable in the form of a 2D matrix (when keep.orig = TRUE), or NA when keep.orig = FALSE (the default). In both cases, the parameters used for input data standardization (mean and standard deviation) are returned as attributes of this component (see the examples).
The “order of importance” is given by the explained variance of each PC, as indicated in the attribute "explained_variance" as a cumulative vector.
Additional information is returned via the remaining attributes (see details), including geo-referencing and time.
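The relationship between the PCs, the EOFs and a cumulative explained-variance vector can be illustrated with stats::prcomp on a synthetic, already standardized matrix. This is a sketch of the underlying linear algebra, not of the prinComp output structure itself; the variable names simply mirror the component names listed above.

```r
set.seed(3)
X <- scale(matrix(rnorm(60 * 5), nrow = 60))  # 60 time steps x 5 grid boxes
pc <- prcomp(X, center = FALSE, scale. = FALSE)

PCs  <- pc$x         # time x component, decreasing importance
EOFs <- pc$rotation  # grid box x component
explained_variance <- cumsum(pc$sdev^2) / sum(pc$sdev^2)  # cumulative, ends at 1

# Keeping all components recovers the standardized data exactly
all.equal(as.vector(PCs %*% t(EOFs)), as.vector(X))  # TRUE
# Keeping only the first k components gives an approximation
k <- 2
X.approx <- PCs[, 1:k] %*% t(EOFs[, 1:k])
```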
Performing PCA on multimember multigrids may become time-consuming and computationally expensive. It is therefore advisable to avoid this option for large datasets and to iterate over single multimember grids instead.
J. Bedia, M. de Felice
Gutierrez, J.M., R. Ancell, A. S. Cofiño and C. Sordo (2004). Redes Probabilisticas y Neuronales en las Ciencias Atmosfericas. MIMAM, Spain. 279 pp. http://www.meteo.unican.es/en/books/dataMiningMeteo
Other pca: PC2grid(), grid2PCs(), gridFromPCA()
library(transformeR)
require(climate4R.datasets)
data("NCEP_Iberia_hus850", "NCEP_Iberia_psl", "NCEP_Iberia_ta850")
multigrid <- makeMultiGrid(NCEP_Iberia_hus850, NCEP_Iberia_psl, NCEP_Iberia_ta850)
# In this example, we retain the PCs explaining 95% of the variance of hus@850 and 90% of that of psl and ta@850
pca <- prinComp(multigrid, v.exp = c(.95,0.90,.90), keep.orig = FALSE)
# The output is a named list with the PCs and EOFs (plus additional attributes) for each variable
# within the input grid:
str(pca)
names(pca)
# Note that, apart from computing the principal components and EOFs for each grid,
# it also returns, in the last element of the output list,
# the results of a PC analysis of the combined variables when 'which.combine' is activated
# (the fourth value of v.exp applies to the COMBINED element):
pca <- prinComp(multigrid, v.exp = c(.99,.95,.90,.95),
which.combine = c("ta@850", "psl"), keep.orig = FALSE)
str(pca)
# A special attribute indicates the variables used for combination
attributes(pca$COMBINED)
# The different attributes of the pca object provide information regarding the variables involved
# and the geo-referencing information
str(attributes(pca))
# In addition, for each variable (and their combination), the scaling and centering parameters
# are also returned. There is one value of each parameter per grid point. For instance,
# the parameters for the specific humidity field are:
attributes(pca[["hus@850"]][[1]]$orig)$`scaled:center`
attributes(pca[["hus@850"]][[1]]$orig)$`scaled:scale`
# In addition, the (cumulative) explained variance of each PC is also returned:
vexp <- attributes(pca[["hus@850"]][[1]])$explained_variance
# The classical "scree plot":
barplot(1 - vexp, names.arg = paste0("PC", seq_along(vexp)), las = 2,
        ylab = "Fraction of unexplained variance")
# This is an example using a multimember object:
data("CFS_Iberia_hus850")
# In this case we retain the first 5 EOFs:
pca.mm <- prinComp(CFS_Iberia_hus850, n.eofs = 5)
# Note that now the results of the PCA for the variable are a named list, with the results
# for each member considered separately
str(pca.mm)
# The most complex situation comes from multimember multigrids:
data("CFS_Iberia_pr", "CFS_Iberia_tas")
# Now the multimember multigrid is constructed
mm.multigrid <- makeMultiGrid(CFS_Iberia_tas, CFS_Iberia_pr)
# Use different n.eofs for each variable:
pca.mm.mf <- prinComp(mm.multigrid, n.eofs = c(3,5))