Functions to perform orthogonal projections of high-dimensional data matrices using principal component analysis (pca) and partial least squares (pls).
ortho_projection(Xr, Xu = NULL,
                 Yr = NULL,
                 method = "pca",
                 pc_selection = list(method = "var", value = 0.01),
                 center = TRUE, scale = FALSE, ...)

pc_projection(Xr, Xu = NULL, Yr = NULL,
              pc_selection = list(method = "var", value = 0.01),
              center = TRUE, scale = FALSE,
              method = "pca",
              tol = 1e-6, max_iter = 1000, ...)

pls_projection(Xr, Xu = NULL, Yr,
               pc_selection = list(method = "opc", value = min(dim(Xr), 40)),
               scale = FALSE,
               tol = 1e-6, max_iter = 1000, ...)

## S3 method for class 'ortho_projection'
predict(object, newdata, ...)

Xr 
a matrix of observations. 
Xu 
an optional matrix containing data of a second set of observations. 
Yr 
if the method used in the 
method 
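the method for projecting the data. Options are "pca" (singular value decomposition), "pca.nipals" (nipals algorithm) and "pls" (partial least squares).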

pc_selection 
a list of length 2 which specifies the method to be used for optimizing the number of components (principal components or pls factors) to be retained. This list must contain two elements, in the following order: method and value, as in the default list(method = "var", value = 0.01).
center 
a logical indicating if the data 
scale 
a logical indicating if 
... 
additional arguments to be passed
to 
tol 
tolerance limit for convergence of the nipals algorithm (default is 1e-06). In the case of PLS this applies only to Yr with more than one variable. 
max_iter 
maximum number of iterations (default is 1000). In the case of

object 
object of class ortho_projection. 
newdata 
an optional data frame or matrix in which to look for variables with which to predict. If omitted, the scores are used. It must contain the same number of columns, to be used in the same order. 
In the case of method = "pca", the algorithm used is the singular value decomposition, in which a given data matrix (X) is factorized as follows:
X = UDV^{\mathrm{T}}
where U and V are orthogonal matrices containing the left and right singular vectors of X respectively, and D is a diagonal matrix containing the singular values of X. The matrix of principal component scores is obtained by multiplying U by D, and the matrix of principal component loadings is equivalent to the matrix V.
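The relationship between the SVD factors and the scores and loadings can be checked in a few lines of base R. This is only an illustrative sketch on simulated data (the matrix X and all object names are made up for the example); it is not the internal code of these functions.

```r
# Illustrative sketch (base R only); X is simulated, not package data
set.seed(1)
X <- matrix(rnorm(20 * 5), nrow = 20, ncol = 5)

# Center the columns, as would be done with center = TRUE
Xc <- scale(X, center = TRUE, scale = FALSE)

# Singular value decomposition: Xc = U D V^T
sv <- svd(Xc)
U <- sv$u
D <- diag(sv$d)
V <- sv$v

# Principal component scores (U D) and loadings (V)
pc_scores <- U %*% D
pc_loadings <- V

# The scores are equivalent to projecting the centered data onto the loadings
all.equal(pc_scores, Xc %*% pc_loadings, check.attributes = FALSE)
```

Because V is orthogonal, multiplying the centered data by V gives the same scores as UD, which is why the loadings alone are enough to project further observations.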
When method = "pca.nipals", the algorithm used for principal component analysis is the nonlinear iterative partial least squares (nipals).
In the case of the partial least squares projection (a.k.a. projection to latent structures), the nipals regression algorithm is used. Details on the "nipals" algorithm are presented in Martens (1991).
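To make the roles of tol and max_iter more concrete, the following base R sketch implements a textbook version of nipals for principal components. The helper nipals_pca and its convergence rule are assumptions made for illustration and may differ from what the package does internally.

```r
# Minimal NIPALS sketch for PCA (illustrative; nipals_pca is a hypothetical helper)
nipals_pca <- function(X, ncomp = 2, tol = 1e-6, max_iter = 1000) {
  X <- scale(X, center = TRUE, scale = FALSE)
  scores <- matrix(0, nrow(X), ncomp)
  loadings <- matrix(0, ncol(X), ncomp)
  for (a in seq_len(ncomp)) {
    # Start from the column with the largest sum of squares
    t_a <- X[, which.max(colSums(X^2)), drop = FALSE]
    for (i in seq_len(max_iter)) {
      p_a <- crossprod(X, t_a) / sum(t_a^2) # regress the columns of X on the scores
      p_a <- p_a / sqrt(sum(p_a^2))         # normalize the loadings to unit length
      t_new <- X %*% p_a                    # update the scores
      # Assumed stopping rule: relative squared change of the scores below tol
      converged <- sum((t_new - t_a)^2) < tol * sum(t_new^2)
      t_a <- t_new
      if (converged) break
    }
    scores[, a] <- t_a
    loadings[, a] <- p_a
    X <- X - tcrossprod(t_a, p_a)           # deflate X before the next component
  }
  list(scores = scores, loadings = loadings)
}

# Simulated data with clearly separated component variances
set.seed(1)
X <- matrix(rnorm(30 * 6), 30, 6) %*% diag(c(10, 5, 2, 1, 0.5, 0.2))
fit <- nipals_pca(X, ncomp = 2, tol = 1e-12)

# Up to sign, the loadings should agree with the right singular vectors
sv <- svd(scale(X, center = TRUE, scale = FALSE))
max(abs(abs(fit$loadings) - abs(sv$v[, 1:2])))
```

Each component is extracted one at a time, so the iterative approach can be convenient when only a few components of a wide matrix are needed.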
When the method specified in pc_selection is "opc", the selection of the components is carried out by using an iterative method based on the side information concept (Ramirez-Lopez et al. 2013a, 2013b). First, let P be a sequence of retained components (so that P = 1, 2, ..., k). At each iteration, the function computes a dissimilarity matrix retaining p_i components. The side information value of each observation is then compared against the side information value of its most spectrally similar observation (the closest Xr observation).
The optimal number of components retrieved by the function is the one that minimizes the root mean squared differences (RMSD) in the case of continuous variables, or maximizes the kappa index in the case of categorical variables. In this process, the sim_eval function is used.
Note that for the "opc" method Yr is required (i.e. the side information of the observations).
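The search described above can be sketched in base R for a continuous side information variable. The simulated data, the use of prcomp, and the choice of Euclidean distances on standardized scores are all assumptions made for this illustration; the function itself relies on sim_eval and may compute the dissimilarities differently.

```r
# Illustrative sketch of the "opc" idea (base R; not the package implementation)
set.seed(1)
n <- 60
latent <- rnorm(n)
# Simulated predictors driven by one latent variable plus noise
Xr <- outer(latent, c(3, 2, 1, 0.5, 0.2, 0.1)) +
  matrix(rnorm(n * 6, sd = 0.3), n, 6)
Yr <- 2 * latent + rnorm(n, sd = 0.1) # side information

pca <- prcomp(Xr, center = TRUE, scale. = FALSE)
max_comp <- 5
rmsd <- numeric(max_comp)

for (k in seq_len(max_comp)) {
  # Dissimilarity matrix computed on the first k standardized scores
  sc <- scale(pca$x[, seq_len(k), drop = FALSE])
  d <- as.matrix(dist(sc))
  diag(d) <- Inf                    # an observation cannot be its own neighbor
  nearest <- apply(d, 1, which.min) # closest observation for each row
  # Compare each side information value with that of its nearest neighbor
  rmsd[k] <- sqrt(mean((Yr - Yr[nearest])^2))
}

# The retained number of components is the one with the smallest RMSD
optimal_k <- which.min(rmsd)
data.frame(n_components = seq_len(max_comp), rmsd = rmsd)
```

For a categorical Yr the same loop would apply, with the RMSD replaced by a kappa index that is maximized instead of minimized.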
a list of class ortho_projection with the following components:
scores
a matrix of scores corresponding to the observations in Xr (and Xu if it was provided). The components retrieved correspond to the ones optimized or specified.
X_loadings
a matrix of loadings corresponding to the explanatory variables. The components retrieved correspond to the ones optimized or specified.
Y_loadings
a matrix of partial least squares loadings corresponding to Yr. The components retrieved correspond to the ones optimized or specified. This object is only returned if the partial least squares algorithm was used.
weights
a matrix of partial least squares ("pls") weights. This object is only returned if the "pls" algorithm was used.
projection_mat
a matrix that can be used to project new data onto a "pls" space. This object is only returned if the "pls" algorithm was used.
variance
a matrix indicating the standard deviation of each component (sd), the variance explained by each single component (explained_var) and the cumulative explained variance (cumulative_explained_var). These values are computed based on the data used to create the projection matrices. For example, if the "pls" method was used, then these values are computed based only on the data that contains information on Yr (i.e. the Xr data). If the principal component method is used, then these values are computed on the basis of Xr and Xu (if it applies), since both matrices are employed in the computation of the projection matrix (loadings in this case).
sdv
the standard deviation of the retrieved scores. This vector can be different from the "sd" in variance.
n_components
the number of components (either principal components or partial least squares components) used for computing the global dissimilarity scores.
opc_evaluation
a matrix containing the statistics computed for optimizing the number of principal components based on the variable(s) specified in the Yr argument. If Yr was a continuous vector or matrix, then this object indicates the root mean squared differences (RMSD) for each number of components. If Yr was a categorical variable, this object indicates the kappa values for each number of components. This object is returned only if "opc" was used within the pc_selection argument. See the sim_eval function for more details.
method
the ortho_projection method used.
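As a rough illustration of what the three rows of the variance component describe, the quantities can be derived from a singular value decomposition in base R. The data are simulated and the column labels are hypothetical; the exact layout of the object returned by the function may differ.

```r
# Illustrative sketch (base R); not the package's internal code
set.seed(1)
X <- matrix(rnorm(40 * 6), 40, 6) %*% diag(c(5, 3, 2, 1, 0.5, 0.1))
Xc <- scale(X, center = TRUE, scale = FALSE)
sv <- svd(Xc)

sd_comp <- sv$d / sqrt(nrow(Xc) - 1)        # standard deviation of each component
explained_var <- sd_comp^2 / sum(sd_comp^2) # proportion of variance per component
cumulative_explained_var <- cumsum(explained_var)

variance <- rbind(
  sd = sd_comp,
  explained_var = explained_var,
  cumulative_explained_var = cumulative_explained_var
)
colnames(variance) <- paste0("pc_", seq_along(sd_comp)) # hypothetical labels
round(variance, 3)

# A "cumvar"-style rule: smallest number of components reaching 99% of the variance
which(cumulative_explained_var >= 0.99)[1]
```

A "var"-style rule would instead keep every component whose explained_var is at or above the chosen threshold (0.01 in the default pc_selection).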
predict.ortho_projection returns a matrix of scores projected for newdata.
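Conceptually, projecting new observations means applying the centering learned from the original data and multiplying by the loadings, which is why newdata needs the same columns in the same order. The base R sketch below shows this for the principal component case with simulated matrices; the object names are hypothetical and the actual predict method may carry out additional steps (e.g. scaling).

```r
# Illustrative sketch (base R); the objects are simulated and hypothetical
set.seed(1)
train <- matrix(rnorm(30 * 4), 30, 4)
new_obs <- matrix(rnorm(5 * 4), 5, 4)

# Fit: the centering vector and the loadings come from the training data only
center_vec <- colMeans(train)
loadings <- svd(sweep(train, 2, center_vec))$v[, 1:2]

# Predict: apply the same centering, then project onto the loadings
new_scores <- sweep(new_obs, 2, center_vec) %*% loadings
dim(new_scores) # 5 observations by 2 components
```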
Martens, H. (1991). Multivariate calibration. John Wiley & Sons.
Ramirez-Lopez, L., Behrens, T., Schmidt, K., Stevens, A., Dematte, J.A.M., Scholten, T. 2013a. The spectrum-based learner: A new local approach for modeling soil vis-NIR spectra of complex data sets. Geoderma 195-196, 268-279.
Ramirez-Lopez, L., Behrens, T., Schmidt, K., Viscarra Rossel, R., Dematte, J.A.M., Scholten, T. 2013b. Distance and similarity-search metrics for use with soil vis-NIR spectra. Geoderma 199, 43-53.
library(prospectr)
data(NIRsoil)

# Preprocess the data using detrend plus first derivative with Savitzky and
# Golay smoothing filter
sg_det <- savitzkyGolay(
  detrend(NIRsoil$spc,
    wav = as.numeric(colnames(NIRsoil$spc))
  ),
  m = 1,
  p = 1,
  w = 7
)
NIRsoil$spc_pr <- sg_det

# Split into training and testing sets
test_x <- NIRsoil$spc_pr[NIRsoil$train == 0 & !is.na(NIRsoil$CEC), ]
test_y <- NIRsoil$CEC[NIRsoil$train == 0 & !is.na(NIRsoil$CEC)]
train_y <- NIRsoil$CEC[NIRsoil$train == 1 & !is.na(NIRsoil$CEC)]
train_x <- NIRsoil$spc_pr[NIRsoil$train == 1 & !is.na(NIRsoil$CEC), ]

# A principal component analysis using 5 components
pca_projected <- ortho_projection(train_x, pc_selection = list("manual", 5))
pca_projected

# A principal components projection using the "opc" method
# for the selection of the optimal number of components
pca_projected_2 <- ortho_projection(
  Xr = train_x, Xu = test_x, Yr = train_y,
  method = "pca",
  pc_selection = list("opc", 40)
)
pca_projected_2
plot(pca_projected_2)

# A partial least squares projection using the "opc" method
# for the selection of the optimal number of components
pls_projected <- ortho_projection(
  Xr = train_x, Xu = test_x, Yr = train_y,
  method = "pls",
  pc_selection = list("opc", 40)
)
pls_projected
plot(pls_projected)

# A partial least squares projection using the "cumvar" method
# for the selection of the optimal number of components
pls_projected_2 <- ortho_projection(
  Xr = train_x, Xu = test_x, Yr = train_y,
  method = "pls",
  pc_selection = list("cumvar", 0.99)
)
