ortho_diss | R Documentation
This function computes dissimilarities (in an orthogonal space) either between observations in a single set or between observations in two different sets. The dissimilarities are computed on either a principal component projection or a partial least squares projection of the data. After the data are projected, the Mahalanobis distance is computed between the projected observations (scores).
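A minimal base-R sketch of the global (non-local) computation follows. It assumes that the PCA is fitted on Xr alone and that the Mahalanobis distance uses the covariance of the Xr scores. ortho_diss itself may center, scale and estimate the covariance differently (see center and scale). For two score vectors $s_a$ and $s_b$ with score covariance $\Sigma$, the distance is $d(s_a, s_b) = \sqrt{(s_a - s_b)^\top \Sigma^{-1} (s_a - s_b)}$. The data are random toy matrices, and the number of components is fixed by hand instead of being chosen through pc_selection.

```r
set.seed(42)
Xr <- matrix(rnorm(40 * 8), nrow = 40)  # toy reference set: 40 observations, 8 variables
Xu <- matrix(rnorm(10 * 8), nrow = 10)  # toy target set: 10 observations, 8 variables
n_comp <- 3                             # fixed by hand (assumption, no pc_selection)

# Fit the PCA on Xr only (assumption) and project both sets onto it
pca <- prcomp(Xr, center = TRUE, scale. = FALSE)
Sr <- pca$x[, 1:n_comp, drop = FALSE]
Su <- predict(pca, Xu)[, 1:n_comp, drop = FALSE]

# Mahalanobis distance using the covariance of the reference scores (assumption);
# mahalanobis() returns squared distances, hence sqrt()
S_inv <- solve(cov(Sr))
diss <- sapply(seq_len(nrow(Su)), function(j) {
  sqrt(mahalanobis(Sr, center = Su[j, ], cov = S_inv, inverted = TRUE))
})
# diss has one row per Xr observation and one column per Xu observation
dim(diss)
```

Each column of diss holds the distances from one target observation in Xu to every reference observation in Xr.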
ortho_diss(Xr, Xu = NULL,
           Yr = NULL,
           pc_selection = list(method = "var", value = 0.01),
           diss_method = "pca",
           .local = FALSE,
           pre_k,
           center = TRUE,
           scale = FALSE,
           compute_all = FALSE,
           return_projection = FALSE,
           allow_parallel = TRUE, ...)
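Xr: a matrix containing the reference observations (rows) and variables (columns).
Xu: an optional matrix containing data of a second set of observations, with the same variables (columns) as Xr.
Yr: a matrix of side information (e.g. response variables) with one row per observation in Xr. It is required when diss_method = "pls" or when pc_selection$method = "opc".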
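pc_selection: a list of length 2 which specifies the method to be used for optimizing the number of components (principal components or PLS factors) to be retained. This list must contain two elements, in the following order: method (a character string indicating the selection method, e.g. "var", "opc" or "manual") and value (a numeric value that complements the selected method). Default is list(method = "var", value = 0.01). Optionally, the pc_selection argument also accepts the method as a single character string, in which case a default value is used.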
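diss_method: a character value indicating the type of projection on which the dissimilarities must be computed, e.g. "pca" (principal component analysis, the default) or "pls" (partial least squares). This argument is equivalent to the method argument of the ortho_projection function. See the ortho_projection function for further details on the projection methods.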
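.local: a logical indicating whether or not to compute the dissimilarities locally (i.e. projecting the data locally) by using the pre_k nearest neighbors of each target observation. Default is FALSE. See details.
pre_k: if .local = TRUE, an integer indicating the number of nearest neighbors to (pre-)retain for each target observation before computing the local orthogonal dissimilarities.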
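center: a logical indicating if the Xr and Xu data must be centered before the projection. Default is TRUE.
scale: a logical indicating if the Xr and Xu data must be scaled before the projection. Default is FALSE.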
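compute_all: a logical. In case Xu is specified, it indicates whether or not the dissimilarities between all the observations resulting from the union of Xr and Xu must be computed. Default is FALSE.
return_projection: a logical. If TRUE, the ortho_projection object on which the dissimilarities are computed is returned. Default is FALSE.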
allow_parallel: a logical (default TRUE). It allows parallel computing of the local distance matrices (i.e. when .local = TRUE).
...: additional arguments to be passed to the ortho_projection function.
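The meaning of the "var" option of pc_selection is not spelled out on this page. The sketch below assumes that it retains every principal component that individually explains at least value (a proportion, e.g. 0.01) of the total variance. select_by_var is a hypothetical helper, not a resemble function.

```r
# Hypothetical helper (not part of resemble): count the principal components
# that each explain at least `value` of the total variance
select_by_var <- function(X, value = 0.01) {
  pca <- prcomp(X, center = TRUE, scale. = FALSE)
  prop <- pca$sdev^2 / sum(pca$sdev^2)  # proportion of variance per component (decreasing)
  sum(prop >= value)                    # number of leading components at or above the threshold
}

# Two centered, uncorrelated columns with very different variances
X <- cbind(c(-10, 10, 0, 0), c(0, 0, -1, 1))
select_by_var(X, value = 0.01)   # returns 1 (the second component explains about 0.99%)
select_by_var(X, value = 0.005)  # returns 2
```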
When .local = TRUE, a global dissimilarity matrix is first computed using the specified parameters. This matrix is then used to identify, for each target observation, its pre_k nearest neighbors. These neighbors (together with the target observation) are projected from the original data space onto a local orthogonal space, using the same parameters specified in the function call. In this projected space, the Mahalanobis distance between the target observation and each of its neighbors is recomputed. Observations that do not belong to this set of neighbors (non-neighbor observations) are assigned a missing value.
In this case the dissimilarity matrix cannot be considered a distance matrix, since it does not necessarily satisfy the symmetry condition for distances: given two observations $x_i$ and $x_j$, the local dissimilarity $d$ between them is relative, since in general $d(x_i, x_j) \neq d(x_j, x_i)$. On the other hand, when .local = FALSE, the dissimilarity matrix obtained can be considered a distance matrix.
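The local procedure can be sketched in base R as follows. This is an illustration under simplifying assumptions, not the package's implementation: a single toy data set plays the role of both reference and target observations, the number of components is fixed (n_comp) instead of being selected through pc_selection, and pairwise_mahalanobis is a hypothetical helper.

```r
set.seed(7)
X <- matrix(rnorm(50 * 6), nrow = 50)  # toy data: 50 observations, 6 variables
pre_k <- 15                             # neighborhood size
n_comp <- 2                             # fixed number of components (assumption)

# Hypothetical helper: pairwise Mahalanobis distances between the rows of S.
# With cov(S) = t(R) %*% R, the whitened scores S %*% solve(R) have identity
# covariance, so their Euclidean distances equal Mahalanobis distances on S.
pairwise_mahalanobis <- function(S) {
  R <- chol(cov(S))
  as.matrix(dist(S %*% solve(R)))
}

# Step 1: global dissimilarities from a global PCA projection
global_d <- pairwise_mahalanobis(prcomp(X)$x[, 1:n_comp])

# Steps 2-4: re-project each neighborhood locally and recompute the distances.
# Column i holds the distances from target i to its neighbors; all other
# entries (non-neighbors and the target itself) stay NA.
local_d <- matrix(NA_real_, nrow(X), nrow(X))
for (i in seq_len(nrow(X))) {
  nn <- setdiff(order(global_d[, i]), i)[seq_len(pre_k)]  # pre_k nearest neighbors of i
  idx <- c(i, nn)                                          # target first, then its neighbors
  local_scores <- prcomp(X[idx, ])$x[, 1:n_comp]           # local projection
  local_d[nn, i] <- pairwise_mahalanobis(local_scores)[-1, 1]
}
```

For pairs of observations that are neighbors of each other, local_d[i, j] and local_d[j, i] come from two different local projections and generally differ, which is the asymmetry described above.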
When Yr is required to compute the dissimilarities and .local = TRUE, care must be taken, as some neighborhoods might not have enough observations with non-missing Yr values, which may lead to unreliable dissimilarity estimates. If "opc" or "manual" is used in pc_selection$method and .local = TRUE, the minimum number of observations with non-missing Yr values in each neighborhood is determined by pc_selection$value (i.e. the maximum number of components to compute).
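Before running a local computation that depends on Yr, it can help to check how many neighbors in each neighborhood have a non-missing Yr value. count_non_missing is a hypothetical helper. neighbor_idx is assumed to be a list with one vector of Xr row indices per target observation, and min_obs stands in for pc_selection$value.

```r
# Hypothetical helper (not part of resemble): count the neighbors with a
# non-missing response in each neighborhood and flag neighborhoods below min_obs
count_non_missing <- function(neighbor_idx, yr, min_obs) {
  n_ok <- vapply(neighbor_idx, function(idx) sum(!is.na(yr[idx])), integer(1))
  data.frame(target = seq_along(neighbor_idx),
             n_non_missing = n_ok,
             enough = n_ok >= min_obs)
}

yr <- c(2.1, NA, 3.4, NA, 5.0, 1.2)                     # toy responses with gaps
neighbor_idx <- list(c(1, 2, 3), c(2, 4), c(3, 5, 6))  # toy neighborhoods
res <- count_non_missing(neighbor_idx, yr, min_obs = 2)
res
```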
A list of class ortho_diss with the following elements:
n_components: the number of components (either principal components or partial least squares components) used for computing the global dissimilarities.
global_variance_info: information about the explained variance(s) of the projection. When .local = TRUE, this information corresponds to the global projection done prior to computing the local projections.
local_n_components: if .local = TRUE, a data.table specifying the number of local components (either principal components or partial least squares components) used for computing the dissimilarity between each target observation and its neighbor observations.
dissimilarity: the computed dissimilarity matrix. If .local = FALSE, a distance matrix; if .local = TRUE, a matrix of class local_ortho_diss, in which each column represents the dissimilarity between a target observation and its neighbor observations.
projection: if return_projection = TRUE, an ortho_projection object.
See also: ortho_projection, sim_eval
library(resemble)
library(prospectr)
data(NIRsoil)

# Split the NIRsoil spectra into reference (training) and target (test) sets,
# using CEC as side information
Xu <- NIRsoil$spc[!as.logical(NIRsoil$train), ]
Yu <- NIRsoil[!as.logical(NIRsoil$train), "CEC", drop = FALSE]
Yr <- NIRsoil[as.logical(NIRsoil$train), "CEC", drop = FALSE]
Xr <- NIRsoil$spc[as.logical(NIRsoil$train), ]

# Remove observations with missing CEC values
Xu <- Xu[!is.na(Yu), ]
Yu <- Yu[!is.na(Yu), , drop = FALSE]
Xr <- Xr[!is.na(Yr), ]
Yr <- Yr[!is.na(Yr), , drop = FALSE]

# Computation of the orthogonal dissimilarity matrix using the
# default parameters
pca_diss <- ortho_diss(Xr, Xu)

# Computation of a principal component dissimilarity matrix using
# the "opc" method for the selection of the principal components
pca_diss_optim <- ortho_diss(
  Xr, Xu, Yr,
  pc_selection = list("opc", 40),
  compute_all = TRUE
)

# Computation of a partial least squares (PLS) dissimilarity
# matrix using the "opc" method for the selection of the PLS
# components
pls_diss_optim <- ortho_diss(
  Xr = Xr, Xu = Xu,
  Yr = Yr,
  pc_selection = list("opc", 40),
  diss_method = "pls"
)
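The last example uses diss_method = "pls". To illustrate what a PLS-based dissimilarity involves, the sketch below uses a textbook NIPALS PLS1 algorithm (single response) on random toy data and then computes Mahalanobis distances on the scores. It is a stand-in under stated assumptions, not the ortho_projection implementation; pls1_scores is a hypothetical helper.

```r
# Hypothetical helper: textbook NIPALS PLS1 scores for a single response
pls1_scores <- function(X, y, ncomp) {
  X <- scale(X, center = TRUE, scale = FALSE)  # PLS works on centered data
  y <- y - mean(y)
  scores <- matrix(0, nrow(X), ncomp)
  for (a in seq_len(ncomp)) {
    w <- crossprod(X, y)                  # weights: covariance of each X column with y
    w <- w / sqrt(sum(w^2))               # normalize the weight vector
    t_a <- X %*% w                        # scores of component a
    p_a <- crossprod(X, t_a) / sum(t_a^2) # X loadings
    q_a <- sum(y * t_a) / sum(t_a^2)      # y loading
    X <- X - t_a %*% t(p_a)               # deflate X
    y <- y - q_a * t_a                    # deflate y
    scores[, a] <- t_a
  }
  scores
}

set.seed(3)
X <- matrix(rnorm(30 * 5), nrow = 30)            # toy predictors
y <- X[, 1] - 2 * X[, 2] + rnorm(30, sd = 0.1)   # toy response
S <- pls1_scores(X, y, ncomp = 2)

# Scale each score column by its standard deviation, then take Euclidean distances
Sw <- sweep(S, 2, apply(S, 2, sd), "/")
pls_d <- as.matrix(dist(Sw))
```

Because NIPALS scores are centered and mutually orthogonal, their covariance matrix is diagonal, so dividing each column by its standard deviation is enough to whiten them and the Euclidean distances on Sw equal the Mahalanobis distances on S.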