search_neighbors | R Documentation |
This function searches a reference set for the neighbors of the observations provided in another set.
search_neighbors(Xr, Xu, diss_method = c("pca", "pca.nipals", "pls", "mpls",
"cor", "euclid", "cosine", "sid"),
Yr = NULL, k, k_diss, k_range, spike = NULL,
pc_selection = list("var", 0.01),
return_projection = FALSE, return_dissimilarity = FALSE,
ws = NULL,
center = TRUE, scale = FALSE,
documentation = character(), ...)
Xr |
a matrix of reference (spectral) observations where the neighbor search is to be conducted. See details. |
Xu |
an optional matrix of (spectral) observations whose neighbors are to be searched for in Xr. See details. |
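diss_method |
a character string indicating the spectral dissimilarity metric to be used in the selection of the nearest neighbors of each observation. Options are "pca" and "pca.nipals" (Mahalanobis distance computed on principal component scores obtained with the SVD or the NIPALS algorithm, respectively), "pls" and "mpls" (Mahalanobis distance computed on partial least squares or modified partial least squares scores), "cor" (correlation dissimilarity), "euclid" (Euclidean distance), "cosine" (cosine dissimilarity) and "sid" (spectral information divergence). See ortho_diss, cor_diss, f_diss and sid. |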
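Yr |
a numeric matrix of response values for the observations in Xr, used as side information by the projection-based methods (see ortho_diss). It is required when diss_method = "pls" or "mpls", and when "opc" is used as the method in pc_selection. |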
k |
an integer value indicating the number of nearest neighbors to be selected from Xr for each observation in Xu (or in Xr, when Xu is not supplied). |
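k_diss |
a numeric value indicating a dissimilarity threshold. For each observation in Xu, its neighbors in Xr are those whose dissimilarity to it is below this threshold, so the appropriate value depends on the metric chosen in diss_method. Either k or k_diss must be specified. |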
k_range |
an integer vector of length 2 which specifies the minimum (first value) and the maximum (second value) number of neighbors to be retained when k_diss is used. |
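spike |
a vector of integers (with positive and/or negative values) indicating which observations in Xr must be forced into (positive values) or excluded from (negative values) the neighborhoods. Default is NULL (no observations are forced or excluded). |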
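pc_selection |
a list of length 2 to be passed onto the ortho_diss methods; it is only used by the projection-based values of diss_method (e.g. "pca", "pca.nipals" or "pls"). Its two elements are, in this order, the method used to select the number of components and a numeric value that complements it:
"opc": optimized component selection based on the side information in Yr (Ramirez-Lopez et al., 2013a, 2013b); the value is the maximum number of components to be tested.
"cumvar": components are retained until their cumulative explained variance reaches the value (between 0 and 1).
"var": only components that individually explain at least the value (between 0 and 1) of the variance are retained.
"manual": the value is the fixed number of components to retain.
The default is list("var", 0.01). Optionally, the pc_selection argument also accepts the method name alone as a single character string, in which case a default value is used for that method. |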
return_projection |
a logical indicating if the projection(s) must be returned. Projections are used if the ortho_diss methods are called (i.e. diss_method = "pca", "pca.nipals" or "pls"). |
return_dissimilarity |
a logical indicating if the dissimilarity matrix used for neighbor search must be returned. |
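ws |
an odd integer value which specifies the window size when diss_method = "cor", so that a moving-window correlation dissimilarity is computed. Default is NULL (the correlation is computed using all the variables). |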
center |
a logical indicating if Xr and Xu must be centered. Default is TRUE. |
scale |
a logical indicating if Xr and Xu must be scaled. Default is FALSE. |
documentation |
an optional character string that can be used to describe anything related to the call (e.g. a description of the input data). |
... |
further arguments to be passed to the dissimilarity function. See details. |
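The interplay between k_diss and k_range can be illustrated with a small base R sketch. It assumes that a neighbor is any reference observation whose dissimilarity is strictly below k_diss, and that the number found is then clamped to k_range; select_by_threshold is a hypothetical helper, not part of resemble, and the package's internal handling (e.g. of ties) may differ.

```r
# Hypothetical helper (not resemble's code): threshold-based neighbor selection.
# d:       dissimilarities from one target observation to all reference observations
# k_diss:  dissimilarity threshold (assumed strict: d < k_diss)
# k_range: c(minimum, maximum) number of neighbors to retain
select_by_threshold <- function(d, k_diss, k_range) {
  ord <- order(d)                                      # reference indices, closest first
  n_k <- sum(d < k_diss)                               # neighbors found below the threshold
  final_n_k <- min(max(n_k, k_range[1]), k_range[2])   # clamp the count to k_range
  final_n_k <- min(final_n_k, length(d))               # cannot exceed the reference size
  list(index = ord[seq_len(final_n_k)], n_k = n_k, final_n_k = final_n_k)
}

# Toy dissimilarities from one target to six reference observations
d <- c(0.30, 0.05, 0.80, 0.12, 0.50, 0.09)
res <- select_by_threshold(d, k_diss = 0.2, k_range = c(4, 5))
res$n_k        # 3 (only three values are below 0.2)
res$final_n_k  # 4 (raised to the minimum of k_range)
res$index      # 2 6 4 1
```

The n_k and final_n_k values in this sketch play the same role as the columns of the same names in k_diss_info.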
This function may be especially useful when the reference set (Xr) is very large. In some cases, the number of observations in the reference set can be reduced by removing irrelevant observations (i.e. observations that are not neighbors of a particular target set). For example, this function can be used to reduce the size of the reference set before running the mbl function.
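The reference-set reduction described above can be sketched in base R. This is a minimal illustration assuming plain Euclidean distance and a fixed k; it does not reproduce search_neighbors or mbl, and all object names are illustrative.

```r
# Minimal sketch (base R, not resemble's implementation): keep only the
# reference observations that are among the k nearest neighbors of at least
# one target observation. Plain Euclidean distance is assumed.
set.seed(1)
Xr <- matrix(rnorm(200 * 10), nrow = 200)  # toy reference set
Xu <- matrix(rnorm(5 * 10), nrow = 5)      # toy target set
k <- 10

# Squared Euclidean distances (rows: Xr observations, columns: Xu observations);
# squaring does not change the neighbor ranking
d <- outer(rowSums(Xr^2), rowSums(Xu^2), "+") - 2 * Xr %*% t(Xu)

# Indices of the k nearest Xr observations for each Xu observation (by column)
nn <- apply(d, 2, function(x) order(x)[seq_len(k)])

# Reduced reference set: union of all neighborhoods
keep <- sort(unique(as.vector(nn)))
Xr_small <- Xr[keep, , drop = FALSE]
dim(Xr_small)  # at most 5 * 10 = 50 rows
```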
This function uses the dissimilarity function to compute the dissimilarities between Xr and Xu. Arguments to dissimilarity, as well as further arguments to the functions used inside dissimilarity (i.e. ortho_diss, cor_diss, f_diss and sid), can be passed to those functions as additional arguments (i.e. ...).
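How extra arguments reach the inner functions through ... can be illustrated with a hypothetical dispatcher. toy_dissimilarity, cor_dissim and euclid_dissim are illustrative names, not resemble functions. The correlation dissimilarity is assumed to take the form (1 - r)/2, averaged over moving windows when ws is given, which may differ from the exact scaling used by cor_diss.

```r
# Correlation-based dissimilarity between rows of Xr and rows of Xu
# (assumed form: (1 - r) / 2, averaged over moving windows if ws is given)
cor_dissim <- function(Xr, Xu, ws = NULL) {
  if (is.null(ws)) {
    return((1 - cor(t(Xr), t(Xu))) / 2)
  }
  stopifnot(ws %% 2 == 1, ws <= ncol(Xr))  # odd window, no wider than the data
  starts <- seq_len(ncol(Xr) - ws + 1)
  per_window <- lapply(starts, function(i) {
    idx <- i:(i + ws - 1)
    (1 - cor(t(Xr[, idx, drop = FALSE]), t(Xu[, idx, drop = FALSE]))) / 2
  })
  Reduce(`+`, per_window) / length(starts)
}

# Euclidean distance between rows of Xr and rows of Xu
euclid_dissim <- function(Xr, Xu) {
  sq <- outer(rowSums(Xr^2), rowSums(Xu^2), "+") - 2 * Xr %*% t(Xu)
  sqrt(pmax(sq, 0))  # guard against tiny negative values from rounding
}

# Dispatcher: anything passed in `...` (e.g. ws) is forwarded to the helper
toy_dissimilarity <- function(Xr, Xu, method = c("cor", "euclid"), ...) {
  method <- match.arg(method)
  switch(method,
         cor = cor_dissim(Xr, Xu, ...),
         euclid = euclid_dissim(Xr, Xu, ...))
}

set.seed(2)
Xr <- matrix(rnorm(4 * 8), nrow = 4)
Xu <- matrix(rnorm(3 * 8), nrow = 3)
D_full <- toy_dissimilarity(Xr, Xu, "cor")          # 4 x 3 (rows: Xr, columns: Xu)
D_win  <- toy_dissimilarity(Xr, Xu, "cor", ws = 5)  # ws reaches cor_dissim via ...
```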
If no matrix is passed to Xu, the neighbors of each observation in Xr are searched within Xr itself. If a matrix is passed to Xu, the neighbors of the observations in Xu are searched in Xr.
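The two search modes can be mimicked with a hypothetical helper, toy_search, which is not part of resemble and uses plain Euclidean distance. When Xu is missing, the sketch assumes that an observation should not count as its own neighbor and excludes it; this documentation does not state how search_neighbors treats that case.

```r
# Hypothetical helper (not resemble's API): k-nearest-neighbor search that
# mirrors the two modes described above, using Euclidean distance.
toy_search <- function(Xr, Xu = NULL, k) {
  if (is.null(Xu)) {
    # Search within Xr: distances among its own observations
    d <- as.matrix(dist(Xr))
    diag(d) <- Inf  # assumption: an observation is not counted as its own neighbor
  } else {
    # Neighbors of each Xu observation in Xr (rows: Xr, columns: Xu)
    d <- sqrt(pmax(outer(rowSums(Xr^2), rowSums(Xu^2), "+") - 2 * Xr %*% t(Xu), 0))
  }
  # One column per target observation, holding the k closest Xr indices
  apply(d, 2, function(x) order(x)[seq_len(k)])
}

set.seed(3)
Xr <- matrix(rnorm(30 * 6), nrow = 30)
nn_within <- toy_search(Xr, k = 4)             # 4 x 30
nn_cross  <- toy_search(Xr, Xr[1:2, ], k = 4)  # 4 x 2
```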
A list containing the following elements:
neighbors_diss: a matrix of the Xr dissimilarity scores corresponding to the neighbors of each Xr observation (or Xu observation, in case Xu was supplied). The neighbor dissimilarity scores are organized by columns and are sorted in ascending order.
neighbors: a matrix of the Xr indices corresponding to the neighbors of each observation in Xu (or in Xr, in case Xu was not supplied). The neighbor indices are organized by columns and are sorted in ascending order by their dissimilarity score.
unique_neighbors: a vector of the indices in Xr identified as neighbors of any observation in Xr (or in Xu, in case it was supplied). This is obtained by converting the neighbors matrix into a vector and applying the unique function.
k_diss_info: a data.table that is returned only if the k_diss argument was used. It comprises three columns: the first one (Xr_index or Xu_index) indicates the index of the observations in Xr (or in Xu, in case it was supplied), the second column (n_k) indicates the number of neighbors found in Xr, and the third column (final_n_k) indicates the final number of neighbors selected, bounded by the k_range argument.
dissimilarity: if return_dissimilarity = TRUE, the dissimilarity object used (as computed by the dissimilarity function).
projection: an ortho_projection object, only output if return_projection = TRUE and if diss_method = "pca", diss_method = "pca.nipals" or diss_method = "pls". This object contains the projection used to compute the dissimilarity matrix. In case of local dissimilarity matrices, the projection corresponds to the global projection used to select the neighborhoods (see the ortho_diss function for further details).
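The relationship between neighbors, neighbors_diss and unique_neighbors can be reproduced on a toy dissimilarity matrix with one row per Xr observation and one column per Xu observation. This is an illustration of the layout described above, not resemble's code; the random matrix stands in for a real dissimilarity object.

```r
# Toy illustration (not resemble's code) of the returned neighbor elements
set.seed(4)
d <- matrix(runif(8 * 3), nrow = 8)  # 8 reference x 3 target observations
k <- 3

# k closest Xr indices per target, organized by columns, nearest first
neighbors <- apply(d, 2, function(x) order(x)[seq_len(k)])
# The matching dissimilarity scores, sorted in ascending order per column
neighbors_diss <- apply(d, 2, function(x) sort(x)[seq_len(k)])
# Xr indices that belong to at least one neighborhood
unique_neighbors <- unique(as.vector(neighbors))
```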
Ramirez-Lopez, L., Behrens, T., Schmidt, K., Stevens, A., Dematte, J.A.M., Scholten, T. 2013a. The spectrum-based learner: A new local approach for modeling soil vis-NIR spectra of complex data sets. Geoderma 195-196, 268-279.
Ramirez-Lopez, L., Behrens, T., Schmidt, K., Viscarra Rossel, R., Dematte, J. A. M., Scholten, T. 2013b. Distance and similarity-search metrics for use with soil vis-NIR spectra. Geoderma 199, 43-53.
dissimilarity
ortho_diss
cor_diss
f_diss
sid
mbl
library(resemble)
library(prospectr)
data(NIRsoil)
Xu <- NIRsoil$spc[!as.logical(NIRsoil$train), ]
Yu <- NIRsoil$CEC[!as.logical(NIRsoil$train)]
Yr <- NIRsoil$CEC[as.logical(NIRsoil$train)]
Xr <- NIRsoil$spc[as.logical(NIRsoil$train), ]
Xu <- Xu[!is.na(Yu), ]
Yu <- Yu[!is.na(Yu)]
Xr <- Xr[!is.na(Yr), ]
Yr <- Yr[!is.na(Yr)]
# Identify the neighbor observations using the correlation dissimilarity and
# default parameters
# (In this example all the observations in Xr belong at least to the
# first 100 neighbors of one observation in Xu)
ex1 <- search_neighbors(
Xr = Xr, Xu = Xu,
diss_method = "cor",
k = 40
)
# Identify the neighbor observations using principal component (PC)
# and partial least squares (PLS) dissimilarities, and using the "opc"
# approach for selecting the number of components
ex2 <- search_neighbors(
Xr = Xr, Xu = Xu,
diss_method = "pca",
Yr = Yr, k = 50,
pc_selection = list("opc", 40),
scale = TRUE
)
# Observations that do not belong to any neighborhood
setdiff(seq_len(nrow(Xr)), ex2$unique_neighbors)
ex3 <- search_neighbors(
Xr = Xr, Xu = Xu,
diss_method = "pls",
Yr = Yr, k = 50,
pc_selection = list("opc", 40),
scale = TRUE
)
# Observations that do not belong to any neighborhood
setdiff(seq_len(nrow(Xr)), ex3$unique_neighbors)
# Identify the neighbor observations using local PLS dissimilarities
# Here, 150 neighbors are used to compute a local dissimilarity matrix
# and then this matrix is used to select 50 neighbors
ex4 <- search_neighbors(
Xr = Xr, Xu = Xu,
diss_method = "pls",
Yr = Yr, k = 50,
pc_selection = list("opc", 40),
scale = TRUE,
.local = TRUE,
pre_k = 150
)
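The last example relies on a two-stage ("local") search controlled by .local and pre_k. The sketch below illustrates the idea under simplifying assumptions: a global Euclidean distance preselects pre_k candidates, and a PCA fitted only on those candidates re-ranks them by Euclidean distance on the local scores. local_search is a hypothetical helper; resemble's local option uses its own projection (PLS in ex4), so treat this as an analogy rather than the package's algorithm.

```r
# Hypothetical two-stage neighbor search (simplified stand-in, not resemble's code)
# Xr: reference matrix; xu: one target observation (numeric vector)
local_search <- function(Xr, xu, k, pre_k, ncomp = 2) {
  # Stage 1: preselect pre_k candidates with a global Euclidean distance
  d_global <- sqrt(colSums((t(Xr) - xu)^2))
  cand <- order(d_global)[seq_len(pre_k)]
  # Stage 2: local PCA fitted on the candidates only
  pca <- prcomp(Xr[cand, , drop = FALSE], center = TRUE, scale. = FALSE, rank. = ncomp)
  s_r <- pca$x                                  # candidate scores (pre_k x ncomp)
  s_u <- (xu - pca$center) %*% pca$rotation     # target scores (1 x ncomp)
  d_local <- sqrt(colSums((t(s_r) - as.vector(s_u))^2))
  # Keep the k candidates closest in the local score space
  cand[order(d_local)[seq_len(k)]]
}

set.seed(5)
Xr <- matrix(rnorm(100 * 8), nrow = 100)
xu <- rnorm(8)
nn_local <- local_search(Xr, xu, k = 10, pre_k = 30)
```

Because the final neighbors are always drawn from the pre_k candidates, pre_k must be at least k.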