similarity_graph  R Documentation 
Create a graph (as a sparse symmetric weighted adjacency matrix) representing the similarities between items in a data set. No dimensionality reduction is carried out. By default, the similarities are calculated using the merged fuzzy simplicial set approach in the Uniform Manifold Approximation and Projection (UMAP) method (McInnes et al., 2018), but the approach from LargeVis (Tang et al., 2016) can also be used.
similarity_graph( X = NULL, n_neighbors = NULL, metric = "euclidean", scale = NULL, set_op_mix_ratio = 1, local_connectivity = 1, nn_method = NULL, n_trees = 50, search_k = 2 * n_neighbors * n_trees, perplexity = 50, method = "umap", y = NULL, target_n_neighbors = n_neighbors, target_metric = "euclidean", target_weight = 0.5, pca = NULL, pca_center = TRUE, ret_extra = c(), n_threads = NULL, grain_size = 1, kernel = "gauss", tmpdir = tempdir(), verbose = getOption("verbose", TRUE), pca_method = NULL, binary_edge_weights = FALSE )
X 
Input data. Can be a 
n_neighbors 
The size of local neighborhood (in terms of number of
neighboring sample points) used for manifold approximation. Larger values
result in more global views of the manifold, while smaller values result in
more local data being preserved. In general values should be in the range

metric 
Type of distance metric to use to find nearest neighbors. One of:
Only applies if If Each metric calculation results in a separate fuzzy simplicial set, which are intersected together to produce the final set. Metric names can be repeated. Because nonnumeric columns are removed from the data frame, it is safer to use column names than integer ids. Factor columns can also be used by specifying the metric name
For a given data block, you may override the 
scale 
Scaling to apply to
For 
set_op_mix_ratio 
Interpolate between (fuzzy) union and intersection as
the set operation used to combine local fuzzy simplicial sets to obtain a
global fuzzy simplicial sets. Both fuzzy set operations use the product
tnorm. The value of this parameter should be between 
local_connectivity 
The local connectivity required – i.e. the number
of nearest neighbors that should be assumed to be connected at a local
level. The higher this value the more connected the manifold becomes
locally. In practice this should be not more than the local intrinsic
dimension of the manifold. Ignored if 
nn_method 
Method for finding nearest neighbors. Options are:
By default, if
or a sparse distance matrix of type 
n_trees 
Number of trees to build when constructing the nearest
neighbor index. The more trees specified, the larger the index, but the
better the results. With 
search_k 
Number of nodes to search during the neighbor retrieval. The
larger k, the more the accurate results, but the longer the search takes.
With 
perplexity 
Used only if 
method 
How to generate the similarities between items. One of:

y 
Optional target data to add supervised or semisupervised weighting
to the similarity graph . Can be a vector, matrix or data frame. Use the
Unlike 
target_n_neighbors 
Number of nearest neighbors to use to construct the
target simplicial set. Default value is 
target_metric 
The metric used to measure distance for 
target_weight 
Weighting factor between data topology and target
topology. A value of 0.0 weights entirely on data, a value of 1.0 weights
entirely on target. The default of 0.5 balances the weighting equally
between data and target. Only applies if 
pca 
If set to a positive integer value, reduce data to this number of
columns using PCA. Doesn't applied if the distance 
pca_center 
If 
ret_extra 
A vector indicating what extra data to return. May contain any combination of the following strings:

n_threads 
Number of threads to use. Default is half the number of
concurrent threads supported by the system. For nearest neighbor search,
only applies if 
grain_size 
The minimum amount of work to do on each thread. If this
value is set high enough, then less than 
kernel 
Used only if 
tmpdir 
Temporary directory to store nearest neighbor indexes during
nearest neighbor search. Default is 
verbose 
If 
pca_method 
Method to carry out any PCA dimensionality reduction when
the

binary_edge_weights 
If 
This is equivalent to running umap
with the
ret_extra = c("fgraph")
parameter, but without the overhead of
calculating (or returning) the optimized lowdimensional coordinates.
A sparse symmetrized matrix of the similarities between the items in
X
or if nn_method
contains precomputed nearest neighbor
data, the items in nn_method
. Because of the symmetrization, there
may be more nonzero items in each column than the specified value of
n_neighbors
(or precomputed neighbors in nn_method
).
If ret_extra
is specified then the return value will be a list
containing:
similarity_graph
the similarity graph as a sparse matrix
as described above.
nn
(if ret_extra
contained "nn"
) the nearest
neighbor data as a list called nn
. This contains one list for each
metric
calculated, itself containing a matrix idx
with the
integer ids of the neighbors; and a matrix dist
with the
distances. The nn
list (or a sublist) can be used as input to the
nn_method
parameter.
sigma
(if ret_extra
contains "sigma"
),
a vector of calibrated parameters, one for each item in the input data,
reflecting the local data density for that item. The exact definition of
the values depends on the choice of the method
parameter.
rho
(if ret_extra
contains "sigma"
), a
vector containing the largest distance to the locally connected neighbors
of each item in the input data. This will exist only if
method = "umap"
.
localr
(if ret_extra
contains "localr"
) a
vector of the estimated local radii, the sum of "sigma"
and
"rho"
. This will exist only if method = "umap"
.
McInnes, L., Healy, J., & Melville, J. (2018). UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction arXiv preprint arXiv:1802.03426. https://arxiv.org/abs/1802.03426
Tang, J., Liu, J., Zhang, M., & Mei, Q. (2016, April). Visualizing largescale and highdimensional data. In Proceedings of the 25th International Conference on World Wide Web (pp. 287297). International World Wide Web Conferences Steering Committee. https://arxiv.org/abs/1602.00370
iris30 < iris[c(1:10, 51:60, 101:110), ] # return a 30 x 30 sparse matrix with similarity data based on 10 nearest # neighbors per item iris30_sim_graph < similarity_graph(iris30, n_neighbors = 10) # Default is to use the UMAP method of calculating similarities, but LargeVis # is also available: for that method, use perplexity instead of n_neighbors # to control neighborhood size. Use ret_extra = "nn" to return nearest # neighbor data as well as the similarity graph. Return value is a list # containing similarity_graph' and 'nn' items. iris30_lv_graph < similarity_graph(iris30, perplexity = 10, method = "largevis", ret_extra = "nn") # If you have the neighbor information you don't need the original data iris30_lv_graph_nn < similarity_graph(nn_method = iris30_lv_graph$nn, perplexity = 10, method = "largevis") all(iris30_lv_graph_nn == iris30_lv_graph$similarity_graph)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.