Distances for the situation in which every entry on the secondary dimension is characterized by a numeric vector (embedding). In the example of a term-document matrix, where documents are the primary dimension, each term has a numeric representation in an N-dimensional space. For user-movie ratings, the vectors for movies can represent various movie characteristics. The aggregation distances (adist for short) perform various aggregation steps on these vectors.
adist_centroid(x, y = NULL, vecs, ptrans = NULL, strans = NULL,
by = c("primary", "secondary", "row", "column"), pairwise = FALSE,
precompute = !pairwise, dist_type = "cosine")
adist_semantic_min_sum(x, y = NULL, vecs, ptrans = NULL, strans = NULL,
by = c("primary", "secondary", "row", "column"), pairwise = FALSE,
precompute = !pairwise, dist_type = "cosine")
adist_semantic_min_max(x, y = NULL, vecs, ptrans = NULL, strans = NULL,
by = c("primary", "secondary", "row", "column"), pairwise = FALSE,
precompute = !pairwise, dist_type = "cosine")
adist_rwmd(x, y = NULL, vecs, ptrans = NULL, strans = NULL,
by = c("primary", "secondary", "row", "column"), pairwise = FALSE,
precompute = !pairwise, dist_type = "cosine")
x
sparse or dense objects supported by
y
sparse or dense objects supported by
vecs
Dense matrix with columns
ptrans, strans
Primary and secondary transformations. Each can be a function, a string, or a numeric vector. When a function, it must take 3 arguments: an object supported by
by
Dimension along which to perform the distance computation. For all supported data structures, computation along the primary dimension is at least as efficient as computation along the secondary dimension.
precompute
logical. Whether to optimize the computation for speed by precomputing individual distances. The computation is method-specific but generally should be
dist_type
distance to use across individual vectors in
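One plausible reading of precompute is that the dist_type distance between every pair of embedding vectors is computed once up front and then only looked up. The sketch below is plain base R, not this package's implementation; the helper name cosine_dist_matrix and the one-row-per-term layout of emb are assumptions made for illustration.

```r
# Hypothetical helper (not part of this package): cosine distances between
# every row of A and every row of B. Rows are embedding vectors.
cosine_dist_matrix <- function(A, B = A) {
  # Scale each row to unit L2 norm (a length-nrow vector recycles row-wise)
  An <- A / sqrt(rowSums(A^2))
  Bn <- B / sqrt(rowSums(B^2))
  # 1 - cosine similarity; tcrossprod(An, Bn) equals An %*% t(Bn)
  1 - tcrossprod(An, Bn)
}

# Toy embeddings: 4 terms in a 3-dimensional space (assumed: one row per term)
emb <- rbind(
  cat  = c(1, 0, 0),
  dog  = c(0.9, 0.1, 0),
  car  = c(0, 1, 0),
  road = c(0, 0.8, 0.2)
)

# Precompute once; document-level distances then only index into D
D <- cosine_dist_matrix(emb)
round(D, 3)
```

With D in hand, a document-level distance only needs lookups such as D["cat", "road"] instead of recomputing dot products for every document pair, at the cost of storing a terms-by-terms matrix.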
centroid
Within each primary entry (document, user, etc.), the vectors of the secondary entries (terms, movies, etc.) are averaged element-wise, and dist_type is applied to the resulting vectors.
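A base-R sketch of the centroid aggregation with cosine as dist_type. It assumes x holds term weights with one column per document, emb has one embedding row per term in the same order as the rows of x, and the average is weighted by the entries of x (with 0/1 entries this reduces to a plain mean). The function name centroid_dist is hypothetical and not part of this package.

```r
# Hypothetical sketch (not this package's code): "centroid" aggregation with
# cosine as dist_type.
# x:   term weights, one row per term, one column per document (assumed layout)
# emb: embeddings, one row per term, rows in the same order as x
centroid_dist <- function(x, emb, y = x) {
  # Weighted element-wise mean of term vectors per document;
  # crossprod(m, emb) is t(m) %*% emb, giving one row per document
  centroids <- function(m) crossprod(m, emb) / colSums(m)
  cx <- centroids(x)
  cy <- centroids(y)
  # Cosine distance between centroid rows
  cx <- cx / sqrt(rowSums(cx^2))
  cy <- cy / sqrt(rowSums(cy^2))
  1 - tcrossprod(cx, cy)
}

emb <- rbind(cat = c(1, 0), dog = c(0.8, 0.2), car = c(0, 1))
x <- cbind(pets = c(cat = 1, dog = 1, car = 0),
           cars = c(cat = 0, dog = 0, car = 2))
d <- centroid_dist(x, emb)
round(d, 3)
```

Here the "pets" centroid is (0.9, 0.1) and the "cars" centroid is (0, 1), so their distance is 1 - 0.1 / sqrt(0.82).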
semantic_min_sum
Measure of semantic distance proposed in [1]. In a nutshell, to compute the semantic distance between documents A (a column in x) and B (a column in y), first, for each term a in A, the minimal distance D(a, b) to the terms b in B is computed with the dist_type distance. These minima are then summed with the corresponding weights (from the x matrix). The same procedure is applied to the terms of B, and the two resulting values are summed:
$$\mathrm{DIST}(A, B) = \sum_{a} x_{A,a} \min_{b} D(a, b) + \sum_{b} x_{B,b} \min_{a} D(b, a)$$
Note that in [1] the authors weight each term by its normalized IDF weight. The formulation in this package is more general: you can reproduce their formula by applying the "idf" strans and "l1" ptrans transformations. See examples.
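To make the "idf" plus "l1" recipe concrete, the following base-R sketch weights counts by IDF along the terms, rescales each document's weights to sum to 1, and then evaluates the semantic_min_sum formula. It is not this package's code: the IDF form log(n_docs / doc_freq), the terms-by-documents layout, and the helper names cosine_dist and semantic_min_sum are all assumptions for illustration.

```r
# Hypothetical sketch (not this package's code): semantic_min_sum on
# idf-weighted, L1-normalized document columns.
cosine_dist <- function(A, B = A) {
  An <- A / sqrt(rowSums(A^2))
  Bn <- B / sqrt(rowSums(B^2))
  1 - tcrossprod(An, Bn)
}

# wa, wb: named term-weight vectors; D: term-term distances with term dimnames
semantic_min_sum <- function(wa, wb, D) {
  ta <- names(wa)[wa > 0]
  tb <- names(wb)[wb > 0]
  Dab <- D[ta, tb, drop = FALSE]
  # each term of A to its nearest term of B, plus each term of B to its nearest term of A
  sum(wa[ta] * apply(Dab, 1, min)) + sum(wb[tb] * apply(Dab, 2, min))
}

# Toy counts: rows = terms, columns = documents (assumed layout)
x <- matrix(c(1, 1, 0, 0,
              0, 0, 1, 1,
              1, 0, 0, 1),
            nrow = 4,
            dimnames = list(c("cat", "dog", "car", "road"), c("d1", "d2", "d3")))

# Toy embeddings, one row per term
emb <- rbind(cat = c(1, 0, 0), dog = c(0.9, 0.1, 0),
             car = c(0, 1, 0), road = c(0, 0.8, 0.2))
D <- cosine_dist(emb)

# "idf" along terms (assumed form: log(n_docs / doc_freq)) ...
idf <- log(ncol(x) / rowSums(x > 0))
w <- x * idf                        # a length-nrow vector scales each term's row
# ... then "l1" along documents: each column of weights sums to 1
w <- sweep(w, 2, colSums(w), "/")

semantic_min_sum(w[, "d1"], w[, "d3"], D)
```

Note that with this IDF form a term occurring in every document gets weight 0 and drops out entirely.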
semantic_min_max
Measure of semantic distance proposed in [2]. The authors used the name "Relaxed Word Mover Distance" to emphasize that the measure is a lower bound of the well-known "Earth Mover Distance", which is the solution of a transportation problem. The metric is a variation of semantic_min_sum in which max is used in the last step instead of sum.
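Read literally, replacing the final sum of the semantic_min_sum formula by a max gives the following expression, in the same notation. This is an interpretation of the description rather than a formula quoted from [2]:

$$\mathrm{DIST}(A, B) = \max\left( \sum_{a} x_{A,a} \min_{b} D(a, b),\; \sum_{b} x_{B,b} \min_{a} D(b, a) \right)$$

Since both one-sided terms are non-negative, this value never exceeds the semantic_min_sum distance and is at least half of it.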
adist_rwmd
Relaxed Word Mover Distance, the same as adist_semantic_min_max.
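The only difference between the sum and max variants is how the two one-sided costs are combined. A base-R sketch under stated assumptions (named term-weight vectors, a precomputed term-term distance matrix, cosine as dist_type); the helper names one_sided_costs, semantic_min_sum and semantic_min_max are hypothetical and not this package's API.

```r
# Hypothetical sketch (not this package's code) of the sum vs. max combination.
cosine_dist <- function(A, B = A) {
  An <- A / sqrt(rowSums(A^2))
  Bn <- B / sqrt(rowSums(B^2))
  1 - tcrossprod(An, Bn)
}

# wa, wb: named term-weight vectors of documents A and B
# D: term-term distance matrix whose dimnames are the term names
one_sided_costs <- function(wa, wb, D) {
  ta <- names(wa)[wa > 0]
  tb <- names(wb)[wb > 0]
  Dab <- D[ta, tb, drop = FALSE]
  c(a_to_b = sum(wa[ta] * apply(Dab, 1, min)),  # each term of A to nearest term of B
    b_to_a = sum(wb[tb] * apply(Dab, 2, min)))  # each term of B to nearest term of A
}

semantic_min_sum <- function(wa, wb, D) sum(one_sided_costs(wa, wb, D))
semantic_min_max <- function(wa, wb, D) max(one_sided_costs(wa, wb, D))

emb <- rbind(cat = c(1, 0), dog = c(0.8, 0.6), car = c(0, 1))
D <- cosine_dist(emb)
wa <- c(cat = 0.5, dog = 0.5, car = 0)
wb <- c(cat = 0, dog = 0, car = 1)

semantic_min_sum(wa, wb, D)
semantic_min_max(wa, wb, D)
```

In this toy case the one-sided costs are 0.5 * 1 + 0.5 * 0.4 = 0.7 and 1 * 0.4 = 0.4, so the sum variant gives 1.1 and the max variant gives 0.7.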
A matrix of distances. If y = NULL, the value is the cross distance of x, that is, the distances between all pairs of entries of x.
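The shape of the returned matrix can be illustrated with a generic base-R helper that applies some document-level distance to every pair of columns. This is only a sketch: cross_dist and cosine_pair are hypothetical names, the per-pair distance is a placeholder rather than one of the adist methods, and comparing columns assumes computation along that dimension.

```r
# Hypothetical helper (not part of this package): evaluate a document-level
# distance for every pair of columns of x and y. With y = NULL, x is compared
# with itself, giving a square, symmetric "cross distance" matrix.
cross_dist <- function(x, y = NULL, pair_dist) {
  if (is.null(y)) y <- x
  out <- matrix(0, nrow = ncol(x), ncol = ncol(y),
                dimnames = list(colnames(x), colnames(y)))
  for (i in seq_len(ncol(x))) {
    for (j in seq_len(ncol(y))) {
      out[i, j] <- pair_dist(x[, i], y[, j])
    }
  }
  out
}

# Placeholder pair distance: cosine distance between two raw columns
cosine_pair <- function(a, b) 1 - sum(a * b) / sqrt(sum(a^2) * sum(b^2))

x <- cbind(d1 = c(1, 0), d2 = c(0, 1), d3 = c(1, 1))
m <- cross_dist(x, pair_dist = cosine_pair)
round(m, 3)
```

With y = NULL the result is n-by-n for n entries of x, with zeros on the diagonal; passing a y with k entries would instead give an n-by-k matrix.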
[1] Mihalcea, Rada, Courtney Corley, and Carlo Strapparava. ‘Corpus-Based and Knowledge-Based Measures of Text Semantic Similarity’. In AAAI, 6:775–80, 2006.
[2] Ye, Xin, Hui Shen, Xiao Ma, Razvan Bunescu, and Chang Liu. ‘From Word Embeddings to Document Similarities for Improved Information Retrieval in Software Engineering’. In Proceedings of the 38th International Conference on Software Engineering, 404–415. ICSE ’16. New York, NY, USA: ACM, 2016. doi:10.1145/2884781.2884862.