pair_distances: Semantic Distances Between Word Pairs (wordspace)
In wordspace: Distributional Semantic Models in R

pair.distances

R Documentation

Semantic Distances Between Word Pairs (wordspace)

Description

Compute semantic distances (or similarities) between pairs of target terms based on a scored DSM matrix `M`, according to any of the distance measures supported by `dist.matrix`. If one of the terms in a pair is not represented in the DSM, the distance is set to `Inf` (or to `-Inf` in the case of a similarity measure).

Usage


pair.distances(w1, w2, M, ..., transform = NULL, 
               rank = c("none", "fwd", "bwd", "avg"),
               avg.method = c("arithmetic", "geometric", "harmonic"),
               batchsize = 10e6, verbose = FALSE)

Arguments

`w1`	a character vector specifying the first term of each pair
`w2`	a character vector of the same length as `w1`, specifying the second term of each pair
`M`	a sparse or dense DSM matrix, suitable for passing to `dist.matrix`, or an object of class `dsm`. Alternatively, `M` can be a pre-computed distance or similarity matrix returned by `dist.matrix` or marked as such with `as.distmat`.
`...`	further arguments are passed to `dist.matrix` and determine the distance or similarity measure to be used (see `dist.matrix` for details)
`rank`	whether to return the distance between the two terms (`"none"`) or the neighbour rank (see “Details” below)
`transform`	an optional transformation function applied to the distance, similarity or rank values (e.g. `transform=log10` for logarithmic ranks). This option is provided as a convenience for evaluation code that calls `pair.distances` with user-specified arguments.
`avg.method`	with `rank="avg"`, whether to compute the arithmetic, geometric or harmonic mean of forward and backward rank
`batchsize`	maximum number of similarity values to compute per batch. This parameter strongly affects the efficiency and memory use of the algorithm and should be tuned carefully for optimal performance.
`verbose`	if `TRUE`, display some progress messages indicating how data are split into batches

Details

The `rank` argument controls whether semantic distance is measured directly by geometric distance (`"none"`), by forward neighbour rank (`"fwd"`), by backward neighbour rank (`"bwd"`), or by the average of forward and backward rank (`"avg"`). Forward neighbour rank is the rank of `w2` among the nearest neighbours of `w1`. Backward neighbour rank is the rank of `w1` among the nearest neighbours of `w2`. The average can be computed as an arithmetic, geometric or harmonic mean, depending on `avg.method`.
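The rank definitions above can be illustrated with a small base R sketch. The distance matrix, the term names and the helper `fwd_rank` are invented for illustration; this is not the package's implementation, and the three means use their standard textbook definitions.

```r
# Toy symmetric distance matrix for three hypothetical terms (values are made up)
terms <- c("cat", "dog", "car")
D <- matrix(c(0.0, 0.2, 0.9,
              0.2, 0.0, 0.3,
              0.9, 0.3, 0.0),
            nrow = 3, byrow = TRUE, dimnames = list(terms, terms))

# Rank of w2 among the neighbours of w1. Subtracting 1 discounts w1 itself,
# which is always its own nearest neighbour, so that w1 == w2 yields rank 0.
fwd_rank <- function(D, w1, w2) unname(rank(D[w1, ])[w2]) - 1

fwd <- fwd_rank(D, "dog", "car")  # forward rank: "car" among neighbours of "dog" -> 2
bwd <- fwd_rank(D, "car", "dog")  # backward rank: "dog" among neighbours of "car" -> 1

# Three ways of averaging forward and backward rank
avg_arithmetic <- (fwd + bwd) / 2          # 1.5
avg_geometric  <- sqrt(fwd * bwd)          # sqrt(2)
avg_harmonic   <- 2 / (1 / fwd + 1 / bwd)  # 4/3
```

Forward and backward rank differ here because "dog" has a closer neighbour ("cat") than "car", whereas "dog" is the closest neighbour of "car".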

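Note that the transformation function is applied after averaging. In order to compute the arithmetic mean of log ranks, set `transform=log10`, `rank="avg"` and `avg.method="geometric"`. This works because the logarithm of the geometric mean of two ranks equals the arithmetic mean of their logarithms.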
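The equivalence behind this recipe can be checked numerically in base R. The two rank values below are made up for illustration and do not come from any DSM.

```r
# Hypothetical forward and backward neighbour ranks for one word pair
fwd <- 4
bwd <- 25

# What rank="avg", avg.method="geometric", transform=log10 amounts to:
# average first (geometric mean), then apply the transformation
log_of_geometric_mean <- log10(sqrt(fwd * bwd))

# The quantity we actually want: the arithmetic mean of the log ranks
mean_of_logs <- mean(log10(c(fwd, bwd)))

# Both values agree (up to floating-point error)
same <- isTRUE(all.equal(log_of_geometric_mean, mean_of_logs))
```

By contrast, combining `transform=log10` with `avg.method="arithmetic"` would give `log10((4 + 25) / 2)`, which is a different quantity.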

Neighbour ranks assume that each target term is its own nearest neighbour and adjust ranks to account for this (i.e. `w1 == w2` should return a rank of 0). If `M` is a pre-computed distance matrix, the adjustment is only applied if it is also marked as symmetric (because otherwise `w1` might not appear in the list of neighbours at all). This might lead to unexpected results once asymmetric measures are implemented in `dist.matrix`.

For a sparse pre-computed similarity matrix `M`, only non-zero cells are considered as neighbours and all other ranks are set to `Inf`. This is consistent with the behaviour of `nearest.neighbours`.

`pair.distances` is used as a default callback in several evaluation functions, which rely on the `similarity` attribute to distinguish between distance measures and similarity scores. For this reason, transformation functions should always be isotonic (order-preserving) so as not to mislead the evaluation procedure.

Value

If `rank="none"` (the default), a numeric vector of the same length as `w1` and `w2` specifying the distances or similarities between the term pairs, according to the metric selected with the extra arguments (`...`).

Otherwise, an integer or numeric vector of the same length as `w1` and `w2` specifying forward, backward or average neighbour rank for the two terms.

In either case, a distance or rank of `Inf` (or a similarity of `-Inf`) is returned for any term pair not represented in the DSM. Attribute `similarity` is set to `TRUE` if the returned values are similarity scores rather than distances.
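Because missing pairs are reported as `Inf` rather than `NA`, downstream code may want to separate covered from uncovered pairs before summarising the values. A minimal base R sketch, using a made-up vector in place of an actual return value:

```r
# Hypothetical distances for four word pairs; the second pair is assumed
# to contain a term that is not represented in the DSM
d <- c(42.1, Inf, 77.5, 60.3)

covered <- is.finite(d)          # FALSE for Inf (and for -Inf with similarities)
n_missing <- sum(!covered)       # number of pairs without a usable value
mean_covered <- mean(d[covered]) # summary over covered pairs only
```

Whether uncovered pairs should be dropped or kept (e.g. penalised) depends on the evaluation at hand; this sketch only shows how to identify them.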

Author(s)

Stephanie Evert (https://purl.org/stephanie.evert)

Examples


library(wordspace)  # provides pair.distances() and the RG65 and DSM_Vectors data sets
transform(RG65, angle=pair.distances(word1, word2, DSM_Vectors))

wordspace documentation built on Aug. 23, 2022, 1:06 a.m.
