eval_similarity_correlation: Evaluate DSM on Correlation with Similarity Ratings...
In wordspace: Distributional Semantic Models in R

eval.similarity.correlation

R Documentation

Evaluate DSM on Correlation with Similarity Ratings (wordspace)

Description

Performs evaluation by comparing the distances (or similarities) computed by a DSM with (typically human) word similarity ratings. Well-known examples are the noun pair ratings collected by Rubenstein & Goodenough (1965; RG65) and Finkelstein et al. (2002; WordSim353).

The quality of the DSM predictions is measured by Spearman rank correlation rho.

Usage

eval.similarity.correlation(task, M, dist.fnc=pair.distances,
                            details=FALSE, format=NA, taskname=NA,
                            word1.name="word1", word2.name="word2", score.name="score",
                            ...)

Arguments

`task`	a data frame containing word pairs (usually in columns `word1` and `word2`) with similarity ratings (usually in column `score`); any other columns will be ignored
`M`	a scored DSM matrix, passed to `dist.fnc`
`dist.fnc`	a callback function used to compute distances or similarities between word pairs. It will be invoked with character vectors containing the components of the word pairs as first and second argument, the DSM matrix `M` as third argument, plus any additional arguments (`...`) passed to `eval.similarity.correlation`. The return value must be a numeric vector of appropriate length. If one of the words in a pair is not represented in the DSM, the corresponding distance value should be set to `Inf` (or `-Inf` in the case of similarities).
`details`	if `TRUE`, a detailed report with information on each task item is returned (see Value below for details)
`format`	if the task definition specifies POS-disambiguated lemmas in CWB/Penn format, they can automatically be transformed into another notation convention; see `convert.lemma` for details
`taskname`	optional row label for the short report (`details=FALSE`)
`...`	any further arguments are passed to `dist.fnc` and can be used e.g. to select a distance measure
`word1.name`	the name of the column of `task` containing the first word of each pair
`word2.name`	the name of the column of `task` containing the second word of each pair
`score.name`	the name of the column of `task` containing the corresponding similarity ratings
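
The contract for `dist.fnc` described above can be illustrated with a small hand-written callback. This is only a sketch in base R: the function name `euclid.pairs`, the toy matrix and the toy task are hypothetical, and the code assumes that `M` is a dense matrix with words as row names.

```r
# Hypothetical callback: Euclidean distance between the row vectors of each word pair.
# Arguments follow the order described for dist.fnc: first words, second words, DSM matrix.
euclid.pairs <- function(w1, w2, M, ...) {
  stopifnot(length(w1) == length(w2))
  known <- w1 %in% rownames(M) & w2 %in% rownames(M)
  d <- rep(Inf, length(w1))            # Inf marks pairs with a word missing from the DSM
  if (any(known)) {
    delta <- M[w1[known], , drop = FALSE] - M[w2[known], , drop = FALSE]
    d[known] <- sqrt(rowSums(delta^2))
  }
  d                                    # numeric vector, one value per word pair
}

# Toy DSM matrix and toy task (made-up values, for illustration only)
M <- matrix(c(1, 0,
              0, 1,
              1, 1), ncol = 2, byrow = TRUE,
            dimnames = list(c("cat", "dog", "car"), c("f1", "f2")))
task <- data.frame(word1 = c("cat", "cat", "dog"),
                   word2 = c("dog", "car", "unicorn"),
                   score = c(3.5, 1.0, 0.5),
                   stringsAsFactors = FALSE)

d <- euclid.pairs(task$word1, task$word2, M)
print(d)
```

A callback of this shape could then presumably be supplied as `dist.fnc=euclid.pairs`; the pair containing the unknown word "unicorn" is flagged with `Inf` as required.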

Details

DSM distances are computed for all word pairs and compared with similarity ratings from the gold standard. As an evaluation criterion, Spearman rank correlation between the DSM and gold standard scores is computed. The function also reports a confidence interval for Pearson correlation, which is only meaningful if the relationship is near-linear and may therefore require a suitable transformation of the distances.

NB: Since the correlation between similarity ratings and DSM distances will usually be negative, the evaluation report omits minus signs on the correlation coefficients.
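
The quantities in the evaluation report can be approximated with base R on toy data. This is a sketch rather than the package's actual implementation: the numbers are made up, and taking absolute values of the confidence bounds is an assumption that only makes sense when the interval does not include zero.

```r
# Toy data (hypothetical): gold-standard similarity ratings and DSM distances
gold <- c(3.9, 3.5, 3.1, 2.4, 1.8, 1.1, 0.6, 0.2)
dsm  <- c(0.21, 0.35, 0.30, 0.52, 0.61, 0.80, 0.77, 0.95)

sp <- cor.test(gold, dsm, method = "spearman")                    # rank correlation + p-value
pe <- cor.test(gold, dsm, method = "pearson", conf.level = 0.95)  # Pearson r + confidence interval

# High similarity goes with small distance, so both coefficients are negative;
# the report below drops the minus signs.
report <- data.frame(
  rho     = abs(unname(sp$estimate)),
  p.value = sp$p.value,
  r       = abs(unname(pe$estimate)),
  r.lower = min(abs(pe$conf.int)),   # assumption: interval lies entirely below zero
  r.upper = max(abs(pe$conf.int))
)
print(report)
```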

With the default dist.fnc, the distance values can optionally be transformed through an arbitrary function specified in the transform argument (see pair.distances for details). Examples include transform=log (esp. for neighbour rank as a distance measure) and transform=function (x) 1/(1+x) (in order to transform distances into similarities). Note that Spearman rank correlation is not affected by any monotonic transformation, so the main evaluation results will remain unchanged.
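
The invariance of Spearman rank correlation under monotonic transformations can be checked directly in base R. The data below are made up for illustration; the two transformations are the ones mentioned above.

```r
# Toy data (hypothetical): gold-standard similarity ratings and DSM distances
gold <- c(3.9, 3.5, 3.1, 2.4, 1.8, 1.1, 0.6, 0.2)
dsm  <- c(0.21, 0.35, 0.30, 0.52, 0.61, 0.80, 0.77, 0.95)

rho.raw <- cor(gold, dsm, method = "spearman")
rho.log <- cor(gold, log(dsm), method = "spearman")          # increasing transformation
rho.sim <- cor(gold, 1 / (1 + dsm), method = "spearman")     # decreasing: distance -> similarity

# Spearman rho keeps its absolute value; only the sign flips for the decreasing transformation
print(c(raw = rho.raw, log = rho.log, sim = rho.sim))

# Pearson correlation, by contrast, does change with the transformation
r.raw <- cor(gold, dsm)
r.log <- cor(gold, log(dsm))
print(c(raw = r.raw, log = r.log))
```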

If one or both words of a pair are not found in the DSM, the distance is set to a fixed value 10% above the maximum of all other DSM distances, or 10% below the minimum in the case of similarity values. This is done in order to avoid numerical and visualization problems with Inf values; the particular value used does not affect the rank correlation coefficient.
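
The treatment of missing pairs can be sketched in base R as follows. The data are made up, and the formula for the surrogate value (1.1 times the largest finite distance) is only one possible reading of "10% above the maximum", not necessarily the package's exact computation.

```r
# Toy data (hypothetical): two pairs are missing from the DSM and flagged with Inf
gold <- c(3.9, 3.5, 3.1, 2.4, 1.8, 1.1, 0.6, 0.2)
dsm  <- c(0.21, 0.35, Inf, 0.52, 0.61, 0.80, Inf, 0.95)

is.missing <- is.infinite(dsm)
surrogate  <- 1.1 * max(dsm[!is.missing])     # assumption: 10% above the largest finite distance
dsm.fixed  <- replace(dsm, is.missing, surrogate)

# Any other value above the maximum yields the same ranks, hence the same rho
rho.surrogate <- cor(gold, dsm.fixed, method = "spearman")
rho.other     <- cor(gold, replace(dsm, is.missing, 1000), method = "spearman")
print(c(surrogate = rho.surrogate, other = rho.other))
```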

With the default dist.fnc callback, additional arguments method and p can be used to select a distance measure (see dist.matrix for details); rank=TRUE can be specified in order to use neighbour rank as a measure of semantic distance.
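
As a brief illustration of passing a distance measure through `...`, the following sketch compares the default measure with Manhattan distance on the `RG65` and `DSM_Vectors` data sets used in the Examples. It assumes that the wordspace package is installed and that `method="manhattan"` is accepted by `dist.matrix`; the task labels are arbitrary.

```r
# Run only if the wordspace package is available
if (requireNamespace("wordspace", quietly = TRUE)) {
  library(wordspace)
  # default distance measure of pair.distances / dist.matrix
  res.default <- eval.similarity.correlation(RG65, DSM_Vectors, taskname = "RG65 (default)")
  # method= is handed on to the default dist.fnc via "..."
  res.manhattan <- eval.similarity.correlation(RG65, DSM_Vectors, method = "manhattan",
                                               taskname = "RG65 (manhattan)")
  print(res.default)
  print(res.manhattan)
}
```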

Value

The default short report (details=FALSE) is a data frame with a single row and the following columns:

`rho`	(absolute value of) Spearman rank correlation coefficient rho
`p.value`	p-value indicating evidence for a significant correlation
`missing`	number of pairs not included in the DSM
`r`	(absolute value of) Pearson correlation coefficient r
`r.lower`	lower bound of confidence interval for Pearson correlation
`r.upper`	upper bound of confidence interval for Pearson correlation

The detailed report (details=TRUE) is a copy of the original task data with two additional columns:

`distance`	distance calculated by the DSM for each word pair, possibly transformed (numeric)
`missing`	whether word pair is missing from the DSM (logical)

In addition, the short report is appended to the data frame as an attribute "eval.result", and the optional taskname value as attribute "taskname". The data frame is marked as an object of class eval.similarity.correlation, for which suitable print and plot methods are defined.

Author(s)

Stephanie Evert (https://purl.org/stephanie.evert)

References

Finkelstein, Lev, Gabrilovich, Evgeniy, Matias, Yossi, Rivlin, Ehud, Solan, Zach, Wolfman, Gadi, and Ruppin, Eytan (2002). Placing search in context: The concept revisited. ACM Transactions on Information Systems, 20(1), 116–131.

Rubenstein, Herbert and Goodenough, John B. (1965). Contextual correlates of synonymy. Communications of the ACM, 8(10), 627–633.

Examples


eval.similarity.correlation(RG65, DSM_Vectors)

## Not run: 
plot(eval.similarity.correlation(RG65, DSM_Vectors, details=TRUE))

## End(Not run)

wordspace documentation built on Aug. 23, 2022, 1:06 a.m.
