SemCorWSD: SemCor Word Sense Disambiguation Task (wordspace)

SemCorWSD    R Documentation

SemCor Word Sense Disambiguation Task (wordspace)

Description

A collection of sentences containing ambiguous words manually labelled with WordNet senses. The data were obtained from the SemCor corpus version 3.0.

Usage


SemCorWSD

Format

A data frame with 647 rows and the following 8 columns (all of type character):

id

Unique item ID

target

The target word (lemmatized)

pos

Word class of the target word (n = noun, v = verb, a = adjective)

sense

Sense of the target word in this sentence (given as a WordNet lemma)

gloss

WordNet definition of this sense

sentence

The sentence containing the ambiguous word

hw

Lemmatized form of the sentence (“headwords”); punctuation marks are excluded and all remaining words are case-folded

lemma

Lemmatized and POS-disambiguated form in CWB/Penn format, e.g. move_N for the headword move used as a noun
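
As a rough illustration of the lemma format, the sketch below splits a string of headword_POS tokens into headwords and tags. It assumes that tokens are separated by whitespace and that the tag follows the last underscore. The helper split_lemma and the input string are made up for illustration (only move_N is taken from the description above) and are not part of wordspace.

```r
# Hypothetical helper (not part of wordspace): split "headword_POS" tokens.
# Assumes whitespace-separated tokens with the POS tag after the last "_".
split_lemma <- function(x) {
  tokens <- strsplit(x, "\\s+")[[1]]
  data.frame(
    hw  = sub("_[^_]+$", "", tokens),  # drop the trailing "_TAG"
    pos = sub("^.*_", "", tokens),     # keep only the text after the last "_"
    stringsAsFactors = FALSE
  )
}

# Made-up input string; the tags J and V are assumed by analogy with move_N
parsed <- split_lemma("move_N quick_J run_V")
print(parsed)
```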

Details

Target words and senses had to meet the following criteria in order to be included in the data set:

  • the sense occurs at least 5 times (f ≥ 5) in SemCor 3.0

  • the sense accounts for at least 10% of all occurrences of the target word

  • at least two senses of the target word remain after applying the previous two filters
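
The three criteria above can be sketched in base R as follows. This is only an illustration under stated assumptions, not the procedure actually used to build the data set: the helper filter_senses, its argument names and the toy input are hypothetical, and the 10% share is assumed to be computed over all occurrences of the target before any filtering.

```r
# Hypothetical helper (not part of wordspace).
# occ: data frame with one row per annotated occurrence and
#      character columns `target` and `sense`.
filter_senses <- function(occ, min_freq = 5, min_share = 0.10) {
  # frequency f of each (target, sense) pair
  counts <- as.data.frame(table(target = occ$target, sense = occ$sense),
                          responseName = "f", stringsAsFactors = FALSE)
  counts <- counts[counts$f > 0, ]
  # share of each sense among all occurrences of its target (assumption:
  # computed before filtering)
  total <- tapply(counts$f, counts$target, sum)
  counts$share <- counts$f / as.numeric(total[counts$target])
  # criteria 1 and 2: minimum frequency and minimum share
  keep <- counts[counts$f >= min_freq & counts$share >= min_share, ]
  # criterion 3: the target must retain at least two senses
  n_senses <- table(keep$target)
  keep <- keep[as.vector(n_senses[keep$target]) >= 2, ]
  rownames(keep) <- NULL
  keep
}

# Toy input with made-up sense labels
occ <- data.frame(
  target = c(rep("bank", 12), rep("run", 10), rep("find", 55)),
  sense  = c(rep("A", 6), rep("B", 5), "C",   # C is too rare
             rep("X", 9), "Y",                # only X survives, so "run" is dropped
             rep("P", 50), rep("Q", 5)),      # Q is below the 10% share
  stringsAsFactors = FALSE
)
kept <- filter_senses(occ)
print(kept)
```

In the toy input, only the target bank retains two senses, so it is the only target returned.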

SemCorWSD contains sentence contexts for the following target words:

  • ambiguous nouns from Schütze (1998): interest, plant, space, vessel

  • miscellaneous ambiguous nouns: bank

  • miscellaneous ambiguous verbs: find, grasp, open, run

Source

TODO (SemCor reference, NLTK extraction)

References

Schütze, Hinrich (1998). Automatic word sense discrimination. Computational Linguistics, 24(1), 97–123.

See Also

context.vectors

Examples


with(SemCorWSD, table(sense, target))

# all word senses with brief definitions ("glosses")
with(SemCorWSD, sort(unique(paste0(target, " ", sense, ": ", gloss))))


wordspace documentation built on Aug. 23, 2022, 1:06 a.m.