sehrnett

The goal of sehrnett is to provide a nice (and fast) interface to Princeton's WordNet. Unlike the original wordnet package (Feinerer et al., 2020), sehrnett does not require you to install WordNet or set up rJava.

The data is not included in the package. If the data is not already available, please run download_wordnet() to download it from the Internet (~100 MB zipped, ~400 MB unzipped). Please make sure you agree with the WordNet License.

Installation

devtools::install_github("chainsawriot/sehrnett")

get_lemmas

The most basic function is get_lemmas. It returns basic information about the lemmas [^1] you provide.

library(sehrnett)
get_lemmas(c("very", "nice"))
get_lemmas("nice")
get_lemmas("nice", pos = "n")

Please note that some definitions in WordNet are considered pejorative or offensive, e.g.

get_lemmas("dog")

Dot notation

The dot notation ("lemma.pos.sensenum") can be used to quickly search for a particular word sense. For example, searching for "king.n.10" pins down the word sense of "king" as a chess piece.

get_lemmas("king.n.10")
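
To make the structure of the dot notation concrete, here is a minimal base R sketch that splits such a string into its three fields. The helper parse_dot_notation is hypothetical and not part of sehrnett; whether sehrnett parses the string this way internally is an assumption. The fields correspond to the lemma, pos and sensenum arguments of get_lemmas used elsewhere in this README.

```r
# Hypothetical helper, not part of sehrnett: split "lemma.pos.sensenum"
# into its three fields. The greedy first group keeps any dots inside
# the lemma itself; only the last two dot-separated fields are peeled off.
parse_dot_notation <- function(x) {
  m <- regmatches(x, regexec("^(.+)\\.([a-z])\\.([0-9]+)$", x))[[1]]
  if (length(m) == 0) {
    stop("not in lemma.pos.sensenum form: ", x)
  }
  list(lemma = m[2], pos = m[3], sensenum = as.integer(m[4]))
}

parsed <- parse_dot_notation("king.n.10")
str(parsed)
```

Under that assumption, "king.n.10" would amount to asking for the lemma "king" with pos = "n" and sensenum = 10.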

Lemmatization

The morphological processing of the original WordNet is partially implemented in sehrnett [^2]. Because WordNet's database contains only information about lemmas (e.g. eat), you need to convert inflected variants (e.g. ate, eaten, eating) back to their lemmas before querying them. This process is known as lemmatization.

sehrnett can do this lemmatization for you, provided that you supply exactly one pos and leave lemmatize set to TRUE (the default).

get_lemmas(c("ate", "ducking"), pos = "v")
get_lemmas(c("loci", "lemmata", "boxesful"), pos = "n")
get_lemmas(c("nicest", "stronger"), pos = "a")
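
For readers curious about how this kind of lemmatization can work, the following is a toy sketch of the general approach: look up irregular forms in an exception table, then try detaching suffixes and keep the first candidate found in a lexicon. It is not sehrnett's implementation. The function toy_lemmatize_verb, the exception table, the rule list and the lexicon are all small made-up samples for illustration.

```r
# Illustrative only: a toy rule-based lemmatizer for verbs.
# None of these names are part of sehrnett, and the data are made up.
verb_exceptions <- c(ate = "eat", eaten = "eat")
verb_rules <- list(
  c("ies", "y"), c("es", "e"), c("es", ""), c("s", ""),
  c("ed", "e"), c("ed", ""), c("ing", "e"), c("ing", "")
)
toy_lexicon <- c("eat", "duck", "bake", "try")

toy_lemmatize_verb <- function(word) {
  # 1. Irregular forms are looked up directly.
  if (word %in% names(verb_exceptions)) {
    return(unname(verb_exceptions[word]))
  }
  # 2. The word itself may already be a lemma.
  if (word %in% toy_lexicon) {
    return(word)
  }
  # 3. Otherwise detach each suffix in turn and keep the first
  #    candidate that exists in the lexicon.
  for (rule in verb_rules) {
    suffix <- rule[1]
    if (endsWith(word, suffix) && nchar(word) > nchar(suffix)) {
      stem <- substr(word, 1, nchar(word) - nchar(suffix))
      candidate <- paste0(stem, rule[2])
      if (candidate %in% toy_lexicon) {
        return(candidate)
      }
    }
  }
  # No rule produced a known lemma.
  NA_character_
}

lemmas <- vapply(
  c("ate", "ducking", "baked", "tries"),
  toy_lemmatize_verb,
  character(1)
)
print(lemmas)
```

The rules differ by part of speech, which is one plausible reason a single pos has to be supplied before lemmatization can be attempted.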

A practical example

Suppose you want to know the synonyms of the word "nuance" (very important for academic writing). You can first search for the lemma "nuance" with get_lemmas.

res <- get_lemmas("nuance")
res

A lemma can have multiple word senses, in which case you need to choose the one you want to convey. In this case, there is only one. You can then use the synsetid (cognitive synonym identifier) of that word sense to look up its synonyms.

# get_synonyms() is a wrapper around get_synsetids()
get_synsetids(res$synsetid[1])

Chainability

All get_ functions are chainable by using the magrittr pipe operator.

c("switch off") %>% get_lemmas(pos = "v") %>% get_synonyms

get_outdegrees

WordNet is a network: synsetids are connected to each other in a directed graph. A node (a synsetid) is linked to another by edges of different link types, each labelled with a linkid. You can list all available linkids with the function list_linktypes.

list_linktypes()
## all hypernyms
get_lemmas("dog", pos = "n", sensenum = 1) %>% get_outdegrees(linkid = 1)
## all hyponyms
get_lemmas("dog", pos = "n", sensenum = 1) %>% get_outdegrees(linkid = 2)
## all antonyms
get_lemmas("nice", pos = "a", sensenum = 1) %>% get_outdegrees(linkid = 30)
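
The idea of typed, directed edges can be sketched with a plain data frame. The edge list below is made up and is not real WordNet data; toy_links and toy_outdegrees are hypothetical names, and the only detail taken from the examples above is that linkid 1 stands for hypernyms and linkid 2 for hyponyms.

```r
# Illustrative only: a made-up edge list, not real WordNet data.
# Each row is a directed edge from one node to another, typed by linkid.
toy_links <- data.frame(
  from   = c("dog", "dog", "dog", "canine"),
  to     = c("canine", "puppy", "poodle", "carnivore"),
  linkid = c(1, 2, 2, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: follow the outgoing edges of one type from a node.
toy_outdegrees <- function(node, linkid) {
  toy_links$to[toy_links$from == node & toy_links$linkid == linkid]
}

hypernyms <- toy_outdegrees("dog", linkid = 1)
hyponyms  <- toy_outdegrees("dog", linkid = 2)
print(hypernyms)
print(hyponyms)
```

Because the edges are directed, following linkid 1 from "dog" reaches "canine", while nothing in this toy table leads from "canine" back to "dog".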

Sugars

sehrnett provides several get_ functions as syntactic sugar. For example:

## all hyponyms
get_lemmas("dog", pos = "n", sensenum = 1) %>% get_hyponyms()
get_lemmas("nice", pos = "a", sensenum = 1) %>% get_antonyms()
get_lemmas("nice", pos = "a", sensenum = 1) %>% get_derivatives()

[^1]: Yes, the plural of lemma can also be lemmata, you Latin-speaking people.

[^2]: Like many implementations (e.g. NLTK, Ruby's rwordnet and node-wordnet-magic), the morphological processing is only partial. Collocations and hyphenation are not supported. Therefore, please don't expect that lemmatizing "asking for it" would yield "ask for it" (as documented on WordNet's website).



chainsawriot/sehrnett documentation built on March 11, 2023, 1:13 a.m.