knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
devtools::load_all()
The goal of sehrnett is to provide a nice (and fast) interface to Princeton's WordNet. Unlike the original wordnet package (Feinerer et al., 2020), you don't need to install WordNet and/or set up rJava.
The data is not included in the package. If the data is not already available, please run download_wordnet() to download it (~100 MB zipped, ~400 MB unzipped) from the Internet. Please make sure you agree with the WordNet License.
devtools::install_github("chainsawriot/sehrnett")
get_lemmas
The most basic function is get_lemmas. It returns basic information about the lemmas [^1] you provide.
library(sehrnett)
get_lemmas(c("very", "nice"))
get_lemmas("nice")
get_lemmas("nice", pos = "n")
Please note that some definitions in WordNet are considered pejorative or offensive, e.g.
get_lemmas("dog")
The dot notation ("lemma.pos.sensenum") can be used to quickly search for a particular word sense. For example, one can search for "king.n.10" to pin down the word sense of "king" as a chess piece.
get_lemmas("king.n.10")
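To make the three fields of the dot notation explicit, here is a minimal base R sketch that splits a string such as "king.n.10" into its lemma, part of speech and sense number. parse_dot_notation is a hypothetical helper written for illustration only; it is not part of sehrnett and is not necessarily how the package parses the notation.

```r
# Hypothetical helper (not part of sehrnett): split "lemma.pos.sensenum".
parse_dot_notation <- function(x) {
  # The first group is greedy, so a lemma that itself contains dots still parses.
  pattern <- "^(.+)\\.([a-z])\\.([0-9]+)$"
  m <- regmatches(x, regexec(pattern, x))[[1]]
  if (length(m) == 0) {
    stop("Expected the form 'lemma.pos.sensenum', got: ", x)
  }
  list(lemma = m[2], pos = m[3], sensenum = as.integer(m[4]))
}

parsed <- parse_dot_notation("king.n.10")
str(parsed)
```

Read this way, "king.n.10" asks for the lemma "king", restricted to nouns, at sense number 10.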
The morphological processing of the original WordNet is partially implemented in sehrnett [^2]. Because the WordNet database contains only information about lemmas (e.g. eat), you need to convert inflected variants (e.g. ate, eaten, eating) back to their lemmas to query them. This process is known as lemmatization.
sehrnett provides such lemmatization, but you need to provide exactly one pos and set lemmatize to TRUE (the default).
get_lemmas(c("ate", "ducking"), pos = "v")
get_lemmas(c("loci", "lemmata", "boxesful"), pos = "n")
get_lemmas(c("nicest", "stronger"), pos = "a")
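As a rough illustration of what lemmatization involves, the following base R sketch combines an exception list for irregular forms with suffix-detachment rules in the style of WordNet's morphy, and accepts a candidate only if it is a known lemma. The lexicon, the exception list and the function toy_lemmatize are hypothetical and tiny; the rule tables are a simplified subset, and none of this is sehrnett's actual implementation.

```r
# Toy lexicon and exception list (hypothetical; WordNet's own are far larger
# and its exception lists are kept separately for each part of speech).
lexicon <- c("eat", "duck", "box", "nice", "strong")
exceptions <- c(ate = "eat", eaten = "eat")

# Detachment rules: if a word ends in `suffix`, replace it with `ending`.
rules <- list(
  n = data.frame(suffix = c("s", "ses", "xes", "ies"),
                 ending = c("", "s", "x", "y"),
                 stringsAsFactors = FALSE),
  v = data.frame(suffix = c("s", "ies", "es", "es", "ed", "ed", "ing", "ing"),
                 ending = c("", "y", "e", "", "e", "", "e", ""),
                 stringsAsFactors = FALSE),
  a = data.frame(suffix = c("er", "est", "er", "est"),
                 ending = c("", "", "e", "e"),
                 stringsAsFactors = FALSE)
)

toy_lemmatize <- function(word, pos) {
  # 1. Words already in the lexicon are returned unchanged.
  if (word %in% lexicon) return(word)
  # 2. Irregular forms are looked up in the exception list.
  if (word %in% names(exceptions)) return(unname(exceptions[word]))
  # 3. Otherwise try each rule in order and keep the first candidate
  #    that is an actual lemma in the lexicon.
  r <- rules[[pos]]
  for (i in seq_len(nrow(r))) {
    if (endsWith(word, r$suffix[i])) {
      stem <- substr(word, 1, nchar(word) - nchar(r$suffix[i]))
      candidate <- paste0(stem, r$ending[i])
      if (candidate %in% lexicon) return(candidate)
    }
  }
  NA_character_
}

toy_lemmatize("ducking", pos = "v")
toy_lemmatize("nicest", pos = "a")
```

The rules differ by part of speech, which is one reason a lemmatizer of this kind needs to be told which pos to use.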
For example, suppose you want to know the synonyms of the word "nuance" (very important for academic writing). You can first search using the lemma "nuance" with get_lemmas.
res <- get_lemmas("nuance")
res
There could be multiple word senses, and you need to choose the one you want to convey. In this case, there is only one. You can then search using the synsetid (cognitive synonym identifier) of that word sense.
# get_synonyms() is a wrapper to get_synsetids
get_synsetids(res$synsetid[1])
All get_ functions are chainable by using the magrittr pipe operator.
c("switch off") %>% get_lemmas(pos = "v") %>% get_synonyms
get_outdegrees
WordNet is, in fact, a network: synsetids are connected to each other in a directed graph. A node (a synsetid) is linked to other nodes by edges of different link types, each labelled with a linkid. You can list all available linkids with the function list_linktypes.
list_linktypes()
## all hypernyms
get_lemmas("dog", pos = "n", sensenum = 1) %>% get_outdegrees(linkid = 1)

## all hyponyms
get_lemmas("dog", pos = "n", sensenum = 1) %>% get_outdegrees(linkid = 2)

## all antonyms
get_lemmas("nice", pos = "a", sensenum = 1) %>% get_outdegrees(linkid = 30)
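The idea of following typed edges out of a node can be sketched with a plain edge list in base R. The node labels and the toy_outdegrees function below are hypothetical and for illustration only (real nodes are synsetids, and this is not how get_outdegrees is implemented); the linkid values 1, 2 and 30 follow the hypernym, hyponym and antonym examples above.

```r
# Toy edge list (hypothetical node labels; real nodes are synsetids).
links <- data.frame(
  from   = c("dog", "canine", "dog", "puppy", "nice", "nasty"),
  to     = c("canine", "dog", "puppy", "dog", "nasty", "nice"),
  linkid = c(1, 2, 2, 1, 30, 30),
  stringsAsFactors = FALSE
)

# Follow the outgoing edges of `node`, optionally restricted to one link type.
toy_outdegrees <- function(node, linkid = NULL) {
  keep <- links$from == node
  if (!is.null(linkid)) keep <- keep & links$linkid == linkid
  links$to[keep]
}

toy_outdegrees("dog", linkid = 1)    # hypernyms of "dog"
toy_outdegrees("dog", linkid = 2)    # hyponyms of "dog"
toy_outdegrees("nice", linkid = 30)  # antonyms of "nice"
```

Because the graph is directed, a hypernym edge from "dog" to "canine" is stored separately from the hyponym edge running from "canine" back to "dog".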
sehrnett provides several get_ functions as syntactic sugar. For example:
## all hyponyms
get_lemmas("dog", pos = "n", sensenum = 1) %>% get_hyponyms()
get_lemmas("nice", pos = "a", sensenum = 1) %>% get_antonyms()
get_lemmas("nice", pos = "a", sensenum = 1) %>% get_derivatives()
[^1]: Yes, the plural of lemma can also be lemmata, you Latin-speaking people.
[^2]: Like many implementations (e.g. NLTK, Ruby's rwordnet and node-wordnet-magic), the morphological processing is only partial. Collocations and hyphenation are not supported. Therefore, please don't expect that lemmatizing "asking for it" would yield "ask for it" (as documented on WordNet's website).