| select.dict | R Documentation |
Retrieve information and links to dictionaries (lexicons/word lists) available at osf.io/y6g5b.
select.dict(query = NULL, dir = getOption("lingmatch.dict.dir"),
check.md5 = TRUE, mode = "wb")
| query | A character matching a dictionary name, or a set of keywords to search for in dictionary information. |
| dir | Path to a folder containing dictionaries, or where you want them to be saved. Will look in getOption('lingmatch.dict.dir') and '~/Dictionaries' by default. |
| check.md5 | Logical; if |
| mode | Passed to |
A list with varying entries:
info: The version of osf.io/kjqb8 stored internally; a
data.frame with dictionary names as row names, and information about each dictionary in columns.
The columns are also described at
osf.io/y6g5b/wiki/dict_variables;
here, the short column (corresponding to the file name [{short}.(csv|dic)] and
wiki URLs [https://osf.io/y6g5b/wiki/{short}]) is used as the row names and removed from the columns:
name: Full name of the dictionary.
description: Description of the dictionary, relating to its purpose and
development.
note: Notes about processing decisions that additionally alter the original.
constructor: How the dictionary was constructed:
algorithm: Terms were selected by some automated process, potentially
learned from data or other resources.
crowd: Several individuals rated the terms, and in aggregate those ratings
translate to categories and weights.
mixed: Some combination of the other methods, usually in some iterative
process.
team: One or more individuals make decisions about term inclusions,
categories, and weights.
subject: Broad, rough subject or purpose of the dictionary:
emotion: Terms relate to emotions, potentially exemplifying or expressing
them.
general: A large range of categories, aiming to capture the content of the
text.
impression: Terms are categorized and weighted based on the impression they
might give.
language: Terms are categorized or weighted based on their linguistic
features, such as part of speech, specificity, or area of use.
social: Terms relate to social phenomena, such as characteristics or concerns
of social entities.
terms: Number of unique terms across categories.
term_type: Format of the terms:
glob: Includes asterisks, which denote inclusion of any characters until a
word boundary.
glob+: Glob-style asterisks with regular expressions within terms.
ngram: Includes any number of words as a term, separated by spaces.
pattern: A string of characters, potentially within or between words, or
spanning words.
regex: Regular expressions.
stem: Unigrams with common endings removed.
unigram: Complete single words.
weighted: Indicates whether weights are associated with terms. This
determines the file type of the dictionary: dictionaries with weights are stored
as .csv, and those without are stored as .dic files.
regex_characters: Logical indicating whether special regular expression
characters are present in any term, which might need to be escaped if the terms are used
in regular expressions. Glob-type terms allow complete parens (at least one open and one
closed, indicating preceding or following words), and initial and terminal asterisks. For
all other terms, [](){}*.^$+?\| are counted as regex characters. These could be
escaped in R with gsub('([][)(}{*.^$+?\\|])', '\\\\\\1', terms) if terms
is a character vector, and in Python with (importing re)
[re.sub(r'([][(){}*.^$+?\\|])', r'\\\1', term) for term in terms] if terms
is a list.
categories: Category names in the order in which they appear in the dictionary
file, separated by commas.
ncategories: Number of categories.
original_max: Maximum value of the original dictionary before standardization:
original values / max(original values) * 100. Dictionaries with no weights are
considered to have a max of 1.
osf: ID of the file on OSF, translating to the file's URL:
https://osf.io/{osf}.
wiki: URL of the dictionary's wiki.
downloaded: Path to the file if downloaded, and '' otherwise.
selected: A subset of info selected by query.
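The escaping described under regex_characters can be sketched in base R. The helper name escape_regex and the example terms are hypothetical, and the pattern is an adaptation rather than the package's own code: it doubles the backslash inside the character class so that a literal backslash is matched whichever way the regex engine treats backslashes in a class.

```r
# Hypothetical helper: prefix each special regex character with a backslash.
escape_regex <- function(terms) {
  # "\\\\" in the pattern is a backslash inside the character class;
  # "\\\\\\1" in the replacement is a literal backslash followed by group 1.
  gsub("([][(){}*.^$+?\\\\|])", "\\\\\\1", terms)
}

# Made-up terms containing regex characters.
terms <- c("a.b", "(un)happy*", "a|b")
escaped <- escape_regex(terms)

# An escaped term matches only the literal text, not the pattern it resembles.
literal_only <- grepl(escaped[1], c("a.b", "axb"))
```

If the terms only need to be matched literally, passing fixed = TRUE to grepl or gsub avoids escaping altogether.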
Other Dictionary functions:
dictionary_meta(),
download.dict(),
lma_patcat(),
lma_termcat(),
read.dic(),
report_term_matches()
# just retrieve information about available dictionaries
dicts <- select.dict()$info
dicts[1:10, 4:9]
# select all dictionaries mentioning sentiment or emotion
sentiment_dicts <- select.dict("sentiment emotion")$selected
sentiment_dicts[1:10, 4:9]
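The column descriptions in the value can be tried out without downloading anything. The sketch below builds a small stand-in for the info data.frame and derives file names and wiki URLs from the row names and the weighted column, following the [{short}.(csv|dic)] and https://osf.io/y6g5b/wiki/{short} patterns. It then applies the standardization formula given for original_max. The two rows, their names, and the weights are made up for illustration, and weighted is assumed to be logical.

```r
# Hypothetical stand-in for select.dict()$info; rows and values are made up.
info <- data.frame(
  name = c("Example Weighted Lexicon", "Example Category List"),
  weighted = c(TRUE, FALSE), # assumed to be logical here
  term_type = c("unigram", "glob"),
  original_max = c(5, 1),
  row.names = c("exweighted", "excats"),
  stringsAsFactors = FALSE
)

# Weighted dictionaries are stored as .csv, unweighted ones as .dic.
short <- rownames(info)
info$file <- paste0(short, ifelse(info$weighted, ".csv", ".dic"))

# Wiki URLs follow https://osf.io/y6g5b/wiki/{short}.
info$wiki <- paste0("https://osf.io/y6g5b/wiki/", short)

# Standardization as described for original_max:
# original values / max(original values) * 100
original_weights <- c(good = 5, fine = 2.5, bad = 1) # hypothetical weights
standardized <- original_weights / max(original_weights) * 100

# The original scale can be recovered from original_max.
recovered <- standardized / 100 * info["exweighted", "original_max"]

info[, c("file", "wiki")]
standardized
```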