select.dict | R Documentation |
Retrieve information and links to dictionaries (lexicons/word lists) available at osf.io/y6g5b.
select.dict(query = NULL, dir = getOption("lingmatch.dict.dir"),
check.md5 = TRUE, mode = "wb")
query |
A character matching a dictionary name, or a set of keywords to search for in dictionary information. |
dir |
Path to a folder containing dictionaries, or where you want them to be saved. Will look in getOption('lingmatch.dict.dir') and '~/Dictionaries' by default. |
check.md5 |
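Logical; if TRUE, compares the MD5 checksum of downloaded files with the checksum recorded on OSF to check their integrity. |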
mode |
Passed to download.file when downloading files. |
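Because dir defaults to getOption("lingmatch.dict.dir"), you can set that option once per session (for example, in an .Rprofile) instead of passing dir on every call. This is a minimal sketch. The folder under tempdir() stands in for a real dictionary folder, and the variable name dict_dir is only illustrative.

```r
# Choose a folder for dictionaries; tempdir() is only a stand-in for a real path
dict_dir <- file.path(tempdir(), "Dictionaries")
dir.create(dict_dir, showWarnings = FALSE)

# select.dict() reads this option as the default for its dir argument
options(lingmatch.dict.dir = dict_dir)
getOption("lingmatch.dict.dir")
```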
A list with varying entries:
info
: The version of osf.io/kjqb8 stored internally; a data.frame with dictionary names as row
names and information about each dictionary in columns. The columns are also described at
osf.io/y6g5b/wiki/dict_variables. Here, short (which corresponds to the file name
[{short}.(csv|dic)] and the wiki URL [https://osf.io/y6g5b/wiki/{short}]) is set as the
row names and removed from the columns:
name
: Full name of the dictionary.
description
: Description of the dictionary, relating to its purpose and
development.
note
: Notes about processing decisions that additionally alter the original.
constructor
: How the dictionary was constructed:
algorithm
: Terms were selected by some automated process, potentially
learned from data or other resources.
crowd
: Several individuals rated the terms, and in aggregate those ratings
translate to categories and weights.
mixed
: Some combination of the other methods, usually in some iterative
process.
team
: One or more individuals make decisions about term inclusion,
categories, and weights.
subject
: Broad, rough subject or purpose of the dictionary:
emotion
: Terms relate to emotions, potentially exemplifying or expressing
them.
general
: A large range of categories, aiming to capture the content of the
text.
impression
: Terms are categorized and weighted based on the impression they
might give.
language
: Terms are categorized or weighted based on their linguistic
features, such as part of speech, specificity, or area of use.
social
: Terms relate to social phenomena, such as characteristics or concerns
of social entities.
terms
: Number of unique terms across categories.
term_type
: Format of the terms:
glob
: Includes asterisks, which match any characters up to a
word boundary.
glob+
: Glob-style asterisks with regular expressions within terms.
ngram
: Includes any number of words as a term, separated by spaces.
pattern
: A string of characters, potentially within or between words, or
spanning words.
regex
: Regular expressions.
stem
: Unigrams with common endings removed.
unigram
: Complete single words.
weighted
: Indicates whether weights are associated with terms. This
determines the file type of the dictionary: dictionaries with weights are stored
as .csv, and those without are stored as .dic files.
regex_characters
: Logical indicating whether special regular expression
characters are present in any term, which might need to be escaped if the terms are used
in regular expressions. Glob-type terms allow complete parens (at least one open and one
closed, indicating preceding or following words), and initial and terminal asterisks. For
all other terms, [](){}*.^$+?\| are counted as regex characters. These could be
escaped in R with gsub('([][)(}{*.^$+?\\|])', '\\\\\\1', terms) if terms
is a character vector, and in Python with (importing re)
[re.sub(r'([][(){}*.^$+?\\|])', r'\\\1', term) for term in terms] if terms
is a list.
categories
: Category names in the order in which they appear in the dictionary
file, separated by commas.
ncategories
: Number of categories.
original_max
: Maximum value of the original dictionary before standardization
(original values / max(original values) * 100). Dictionaries with no weights are
considered to have a max of 1.
osf
: ID of the file on OSF, translating to the file's URL: https://osf.io/{osf}.
wiki
: URL of the dictionary's wiki.
downloaded
: Path to the file if downloaded, and '' otherwise.
selected
: A subset of info selected by query.
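The glob term type can be approximated with ordinary regular expressions. The sketch below is one possible translation, not lingmatch's own handling of glob terms. It makes three assumptions: terms contain only word characters plus initial or terminal asterisks (so no escaping is needed), each asterisk maps to \w* (any run of word characters), and the whole term is wrapped in word boundaries. The parenthesis convention for glob terms is not handled. The helper name glob_to_regex is hypothetical.

```r
# Hypothetical helper: translate simple glob terms into PCRE patterns.
# Assumes terms contain only word characters and initial/terminal asterisks.
glob_to_regex <- function(terms) {
  # each asterisk matches any run of word characters, up to a word boundary
  body <- gsub("\\*", "\\\\w*", terms)
  # require the term to start and end at word boundaries
  paste0("\\b", body, "\\b")
}

patterns <- glob_to_regex(c("happ*", "*ness", "sad"))
grepl(patterns[1], "so happy today", perl = TRUE)
```

With this translation, "happ*" matches "happy" but not "unhappy", and "sad" matches only the whole word "sad".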
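The regex_characters flag and the escaping step it describes can be combined into a small check. In this sketch, escape_regex_chars is a hypothetical helper that wraps the R expression from the regex_characters description. The same character class is used to flag terms. The glob-specific exemptions (parens and initial or terminal asterisks) are ignored here.

```r
# Character class of the special characters listed under regex_characters
regex_chars <- "[][)(}{*.^$+?\\|]"

# Hypothetical helper: prefix each special character with a backslash
escape_regex_chars <- function(terms) {
  gsub("([][)(}{*.^$+?\\|])", "\\\\\\1", terms)
}

terms <- c("a.b", "c++", "plain")
has_special <- grepl(regex_chars, terms)  # which terms would need escaping
any(has_special)                          # per-dictionary flag, like regex_characters

# After escaping, each term matches itself literally
escaped <- escape_regex_chars(terms)
grepl(escaped[1], c("a.b", "axb"))
```

Without escaping, the pattern "a.b" would also match "axb", because "." matches any character.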
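The standardization in the original_max description rescales weights so that the largest becomes 100. Because original_max keeps the divisor, the original scale can be recovered. The weights in this worked sketch are made up for illustration and are not taken from any listed dictionary.

```r
# Illustrative original weights (not from any real dictionary)
original <- c(good = 2.5, great = 5, fine = 1)

original_max <- max(original)                  # the value stored as original_max
standardized <- original / original_max * 100  # largest weight becomes 100
standardized

# Reversing the scaling recovers the original weights
recovered <- standardized / 100 * original_max
```

Here standardized is good = 50, great = 100, fine = 20. For an unweighted dictionary, original_max is 1.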
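The naming conventions described for info can be reproduced with paste0. The file is named {short}.csv or {short}.dic depending on weighted, the wiki URL is built from short, and the file URL is built from osf. In this sketch, dict_paths is a hypothetical helper, and the short name and OSF ID are placeholders rather than real entries.

```r
# Hypothetical helper following the conventions described for info:
# file {short}.(csv|dic), wiki https://osf.io/y6g5b/wiki/{short}, file URL https://osf.io/{osf}
dict_paths <- function(short, weighted, osf) {
  list(
    file = paste0(short, ifelse(weighted, ".csv", ".dic")),  # weighted dictionaries are .csv
    wiki = paste0("https://osf.io/y6g5b/wiki/", short),
    url = paste0("https://osf.io/", osf)
  )
}

# Placeholder values for illustration only
dict_paths(short = "example", weighted = TRUE, osf = "abcde")
```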
Other Dictionary functions: dictionary_meta(), download.dict(), lma_patcat(), lma_termcat(), read.dic(), report_term_matches()
library(lingmatch)

# just retrieve information about available dictionaries
dicts <- select.dict()$info
dicts[1:10, 4:9]
# select all dictionaries mentioning sentiment or emotion
sentiment_dicts <- select.dict("sentiment emotion")$selected
sentiment_dicts[1:10, 4:9]