lma_dict | R Documentation |
Returns a list of function words based on the Linguistic Inquiry and Word Count 2015 dictionary (in terms of category names – words were selected independently), or a list of special characters and patterns.
lma_dict(..., as.regex = TRUE, as.function = FALSE)
... |
Numbers or letters corresponding to category names: ppron, ipron, article, adverb, conj, prep, auxverb, negate, quant, interrog, number, interjection, or special. |
as.regex |
Logical: if |
as.function |
Logical or a function: if specified and |
A list with a vector of terms for each category, or (when as.function = TRUE) a function which accepts an initial "terms" argument (a character vector), and any additional arguments determined by the function entered as as.function (grepl by default).
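The function form can be pictured as a closure that binds a category's patterns to a matching function. The sketch below is only an illustration in base R: make_matcher and toy_ppron are hypothetical names, the three patterns are made up and are not the package's actual ppron list, and nothing here is meant to describe how lma_dict is implemented internally. It also only covers functions that, like grepl, take the pattern first and the text second.

```r
# Hypothetical stand-in for one category's patterns (NOT the package's list).
toy_ppron <- c("^i$", "^you$", "^we$")

# Hypothetical helper: bind a set of patterns to a grepl-like function.
make_matcher <- function(patterns, fun = grepl) {
  pattern <- paste(patterns, collapse = "|")
  # The returned function takes "terms" first and passes extra arguments on.
  function(terms, ...) fun(pattern, terms, ...)
}

is_toy_ppron <- make_matcher(toy_ppron)
toy_result <- is_toy_ppron(c("i", "am", "you", "were"))
print(toy_result)
```

Extra arguments such as ignore.case = TRUE would be forwarded to grepl through the dots.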
The special category is not returned unless specifically requested. It is a list of regular expression strings attempting to capture special things like ellipses and emojis, or sets of special characters (those outside of the Basic Latin range; [^\u0020-\u007F]), which can be used for character conversions. If special is part of the returned list, as.regex is set to TRUE.
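To make the Basic Latin range concrete, here is a small base R sketch that flags and replaces characters outside U+0020 to U+007F. It does not use the package's special list; the pattern is an equivalent written with PCRE code point escapes (an assumption about notation, chosen so it can be passed with perl = TRUE), and the variable names are illustrative only.

```r
# Anything outside U+0020..U+007F, written with PCRE \x{...} escapes.
non_basic <- "[^\\x{20}-\\x{7F}]"

x <- c("plain ascii", "na\u00EFve", "en\u2013dash")

# Which strings contain at least one character outside the range?
has_special <- grepl(non_basic, x, perl = TRUE)

# Replace each such character with a placeholder.
replaced <- gsub(non_basic, "?", x, perl = TRUE)

print(has_special)
print(replaced)
```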
The special list is always used by both lma_dtm and lma_termcat. When creating a dtm, special is used to clean the original input (so that, by default, the punctuation involved in ellipses and emojis is treated as different: as ellipses and emojis rather than as periods, parens, colons, and such). When categorizing a dtm, the input dictionary is passed through the special lists to be sure the terms in the dtm match up with the dictionary (so, for example, ": (" would be replaced with "repfrown" in both the text and dictionary).
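The point of cleaning both sides can be sketched in base R: if the same replacement is applied to the text and to the dictionary terms, an emoji entry in the dictionary still lines up with the cleaned text. The pattern below is a hypothetical stand-in for a frown pattern and is not taken from the package's special list; the toy dictionary and variable names are likewise assumptions.

```r
# Hypothetical frown pattern: a colon, an optional space, and an open paren.
frown <- ": ?\\("

text <- "that went badly : ("
dict <- list(negative = c(": (", "bad"))

# Apply the same replacement to the text and to every dictionary term.
clean_text <- gsub(frown, "repfrown", text)
clean_dict <- lapply(dict, function(terms) gsub(frown, "repfrown", terms))

# After cleaning, the emoji term can be matched as an ordinary token.
tokens <- strsplit(clean_text, " ", fixed = TRUE)[[1]]
matched <- clean_dict$negative %in% tokens
print(matched)
```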
To score texts with these categories, use lma_termcat().
# return the full dictionary (excluding special)
lma_dict()
# return the 7 standard lsm categories
lma_dict(1:7)
# return just a few categories without regular expressions
lma_dict(neg, ppron, aux, as.regex = FALSE)
# return special specifically
lma_dict(special)
# returning a function
is.ppron <- lma_dict(ppron, as.function = TRUE)
is.ppron(c("i", "am", "you", "were"))
in.lsmcat <- lma_dict(1:7, as.function = TRUE)
in.lsmcat(c("a", "frog", "for", "me"))
## use as a stopword filter
is.stopword <- lma_dict(as.function = TRUE)
dtm <- lma_dtm("Most of these words might not be all that relevant.")
dtm[, !is.stopword(colnames(dtm))]
## use to replace special characters
clean <- lma_dict(special, as.function = gsub)
clean(c(
"\u201Ccurly quotes\u201D", "na\u00EFve", "typographer\u2019s apostrophe",
"en\u2013dash", "em\u2014dash"
))