dfm | R Documentation |
Construct a sparse document-feature matrix from a tokens or dfm object.
dfm(
  x,
  tolower = TRUE,
  remove_padding = FALSE,
  verbose = quanteda_options("verbose"),
  ...
)
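x | a tokens or dfm object |
tolower | convert all features to lowercase |
remove_padding | logical; if TRUE, remove the "pads" left as empty tokens after calling tokens() or tokens_remove() with padding = TRUE |
verbose | display messages if TRUE |
... | not used directly |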
a dfm object
In quanteda v3, many convenience functions formerly available in dfm() were deprecated. Formerly, dfm() could be called directly on a character or corpus object, but we now steer users to tokenise their inputs first using tokens(). Other convenience arguments to dfm() were also removed, such as select, dictionary, thesaurus, and groups. All of these functions are available elsewhere, e.g. through dfm_group().
See news(Version >= "2.9", package = "quanteda") for details.
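The workflow described above can be sketched as follows: tokenise first, build the dfm from the tokens, then apply the dedicated functions in place of the removed arguments. The toy corpus, the `party` document variable, and the dictionary keys are made up for illustration. Pairing `select` with dfm_select() and `dictionary` with dfm_lookup() is an assumption about which replacements are intended, since only dfm_group() is named here.

```r
library(quanteda)

# Toy corpus with one document variable to group on (illustrative data only)
corp <- corpus(
  c(d1 = "Taxes fund schools.",
    d2 = "Schools need teachers.",
    d3 = "Cut taxes now!"),
  docvars = data.frame(party = c("A", "A", "B"))
)

# Step 1: tokenise explicitly instead of passing the corpus to dfm()
toks <- tokens(corp, remove_punct = TRUE)

# Step 2: build the dfm from the tokens (features are lowercased by default)
dfmat <- dfm(toks)

# In place of the old `select` argument: keep only chosen features
dfmat_sel <- dfm_select(dfmat, pattern = c("taxes", "schools"))

# In place of the old `dictionary` argument: map features to dictionary keys
dict <- dictionary(list(economy = "tax*",
                        education = c("school*", "teacher*")))
dfmat_dict <- dfm_lookup(dfmat, dictionary = dict)

# In place of the old `groups` argument: sum counts within each party
dfmat_grp <- dfm_group(dfmat, groups = party)
```

Keeping each step as a separate call makes the intermediate objects available for inspection, for instance printing `toks` before any features are selected or grouped.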
dfm_select(), dfm
## for a corpus
toks <- data_corpus_inaugural %>%
  corpus_subset(Year > 1980) %>%
  tokens()
dfm(toks)
# removal options
toks <- tokens(c("a b c", "A B C D")) %>%
  tokens_remove("b", padding = TRUE)
toks
dfm(toks)
dfm(toks) %>%
  dfm_remove(pattern = "") # remove "pads"
# preserving case
dfm(toks, tolower = FALSE)
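The padding example above drops the pads after the dfm is built. As a minimal sketch, assuming the installed quanteda version provides the remove_padding argument shown in the usage, the pads can also be dropped when the dfm is constructed. The object names are illustrative.

```r
library(quanteda)

# Remove "b" but keep an empty-string pad in its position
toks <- tokens(c("a b c", "A B C D"))
toks <- tokens_remove(toks, "b", padding = TRUE)

# Default: the pad is counted as an empty-string feature
dfmat_pad <- dfm(toks)
featnames(dfmat_pad)

# Assumption: dfm() accepts remove_padding, as in the usage section
dfmat_nopad <- dfm(toks, remove_padding = TRUE)
featnames(dfmat_nopad)

# Removing the pad after the fact should leave the same features
dfmat_removed <- dfm_remove(dfmat_pad, pattern = "")
setequal(featnames(dfmat_nopad), featnames(dfmat_removed))
```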