View source: R/corpus-methods-quanteda.R
Description
Get or replace the texts in a corpus, with grouping options.
Also works for plain character vectors, if groups is a factor.
Arguments
x: a corpus or character object
groups: either a character vector containing the names of document variables to be used for grouping, or a factor (or an object that can be coerced into a factor) equal in length, or in number of rows, to the number of documents
spacer: when concatenating texts by using groups, the string inserted between the concatenated texts
value: character vector of the new texts
...: unused
Details
as.character(x), where x is a corpus, is equivalent to calling texts(x).
Value
For texts, a character vector of the texts in the corpus.
For texts <-, the corpus with its texts replaced by value.
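The getter/replacement pair follows R's ordinary replacement-function mechanism, and an `as.character()` S3 method that delegates to the getter is one way to obtain the documented equivalence. The sketch below illustrates both with a hypothetical toy class: `toy_corpus`, `toy_texts()` and `toy_texts<-` are invented for illustration and are not quanteda's implementation.

```r
# Hypothetical toy corpus: a list holding a character vector of texts.
# Not quanteda's corpus class; it only illustrates R replacement functions.
toy_corpus <- function(txt) {
  structure(list(texts = txt), class = "toy_corpus")
}

# Getter: return the texts as a plain character vector
toy_texts <- function(x) x$texts

# Replacement function: `toy_texts(x) <- value` calls `toy_texts<-`(x, value)
`toy_texts<-` <- function(x, value) {
  # Assumption of this sketch: exactly one new text per existing document
  stopifnot(is.character(value), length(value) == length(x$texts))
  x$texts <- value
  x
}

# as.character() method delegating to the getter,
# so as.character(x) gives the same result as toy_texts(x)
as.character.toy_corpus <- function(x, ...) toy_texts(x)

corp <- toy_corpus(c("First text.", "Second text."))

# R evaluates the next line roughly as:
#   corp <- `toy_texts<-`(corp, value = `[<-`(toy_texts(corp), 2, "New text number 2."))
toy_texts(corp)[2] <- "New text number 2."

toy_texts(corp)     # "First text."  "New text number 2."
as.character(corp)  # same as toy_texts(corp)
```

The length check in `toy_texts<-` is an assumption of the sketch; the source only states that value is a character vector of the new texts.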
Note
Texts that share a value of groups are concatenated into a single text; the order in which texts are combined within each group is not specified.
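Grouped concatenation amounts to splitting the texts by group and pasting each group's texts together with spacer between them. A minimal base-R sketch, using a hypothetical helper `concat_by_group()` that is not part of quanteda; it keeps the input order within each group, whereas quanteda does not guarantee any particular order.

```r
# Hypothetical helper illustrating grouped concatenation (not quanteda's code)
concat_by_group <- function(txt, groups, spacer = " ") {
  groups <- as.factor(groups)
  stopifnot(length(groups) == length(txt))
  # split() keeps the input order of texts within each group and returns
  # one list element per factor level
  pieces <- split(txt, groups)
  # Paste each group's texts together, separated by `spacer`
  vapply(pieces, paste, character(1), collapse = spacer)
}

txt <- c("a1", "a2", "b1", "c1", "c2")
grp <- factor(c("W", "W", "A", "J", "J"))

concat_by_group(txt, grp)
# returns c(A = "b1", J = "c1 c2", W = "a1 a2")
```

Because the result has one element per factor level, it is named and ordered by the levels (alphabetical here), and each added spacer counts toward the character totals that nchar() reports for the grouped texts.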
As a matter of good text-analysis workflow, you are strongly encouraged not to modify the substance of the texts in a corpus.
Such processing is better performed through downstream operations. For instance, do not lowercase the texts in a corpus,
or you will never be able to recover the original case. Instead, apply tokens_tolower() after applying tokens() to a
corpus, or use the option tolower = TRUE in dfm().
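The point about lowercasing can be seen in a few lines of base R: once two differently cased words are lowercased they become identical, so no function can restore the original. The sketch keeps the original texts and lowercases only a derived tokenized copy, the same idea as tokens_tolower() applied after tokens(), although strsplit() on whitespace is only a crude stand-in for tokens().

```r
# Base-R illustration (not quanteda code) of why lowercasing stored texts
# loses information that cannot be recovered
texts_orig <- c("Turkey exports turkey", "We visited Bath")

# Crude whitespace tokenization as a stand-in for quanteda's tokens()
toks <- strsplit(texts_orig, "\\s+")

# Lowercase only the derived tokens; texts_orig stays untouched
toks_lower <- lapply(toks, tolower)

toks[[1]]        # "Turkey"  "exports" "turkey"
toks_lower[[1]]  # "turkey"  "exports" "turkey"

# The country and the bird now map to the same string, so no function
# applied to toks_lower alone could restore the original case
toks_lower[[1]][1] == toks_lower[[1]][3]  # TRUE
```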
Examples
# number of characters in each of the early inaugural addresses
nchar(texts(corpus_subset(data_corpus_inaugural, Year < 1806)))

# grouping on a document variable
nchar(texts(corpus_subset(data_corpus_inaugural, Year < 1806), groups = "President"))

# grouping the first five documents on a document variable
nchar(texts(data_corpus_inaugural[1:5],
            groups = "President"))

# grouping the same documents using a factor rather than a document variable name
nchar(texts(data_corpus_inaugural[1:5],
            groups = factor(c("W", "W", "A", "J", "J"))))

# replacing all texts: convert British spellings to American ones
corp <- corpus(c("We must prioritise honour in our neighbourhood.",
                 "Aluminium is a valourous metal."))
texts(corp) <-
  stringi::stri_replace_all_regex(texts(corp),
                                  c("ise", "([nlb])our", "nium"),
                                  c("ize", "$1or", "num"),
                                  vectorize_all = FALSE)
texts(corp)

# replacing a single text by index
texts(corp)[2] <- "New text number 2."
texts(corp)