View source: R/corpus-methods-quanteda.R
Get or replace the texts in a corpus, with grouping options.
Works for plain character vectors too, if groups is a factor.
x | a corpus or character object
groups | either: a character vector containing the names of document variables to be used for grouping; or a factor or object that can be coerced into a factor equal in length or rows to the number of documents.
spacer | when concatenating texts by using
value | character vector of the new texts
... | unused
as.character(x), where x is a corpus, is equivalent to calling texts(x).
For texts, a character vector of the texts in the corpus.
For texts <-, the corpus with its texts replaced by value.
as.character(x) is equivalent to texts(x).
The groups argument is used to concatenate texts that share a value of groups; the order of aggregation is not specified.
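To make the grouping behaviour concrete, the following is a minimal base R sketch of concatenating texts by shared group values. It is an illustration only, not quanteda's implementation: the names txt, grp and spacer, the example strings, and the single-space spacer are all assumptions made for this sketch.

```r
# Hypothetical example data: five short texts and one group label per text.
txt <- c(d1 = "first text", d2 = "second text", d3 = "third text",
         d4 = "fourth text", d5 = "fifth text")
grp <- factor(c("W", "W", "A", "J", "J"))

# Spacer placed between concatenated texts (a single space is an assumption here).
spacer <- " "

# Split the texts by group, then paste each group's texts into one string.
grouped <- vapply(split(unname(txt), grp), paste, character(1), collapse = spacer)

grouped
nchar(grouped)
```

In this sketch the result is ordered by the factor levels because that is how split() works; the note above makes no such promise for texts(), so code should not depend on the order of the grouped output.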
You are strongly encouraged, as a matter of good text analysis workflow, not to modify the substance of the texts in a corpus. This sort of processing is better performed through downstream operations. For instance, do not lowercase the texts in a corpus, or you will never be able to recover the original case. Instead, apply tokens_tolower() after applying tokens() to a corpus, or use the option tolower = TRUE in dfm().
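The recommended workflow can be sketched as follows. This assumes the quanteda package is installed; the two example sentences and the object names corp, toks, toks_lower and dfmat are hypothetical, chosen only for illustration.

```r
library(quanteda)

# Hypothetical two-document corpus with mixed case.
corp <- corpus(c(d1 = "The Quick Brown Fox", d2 = "Jumps Over the Lazy Dog"))

# Lowercase downstream, at the tokens stage, rather than inside the corpus.
toks <- tokens(corp)
toks_lower <- tokens_tolower(toks)

# Alternatively, lowercase while building the document-feature matrix.
dfmat <- dfm(toks, tolower = TRUE)

# The corpus itself still holds the original case.
as.character(corp)
```

Because the case change happens on the tokens or the dfm, the original texts remain available in corp for any later step that needs them.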
nchar(texts(corpus_subset(data_corpus_inaugural, Year < 1806)))

# grouping on a document variable
nchar(texts(corpus_subset(data_corpus_inaugural, Year < 1806), groups = "President"))

# grouping a character vector using a factor
nchar(texts(data_corpus_inaugural[1:5],
            groups = "President"))
nchar(texts(data_corpus_inaugural[1:5],
            groups = factor(c("W", "W", "A", "J", "J"))))

# replacing the texts of a corpus
corp <- corpus(c("We must prioritise honour in our neighbourhood.",
                 "Aluminium is a valourous metal."))
texts(corp) <-
    stringi::stri_replace_all_regex(texts(corp),
                                    c("ise", "([nlb])our", "nium"),
                                    c("ize", "$1or", "num"),
                                    vectorize_all = FALSE)
texts(corp)
texts(corp)[2] <- "New text number 2."
texts(corp)