Extensions of base R functions for corpus objects.
x | a corpus object
c1 | corpus one to be added
c2 | corpus two to be added
recursive | logical used by
i | index for documents or rows of document variables
j | index for column of document variables
drop | if
The + operator for a corpus object will combine two corpus objects, resolving any non-matching docvars() by making them into NA values for the corpus lacking that field. Corpus-level meta data is concatenated, except for source and notes, which are stamped with information pertaining to the creation of the new joined corpus.
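The NA-filling behaviour described above can be illustrated with a minimal sketch. The corpus names, document names, and the docvars `party` and `year` are invented for illustration, and the sketch assumes a quanteda version (v2 or later) in which the two corpora must have distinct document names to be combined.

```r
library(quanteda)

# Two small corpora whose document variables do not match
# (all names below are hypothetical, chosen for this illustration)
corp_a <- corpus(c(a1 = "First text.", a2 = "Second text."),
                 docvars = data.frame(party = c("X", "Y"),
                                      stringsAsFactors = FALSE))
corp_b <- corpus(c(b1 = "Third text."),
                 docvars = data.frame(year = 2010))

# Combine with the + operator
combined <- corp_a + corp_b

# Each corpus should show NA for the field it lacked
docvars(combined)
```

Inspecting `docvars(combined)` is a quick way to confirm which fields were filled with NA before running any analysis that depends on them.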
The c() operator is also defined for corpus class objects, and provides an easy way to combine multiple corpus objects.
There are some issues that need to be addressed in future revisions of quanteda concerning the use of factors to store document variables and meta-data. Currently most or all of these are not recorded as factors: the data.frame() calls that create and store the document-level information use stringsAsFactors=FALSE, since the texts should always be stored as character vectors and never as factors.
is.corpus returns TRUE if the object is a corpus.
The + and c() operators return a corpus() object.
Indexing a corpus works in three ways, as of v2.x.x:
[ returns a subsetted corpus
[[ returns the textual contents of a subsetted corpus (similar to texts())
$ returns a vector containing the single named docvars
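As a small sketch of how `[` and `$` differ in what they return, the following uses a toy corpus. The document names and the docvar `lang` are hypothetical, and the sketch assumes quanteda v2 or later, where `$` is defined for corpus objects.

```r
library(quanteda)

# Toy corpus with one document variable (names are hypothetical)
corp <- corpus(c(d1 = "Alpha text.", d2 = "Beta text."),
               docvars = data.frame(lang = c("en", "de"),
                                    stringsAsFactors = FALSE))

# `[` keeps the corpus class and carries the docvars along
sub <- corp["d2"]
is.corpus(sub)
docvars(sub)

# `$` extracts a single document variable as a plain vector
langs <- corp$lang
langs
```

Because `[` preserves the corpus structure, it is the natural choice when the subset will be passed on to other corpus functions, while `$` is convenient for quick access to one variable.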
# concatenate corpus objects
corp1 <- corpus(data_char_ukimmig2010[1:2])
corp2 <- corpus(data_char_ukimmig2010[3:4])
corp3 <- corpus(data_char_ukimmig2010[5:6])
summary(c(corp1, corp2, corp3))

# two ways to index corpus elements
data_corpus_inaugural["1793-Washington"]
data_corpus_inaugural[2]

# return the text itself
data_corpus_inaugural[["1793-Washington"]]