corpus-class | R Documentation |
Extensions of base R functions for corpus objects.
## S3 method for class 'corpus'
c1 + c2
## S3 method for class 'corpus'
c(..., recursive = FALSE)
## S3 method for class 'corpus'
x[i, drop_docid = TRUE]
## S3 method for class 'summary.corpus'
print(x, ...)
c1 | corpus one to be added
c2 | corpus two to be added
recursive | logical used by
x | a corpus object
i | document names or indices for documents to extract.
drop_docid | if
The + operator for a corpus object will combine two corpus objects,
resolving any non-matching docvars() by making them into NA values for
the corpus lacking that field. Corpus-level meta data is concatenated,
except for source and notes, which are stamped with information
pertaining to the creation of the new joined corpus.
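As a minimal sketch of this behaviour, the following combines two small
corpora whose document variables do not match. The object names, texts
and the docvars year and party are invented for illustration, and the
document names are kept distinct on the assumption that duplicated names
may not be accepted when combining.

```r
library(quanteda)

# two toy corpora with different docvars (all names here are hypothetical)
corp_a <- corpus(c(a1 = "First text.", a2 = "Second text."),
                 docvars = data.frame(year = c(2001, 2002)))
corp_b <- corpus(c(b1 = "Third text."),
                 docvars = data.frame(party = "Green",
                                      stringsAsFactors = FALSE))

# combine with +; each corpus lacks one of the two fields
combined <- corp_a + corp_b

# inspect the merged document variables
docvars(combined)
```

Fields missing from one corpus should appear as NA for its documents in
the combined result.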
The c() operator is also defined for corpus class objects, and provides
an easy way to combine multiple corpus objects.
There are some issues that need to be addressed in future revisions of
quanteda concerning the use of factors to store document variables and
meta-data. Currently most or all of these are not recorded as factors,
because the data.frame() calls that create and store the document-level
information use stringsAsFactors=FALSE, so that the texts are always
stored as character vectors and never as factors.
The + and c() operators return a corpus() object.
Indexing a corpus works in three ways, as of v2.x.x:
[ returns a subsetted corpus
[[ returns the textual contents of a subsetted corpus (similar to as.character())
$ returns a vector containing the single named docvars
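The three forms can be compared side by side on a toy corpus. This is a
sketch only: the corpus, its document names and the docvar lang are made
up for the example.

```r
library(quanteda)

# a small corpus with one document variable (hypothetical data)
corp <- corpus(c(d1 = "Alpha text.", d2 = "Beta text."),
               docvars = data.frame(lang = c("en", "de"),
                                    stringsAsFactors = FALSE))

sub <- corp["d2"]     # [  : a corpus containing only document d2
txt <- corp[["d2"]]   # [[ : the text of d2 as a character value
lng <- corp$lang      # $  : the values of the docvar "lang"
```

Use [ when the result needs to remain a corpus for further processing,
and [[ or $ when only the text or a single document variable is needed.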
summary.corpus()
# concatenate corpus objects
corp1 <- corpus(data_char_ukimmig2010[1:2])
corp2 <- corpus(data_char_ukimmig2010[3:4])
corp3 <- corpus(data_char_ukimmig2010[5:6])
summary(c(corp1, corp2, corp3))
# two ways to index corpus elements
data_corpus_inaugural["1793-Washington"]
data_corpus_inaugural[2]
# return the text itself
data_corpus_inaugural[["1793-Washington"]]