Formalizes a collection of texts into a sento_corpus object derived from the quanteda corpus object. The quanteda package provides a robust text mining infrastructure (see their website), including a handy corpus manipulation toolset. This function performs a set of checks on the input data and prepares the corpus for further analysis by structurally integrating a date dimension and numeric metadata features.
sento_corpus(corpusdf, do.clean = FALSE)
corpusdf |
a |
do.clean |
a |
A sento_corpus object is a specialized instance of a quanteda corpus. Any quanteda function applicable to its corpus object can also be applied to a sento_corpus object. However, changing a given sento_corpus object too drastically using some of quanteda's functions might alter the very structure the corpus is meant to have (as defined in the corpusdf argument) to be usable as an input in other functions of the sentometrics package. Some functions, including corpus_sample or corpus_subset, do not change the actual corpus structure and may come in handy.
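As a rough illustration of the structure referred to above, the sketch below builds a tiny corpus from a hand-made data.frame and then subsets it with quanteda. The column names "id", "date" and "texts" follow the layout of the usnews data used in the examples; the toy texts and the toy object names are made up for illustration only.

```r
library(sentometrics)

# hypothetical toy input: three documents with one numeric feature in [0, 1]
toy <- data.frame(
  id = c("a", "b", "c"),
  date = c("2020-03-01", "2020-01-15", "2020-02-10"),
  texts = c("Markets rallied on good news.",
            "The local team lost again.",
            "Inflation worries weigh on stocks."),
  economy = c(1, 0, 1),
  stringsAsFactors = FALSE
)

toyCorpus <- sento_corpus(corpusdf = toy)

# subsetting on a feature keeps the id, date and feature layout intact
toyEconomy <- quanteda::corpus_subset(toyCorpus, economy == 1)
quanteda::docvars(toyEconomy)
```

Subsetting on a docvar in this way only drops documents; it does not remove the date or feature columns that other sentometrics functions rely on.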
To add additional features, use add_features. Binary features are useful as a mechanism to select the texts that have to be integrated in the respective feature-based sentiment measure(s), but this selection applies only when do.ignoreZeros = TRUE. Because of this (implicit) selection, having complementary features (e.g., "economy" and "noneconomy") makes sense.
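A minimal sketch of adding a pair of complementary binary features is given below. The keyword "war" and the feature names "war" and "nowar" are arbitrary choices for illustration, and only the first 500 documents of usnews are used to keep the example quick.

```r
library(sentometrics)
data("usnews", package = "sentometrics")

corp <- sento_corpus(corpusdf = usnews[1:500, ])

# binary feature: 1 if the keyword occurs in the text, 0 otherwise
corp <- add_features(corp, keywords = list(war = "war"), do.binary = TRUE)

# complementary feature, built from the one just added
war <- quanteda::docvars(corp, "war")
corp <- add_features(corp, featuresdf = data.frame(nowar = 1 - war))

head(quanteda::docvars(corp)[, c("war", "nowar")])
```

Each document is then selected by exactly one of the two features, so sentiment measures built on "war" and "nowar" together cover the full corpus.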
It is also possible to add one non-numerical feature, that is, "language", to designate the language of the corpus texts. When this feature is provided, a list of lexicons for different languages is expected in the compute_sentiment function.
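The sketch below shows one way such a list of lexicons might be set up. It assumes that the list names have to match the values of the "language" feature, and that the built-in list_lexicons data contains the entries "LM_en" and "LM_nl_tr"; the labelling of documents as "nl" is artificial, since all usnews texts are in English.

```r
library(sentometrics)
data("usnews", package = "sentometrics")
data("list_lexicons", package = "sentometrics")

# small corpus with an artificial language split
news <- usnews[1:400, ]
news[["language"]] <- "en"
news[["language"]][201:400] <- "nl"
corpusLang <- sento_corpus(corpusdf = news)

# one set of lexicons per language, named after the language values
lexEn <- sento_lexicons(list(LM_en = list_lexicons[["LM_en"]]))
lexNl <- sento_lexicons(list(LM_nl = list_lexicons[["LM_nl_tr"]]))
lexicons <- list(en = lexEn, nl = lexNl)

sent <- compute_sentiment(corpusLang, lexicons, how = "proportional")
head(sent)
```

Distinct lexicon names are used per language here so that the resulting sentiment columns can be told apart.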
A sento_corpus object, derived from a quanteda corpus object. The corpus is ordered by date.
Samuel Borms
data("usnews", package = "sentometrics")
# corpus construction
corp <- sento_corpus(corpusdf = usnews)
# take a random subset making use of quanteda
corpusSmall <- quanteda::corpus_sample(corp, size = 500)
# deleting a feature
quanteda::docvars(corp, field = "wapo") <- NULL
# deleting all features results in the addition of a dummy feature
quanteda::docvars(corp, field = c("economy", "noneconomy", "wsj")) <- NULL
## Not run:
# to add or replace features, use the add_features() function...
quanteda::docvars(corp, field = c("wsj", "new")) <- 1
## End(Not run)
# corpus creation when no features are present
corpusDummy <- sento_corpus(corpusdf = usnews[, 1:3])
# corpus creation with a qualitative language feature
usnews[["language"]] <- "en"
usnews[["language"]][c(200:400)] <- "nl"
corpusLang <- sento_corpus(corpusdf = usnews)