corpus_summarize: Summarize the sento_corpus object

View source: R/sentocorpus.R

Description

Summarizes the sento_corpus object and returns insights about the evolution of documents, features and tokens over time.

Usage

corpus_summarize(x, by = "day", features = NULL)

Arguments

x

a sento_corpus object created with sento_corpus.

by

a single character string specifying the time interval (frequency) over which the statistics are calculated, for example "day" (the default) or "month".

features

a character vector that can be used to select a subset of the features to analyse.

Details

This function summarizes the sento_corpus object by generating statistics about documents, features and tokens over time. The statistics can be restricted to a chosen subset of the metadata features. Texts are tokenized in the same way as in the sentiment calculation of compute_sentiment.
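To make the kind of per-period statistics described above concrete, here is a minimal base R sketch. It is not the package's implementation: the toy `docs` data frame, the helper `summarize_period`, the output column names and the whitespace tokenizer are all assumptions made for illustration (the package uses its own tokenization, shared with compute_sentiment).

```r
# Toy corpus: one row per document (hypothetical data, not taken from usnews).
docs <- data.frame(
  date  = as.Date(c("2021-01-01", "2021-01-01", "2021-01-02", "2021-02-03")),
  texts = c("stocks rise again", "rates fall",
            "markets stay calm today", "oil up"),
  wsj   = c(1, 0, 1, 0),  # non-zero if the document belongs to the feature
  wapo  = c(0, 1, 0, 1),
  stringsAsFactors = FALSE
)

# Naive whitespace tokenization, used here only as a stand-in.
docs$nTokens <- lengths(strsplit(docs$texts, "\\s+"))

# Grouping key mimicking a monthly interval: the first day of each month.
docs$period <- as.Date(format(docs$date, "%Y-%m-01"))

# Hypothetical helper: statistics for the documents of one period.
summarize_period <- function(d) {
  data.frame(
    date        = d$period[1],
    documents   = nrow(d),
    totalTokens = sum(d$nTokens),
    meanTokens  = mean(d$nTokens),
    minTokens   = min(d$nTokens),
    maxTokens   = max(d$nTokens),
    wsj         = sum(d$wsj > 0),   # number of texts per feature
    wapo        = sum(d$wapo > 0)
  )
}

# Split by period, summarize each group, and bind the rows together.
stats <- do.call(rbind, lapply(split(docs, docs$period), summarize_period))
rownames(stats) <- NULL
print(stats)
```

Restricting the summary to a subset of features would, in this sketch, simply mean keeping only the corresponding feature columns in the helper.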

Value

returns a list containing:

stats

a data.table with, for each date, the number of documents, the total, average, minimum and maximum number of tokens, and the number of texts per feature.

plots

a list with three plots representing the above statistics.

Author(s)

Jeroen Van Pelt, Samuel Borms, Andres Algaba

Examples

data("usnews", package = "sentometrics")

corpus <- sento_corpus(usnews)

# summary of corpus by day
summary1 <- corpus_summarize(corpus)

# summary of corpus by month for both journals
summary2 <- corpus_summarize(corpus, by = "month",
                             features = c("wsj", "wapo"))
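The returned list can then be inspected through its two elements, stats and plots, as named in the Value section. The following is a hedged sketch: it assumes the sentometrics package is installed (the code is skipped otherwise), and it makes no assumption about the column names of stats or the order of the plots.

```r
# Runs only if sentometrics is available; otherwise nothing happens.
if (requireNamespace("sentometrics", quietly = TRUE)) {
  library(sentometrics)

  data("usnews", package = "sentometrics")
  corpus <- sento_corpus(usnews)

  summary2 <- corpus_summarize(corpus, by = "month",
                               features = c("wsj", "wapo"))

  # The two documented elements of the returned list.
  print(names(summary2))

  # First rows of the statistics table (one row per period).
  print(head(summary2$stats))

  # The documented list of plots; elements are accessed by position.
  print(length(summary2$plots))
  firstPlot <- summary2$plots[[1]]
}
```

Accessing the plots by position avoids relying on element names that this page does not document.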

sentometrics documentation built on Aug. 18, 2021, 9:06 a.m.