computeFrequencies: Normalized Frequencies.
In keblu/GWP: Generalized Word Power Approach


View source: R/computeFrequencies.R

Generate the Normalized Frequency table from textual data input.

computeFrequencies(corpus, sentimentWord, shifterWord, clusterSize = 1)

`corpus`	Corpus with columns `docID`, `regID`, and `texts`. `docID` is a unique ID for each document. `regID` is used to aggregate documents for the regression; it must match the `regID` of the response variable in `fitGWP`. `texts` contains the textual data.
`sentimentWord`	Vector of words used for computing the sentiment.
`shifterWord`	Matrix with elements `x` and `y`. `x` is a vector of valence-shifting words and `y` holds the corresponding modifier values.
`clusterSize`	Scalar giving the size of the window within which valence-shifting words influence a sentiment word.
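To make the argument shapes concrete, here is a minimal base R sketch with toy inputs and a window lookup. All values and the helper `shiftersInWindow` are hypothetical and not part of GWP. The sketch assumes lowercase whitespace tokenization and a symmetric window of `clusterSize` tokens on either side of the sentiment word; the package may tokenize differently or look in one direction only.

```r
# Toy inputs mirroring the documented arguments (hypothetical values)
toyCorpus <- data.frame(
  docID = c("d1", "d2", "d3"),
  regID = c("r1", "r1", "r2"),
  texts = c("profits were not good this quarter",
            "a very good year with strong growth",
            "losses were bad"),
  stringsAsFactors = FALSE
)
toySentiment <- c("good", "bad", "strong")
toyShifter <- data.frame(x = c("not", "very"), y = c(-1, 1.8),
                         stringsAsFactors = FALSE)

# Hypothetical helper: return the shifters found within `clusterSize`
# tokens of position `loc`.
# ASSUMPTION: the window is symmetric around the sentiment word.
shiftersInWindow <- function(tokens, loc, shifter, clusterSize = 1) {
  window <- setdiff(seq(max(1, loc - clusterSize),
                        min(length(tokens), loc + clusterSize)), loc)
  shifter[shifter$x %in% tokens[window], , drop = FALSE]
}

# ASSUMPTION: lowercase + whitespace tokenization
tokens <- strsplit(tolower(toyCorpus$texts[1]), "\\s+")[[1]]
loc <- which(tokens == "good")
inWindow <- shiftersInWindow(tokens, loc, toyShifter, clusterSize = 1)
print(inWindow)
```

With `clusterSize = 1`, only the tokens immediately next to "good" are inspected; a larger value widens the window.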

A list with the following elements:

docID: unique document ID.
regID: regression ID.
loc: location of the sentiment word within the texts.
word: sentiment word.
shift: shift coefficient modifying the sentiment word.
NormalizedFrequency: frequency of the word normalized by the number of tokens in each text.
NormalizedFrequencyPerRegID: NormalizedFrequency further normalized by the number of documents for each regression ID.
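The two normalizations described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: the toy corpus is hypothetical, tokenization is lowercase whitespace splitting, each row is one occurrence of a sentiment word contributing `1 / number of tokens`, and valence shifters are ignored.

```r
# Toy corpus (hypothetical values)
toyCorpus <- data.frame(
  docID = c("d1", "d2", "d3"),
  regID = c("r1", "r1", "r2"),
  texts = c("good results and good outlook",
            "bad quarter",
            "good news despite bad weather today"),
  stringsAsFactors = FALSE
)
toySentiment <- c("good", "bad")

# ASSUMPTION: lowercase + whitespace tokenization
tokenList <- strsplit(tolower(toyCorpus$texts), "\\s+")

# One row per sentiment-word occurrence, normalized by the token count
rows <- lapply(seq_along(tokenList), function(i) {
  tokens <- tokenList[[i]]
  loc <- which(tokens %in% toySentiment)
  data.frame(
    docID = rep(toyCorpus$docID[i], length(loc)),
    regID = rep(toyCorpus$regID[i], length(loc)),
    loc = loc,
    word = tokens[loc],
    NormalizedFrequency = rep(1 / length(tokens), length(loc)),
    stringsAsFactors = FALSE
  )
})
freq <- do.call(rbind, rows)

# Divide by the number of documents sharing the same regression ID
docsPerReg <- table(toyCorpus$regID)
freq$NormalizedFrequencyPerRegID <-
  freq$NormalizedFrequency / as.numeric(docsPerReg[freq$regID])

print(freq)
```

Here `r1` groups two documents, so its rows are halved in the second step, while `r2` has a single document and its values are unchanged.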

# Load example data
data("corpus", package = "GWP")

# Setup the lexicons
sentimentWord <- sentometrics::list_lexicons$LM_en$x
shifterWord <- sentometrics::list_valence_shifters$en[, c("x", "y")]

# Generate the frequency data
frequencies <- computeFrequencies(corpus, sentimentWord, shifterWord, clusterSize = 5)