frequency_table_creator: Create the word frequency table of corpora
In amacanovic/KeynessMeasures: Calculating keyness measures for corpus analysis

View source: R/frequency_table_creator.R

Description

This function takes a data frame of documents and returns a table with the frequency of each word in each of two corpora. The output lists every word that occurs in either corpus, together with its frequency in the target corpus and in the reference corpus. The target corpus is defined by a grouping variable (which records the corpus each document belongs to) and a target value of that variable: documents whose grouping variable matches the target value are assigned to the target corpus, and all remaining documents are assigned to the reference corpus.
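
The target/reference split described above can be illustrated with a minimal base R sketch. This is not the package's implementation: the toy data, the column names `text` and `source`, the helper `tokenize_naive`, and the output column names are all assumptions made for illustration, and the whitespace tokenizer is far cruder than what a real text-processing pipeline would use.

```r
# Toy documents; the column names `text` and `source` are hypothetical.
docs <- data.frame(
  text = c("the cat sat", "the dog ran", "a cat ran"),
  source = c("A", "B", "A"),
  stringsAsFactors = FALSE
)

# Documents whose grouping value matches the target value form the target
# corpus; all remaining documents form the reference corpus.
is_target <- docs$source == "A"

# Naive lowercase whitespace tokenizer (hypothetical helper, illustration only).
tokenize_naive <- function(texts) {
  unlist(strsplit(tolower(texts), "\\s+"))
}

tokens_target <- tokenize_naive(docs$text[is_target])
tokens_reference <- tokenize_naive(docs$text[!is_target])

# Every word that occurs in either corpus.
all_words <- sort(unique(c(tokens_target, tokens_reference)))

# Counting over a factor with fixed levels yields 0 for words that are
# absent from one of the corpora.
freq_table <- data.frame(
  word = all_words,
  target = as.integer(table(factor(tokens_target, levels = all_words))),
  reference = as.integer(table(factor(tokens_reference, levels = all_words))),
  stringsAsFactors = FALSE
)

print(freq_table)
```

In this sketch, "cat" appears twice in the target corpus and never in the reference corpus, while "dog" appears only in the reference corpus, so both rows contain a zero in one of the two frequency columns.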

Usage

frequency_table_creator(
  df,
  text_field = NULL,
  grouping_variable = NULL,
  grouping_variable_target = NULL,
  lemmatize = FALSE,
  remove_punct = FALSE,
  remove_symbols = FALSE,
  remove_numbers = FALSE,
  remove_url = FALSE
)

Arguments

`df`	a `data.frame`
`text_field`	a string; the name of the variable storing text
`grouping_variable`	a string; the name of the variable to be used in the creation of the target and reference corpora. Its values are used to group the documents into corpora and calculate the appropriate frequencies.
`grouping_variable_target`	a string; the value of the variable to use to create the target corpus. All the other values of this variable will be grouped into a reference corpus.
`lemmatize`	logical; if `TRUE`, the text will be lemmatized before frequency calculation.
`remove_punct`	logical; if `TRUE`, punctuation will be removed when calculating the word frequency table.
`remove_symbols`	logical; if `TRUE`, symbols will be removed when calculating the word frequency table.
`remove_numbers`	logical; if `TRUE`, numbers will be removed when calculating the word frequency table.
`remove_url`	logical; if `TRUE`, URLs will be removed when calculating the word frequency table.
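
A call using the arguments above might look like the following sketch. The data frame `reviews`, its column names, and its values are invented for illustration. The code also assumes the package is installed (for instance from the GitHub repository amacanovic/KeynessMeasures) and that the function is exported; the call is skipped if the package is not available. The structure of the returned table is not shown here because the documentation above does not specify its column names.

```r
# Hypothetical example data; column names and values are assumptions.
reviews <- data.frame(
  review_text = c(
    "Great phone, 10/10! See https://example.com for details.",
    "Terrible battery life & poor screen.",
    "Great camera and great battery."
  ),
  sentiment = c("positive", "negative", "positive"),
  stringsAsFactors = FALSE
)

# Run the call only if the package is installed.
if (requireNamespace("KeynessMeasures", quietly = TRUE)) {
  freq_table <- KeynessMeasures::frequency_table_creator(
    df = reviews,
    text_field = "review_text",            # column holding the text
    grouping_variable = "sentiment",       # column that assigns documents to corpora
    grouping_variable_target = "positive", # documents with this value form the target corpus
    lemmatize = FALSE,
    remove_punct = TRUE,
    remove_symbols = TRUE,
    remove_numbers = TRUE,
    remove_url = TRUE
  )
  print(head(freq_table))
}
```

Here the two "positive" reviews would form the target corpus and the single "negative" review the reference corpus.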