frequency_table_creator_old: Create the word frequency table of corpora

View source: R/frequency_table_creator_old.R

frequency_table_creator_old    R Documentation

Create the word frequency table of corpora

Description

This function takes a data frame of documents and returns a table of word frequencies in two corpora. The output lists every word that occurs in either corpus, together with its frequency in the target corpus and in the reference corpus. The target corpus is defined by a grouping variable (which indicates the corpus each document belongs to) and a target value of that variable: documents whose grouping variable matches the target value form the target corpus, and all remaining documents form the reference corpus.
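The target/reference split described above can be illustrated with a minimal base R sketch. This is not the package's implementation (which relies on quanteda and, optionally, textstem); the toy data, the column names `text` and `group`, the helper `tokenize`, and the output column names are all assumptions made for illustration.

```r
# Hypothetical toy data; the column names "text" and "group" are assumptions.
docs <- data.frame(
  text  = c("the cat sat", "the dog ran", "a cat ran"),
  group = c("a", "b", "a"),
  stringsAsFactors = FALSE
)

# Naive tokenizer for illustration only: lowercase, then split on whitespace.
tokenize <- function(x) unlist(strsplit(tolower(x), "\\s+"))

# Documents matching the target value form the target corpus;
# all remaining documents form the reference corpus.
is_target        <- docs$group == "a"
target_tokens    <- tokenize(docs$text[is_target])
reference_tokens <- tokenize(docs$text[!is_target])

# Vocabulary: every word that occurs in either corpus.
vocab <- sort(unique(c(target_tokens, reference_tokens)))

# Count each word per corpus; factor levels keep zero counts for absent words.
freq <- data.frame(
  word      = vocab,
  target    = as.integer(table(factor(target_tokens, levels = vocab))),
  reference = as.integer(table(factor(reference_tokens, levels = vocab))),
  stringsAsFactors = FALSE
)

print(freq)
```

Using `factor(..., levels = vocab)` before `table()` ensures that a word absent from one corpus still appears in that corpus's column with a count of zero, so both columns align with the full word list.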

Usage

frequency_table_creator_old(
  df,
  text_field = NULL,
  grouping_variable = NULL,
  grouping_variable_target = NULL,
  lemmatize = FALSE,
  remove_punct = FALSE,
  remove_symbols = FALSE,
  remove_numbers = FALSE,
  remove_url = FALSE
)

Arguments

df

a data.frame

text_field

a string; the name of the variable storing text

grouping_variable

a string; the name of the variable used to create the target and reference corpora. Its values are used to group the documents into corpora and to calculate the corresponding frequencies.

grouping_variable_target

a string; the value of the grouping variable that defines the target corpus. Documents with any other value of this variable are grouped into the reference corpus.

lemmatize

logical; if TRUE, the text will be lemmatized before frequency calculation.

remove_punct

logical; if TRUE, punctuation will be removed when calculating the word frequency table.

remove_symbols

logical; if TRUE, symbols will be removed when calculating the word frequency table.

remove_numbers

logical; if TRUE, numbers will be removed when calculating the word frequency table.

remove_url

logical; if TRUE, URLs will be removed when calculating the word frequency table.

Details

This code is compatible with versions of the quanteda package below 3.

It relies on the textstem package for lemmatization and the quanteda package for frequency calculation.

Value

A data frame with word frequencies in the target and reference corpora.
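A call might look like the following sketch, which uses only the arguments listed under Usage. The data frame `docs` and its column names `text` and `party` are hypothetical, and the call is guarded so that it runs only when the KeynessMeasures package (with a quanteda version below 3) is installed.

```r
# Hypothetical example data; the column names "text" and "party" are assumptions.
docs <- data.frame(
  text  = c("Taxes must fall.", "Taxes fund schools.",
            "Schools need funding!", "Cut taxes now."),
  party = c("A", "B", "B", "A"),
  stringsAsFactors = FALSE
)

# Run the call only if the package is available.
if (requireNamespace("KeynessMeasures", quietly = TRUE)) {
  freq <- KeynessMeasures::frequency_table_creator_old(
    df = docs,
    text_field = "text",
    grouping_variable = "party",
    grouping_variable_target = "A",  # documents with party == "A" form the target corpus
    remove_punct = TRUE
  )
  print(head(freq))
}
```

Here the documents with `party` equal to `"A"` make up the target corpus, and the remaining documents make up the reference corpus.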


amacanovic/KeynessMeasures documentation built on July 6, 2022, 1:38 a.m.