dfm_analysis: Create a document frequency matrix (sorted in descending...


View source: R/dfm_analysis.R

Description

Create a document frequency matrix (dfm) of the terms that occur within a window of words around a given set of target terms (the phenomenon), sorted in descending order of frequency.
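The sketch below illustrates the windowing idea for a single text in base R. It is not rbow's implementation: the helper name window_counts(), the simple lowercase tokenizer, and the choice to exclude the phenomenon term from its own window are assumptions made for illustration. When windows overlap, a shared context word is counted once per phenomenon occurrence.

```r
# Hypothetical helper (not part of rbow): count the words that occur within
# `window` words to the left and right of any phenomenon term in one text,
# then keep the `n_terms` most frequent ones in descending order.
window_counts <- function(text, phenomenon, window = 10, n_terms = 10) {
  # Assumed tokenizer: lowercase, then split on anything that is not a
  # letter or an apostrophe.
  tokens <- strsplit(tolower(text), "[^a-z']+")[[1]]
  tokens <- tokens[tokens != ""]
  hits <- which(tokens %in% tolower(phenomenon))

  context <- character(0)
  for (i in hits) {
    lo <- max(1, i - window)
    hi <- min(length(tokens), i + window)
    # Assumption: the phenomenon term itself is not counted.
    context <- c(context, tokens[setdiff(lo:hi, i)])
  }

  # Convert the frequency table to a named integer vector and sort it.
  counts <- table(context)
  counts <- setNames(as.integer(counts), names(counts))
  head(sort(counts, decreasing = TRUE), n_terms)
}
```

For example, window_counts("good movie and good acting, a good movie overall", "movie", window = 1) gives a count of 2 for "good" and 1 each for "and" and "overall".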

Usage

dfm_analysis(
  corpus,
  phenomenon,
  window = 10,
  n_terms = 10,
  filter_dictionary = NULL,
  tf_idf = FALSE,
  filter_ps = FALSE,
  ps = NULL,
  own_regex = FALSE
)

Arguments

corpus

The text or texts to be analyzed, supplied as a list of character vectors.

phenomenon

A list of character vectors (or a list of regular expressions if own_regex = TRUE) containing the terms around which words are counted for the dfm.

window

Number of words to the left and to the right of each phenomenon term that are considered for the dfm. Defaults to 10.

n_terms

Number of terms displayed in each dfm, i.e. the n_terms most frequent terms. Defaults to 10.

filter_dictionary

A character vector (or a regular expression if own_regex = TRUE) of words to select from the dfm. Defaults to NULL.
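As a rough illustration of dictionary filtering, the sketch below keeps only the dictionary terms of a named count vector. The helper filter_terms() is hypothetical, and treating NULL as "no filtering" is an assumption based on the default value, not documented rbow behavior.

```r
# Hypothetical helper (not part of rbow): keep only the terms of a named
# count vector that appear in a dictionary. Treating NULL as "no filtering"
# mirrors the default of filter_dictionary but is an assumption.
filter_terms <- function(counts, filter_dictionary = NULL) {
  if (is.null(filter_dictionary)) {
    return(counts)
  }
  counts[names(counts) %in% filter_dictionary]
}

counts <- c(good = 3L, movie = 2L, plot = 1L, bad = 1L)
filter_terms(counts, c("good", "bad"))
```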

tf_idf

If TRUE, the function computes tf-idf weights instead of raw counts. Defaults to FALSE.
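For orientation, the sketch below applies one common tf-idf variant to a documents-by-terms count matrix: raw counts as term frequency and idf = log10(N / df), where N is the number of documents and df the number of documents containing the term. This weighting scheme and the helper tf_idf_weight() are assumptions; rbow's exact formula is not documented here.

```r
# Hypothetical sketch (not rbow's code): tf-idf weighting of a
# documents-by-terms count matrix, using raw counts as tf and
# idf = log10(N / df).
tf_idf_weight <- function(m) {
  n_docs <- nrow(m)
  df <- colSums(m > 0)           # number of documents containing each term
  idf <- log10(n_docs / df)
  sweep(m, 2, idf, `*`)          # multiply each term column by its idf
}

m <- matrix(c(2, 0, 1,
              1, 3, 1),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("text1", "text2"), c("good", "bad", "movie")))
tf_idf_weight(m)
```

Terms that occur in every document ("good" and "movie" here) get an idf of 0, so only "bad" keeps a nonzero weight.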

filter_ps

If TRUE, enables filtering of the results by part of speech (for example, keeping only adjectives and adverbs). Defaults to FALSE.

ps

A character vector of parts of speech to filter by when filter_ps = TRUE. The available values can be listed with unique(tidytext::parts_of_speech[,"pos"]). Defaults to NULL.
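The sketch below shows the general idea of part-of-speech filtering with a lookup table. pos_lookup is a tiny hypothetical stand-in with the same two columns (word, pos) as tidytext::parts_of_speech, its tag strings are illustrative, and filter_pos() is not an rbow function.

```r
# Hypothetical stand-in for tidytext::parts_of_speech (same column names,
# illustrative tags). In the real data a word can carry several tags.
pos_lookup <- data.frame(
  word = c("good", "quickly", "movie", "watch"),
  pos  = c("Adjective", "Adverb", "Noun", "Verb"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the terms whose part of speech is in `ps`.
filter_pos <- function(counts, ps, lookup = pos_lookup) {
  keep <- unique(lookup$word[lookup$pos %in% ps])
  counts[names(counts) %in% keep]
}

counts <- c(movie = 4L, good = 3L, watch = 2L, quickly = 1L)
filter_pos(counts, ps = c("Adjective", "Adverb"))
```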

own_regex

When TRUE, phenomenon and filter_dictionary are treated as custom regular expressions. When FALSE, rbow constructs the regular expressions from the supplied character vectors. Defaults to FALSE.
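The sketch below shows one plausible way to turn a character vector into a single regular expression: escape regex metacharacters, join the terms with "|", and wrap the alternation in word boundaries. rbow's actual construction is not documented here, so build_regex() and this pattern shape are assumptions; with own_regex = TRUE you would supply a pattern of this kind yourself.

```r
# Hypothetical helper (not part of rbow): build one regular expression from a
# character vector by escaping metacharacters, joining the terms with "|",
# and wrapping the alternation in word boundaries.
build_regex <- function(terms) {
  escaped <- gsub("([.|()\\^{}+$*?]|\\[|\\])", "\\\\\\1", terms)
  paste0("\\b(", paste(escaped, collapse = "|"), ")\\b")
}

rx <- build_regex(c("good", "great"))
grepl(rx, c("a great movie", "goodness gracious"), perl = TRUE)
```

The word boundaries keep "good" from matching inside "goodness".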

Value

A list of dfms, one dfm per text in corpus.


till-tietz/rbow documentation built on Oct. 21, 2021, 9:16 p.m.