get_featured_themes: Find the most frequently occurring themes in a collection

View source: R/get_featured_themes.R

Description

[Maturing]

get_featured_themes() finds the top_m most frequently occurring themes in a collection.

Usage

get_featured_themes(
  collection = NULL,
  top_m = 10,
  weights = list(choice = 3, major = 2, minor = 1),
  explicit = TRUE,
  min_freq = 1,
  blacklist = NULL
)

Arguments

collection

A Collection() class object.

If NULL, the collection of all stories in the actively loaded LTO version is used.

top_m

Maximum number of themes to report. The default is top_m=10.

If Inf, all themes occurring at least min_freq times in the collection are reported.

weights

A list assigning nonnegative weights to the choice, major, and minor theme levels. The default weighting list(choice = 3, major = 2, minor = 1) counts each choice theme usage three times, each major theme usage twice, and each minor theme usage once. Use the uniform weighting list(choice = 1, major = 1, minor = 1) to weight theme usages equally regardless of level. At least one weight must be positive.

explicit

Set to FALSE to also include the ancestor themes of the explicit thematic annotations. The default is explicit=TRUE.

min_freq

Drop themes occurring fewer than this number of times from the analysis. With the default min_freq=1, no themes are discarded.

blacklist

A Themeset() class object containing themes to be dropped from the analysis.

If NULL, no themes are filtered.

Details

The input collection of n stories, S[1], ..., S[n], is represented as a weighted bag-of-words, where each choice theme in story S[j] (j = 1, ..., n) is counted weights$choice times, each major theme weights$major times, and each minor theme weights$minor times.
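
The weighted counting described above can be sketched in base R. This is an illustration only, not the package's implementation: the annotations data frame, its column names, and the toy stories and themes are hypothetical and do not come from the LTO.

```r
# Hypothetical toy annotations: one row per theme usage in a story.
annotations <- data.frame(
  story_id   = c("s1", "s1", "s1", "s2", "s2"),
  theme_name = c("time travel", "loneliness", "hubris", "time travel", "hubris"),
  level      = c("choice", "major", "minor", "major", "minor"),
  stringsAsFactors = FALSE
)

# Default weighting from the Usage section.
weights <- list(choice = 3, major = 2, minor = 1)

# Look up the weight of each usage from its level.
annotations$weight <- unname(unlist(weights[annotations$level]))

# Weighted count of each theme, summed over all stories in the toy collection.
k_bar <- sapply(split(annotations$weight, annotations$theme_name), sum)
k_bar <- sort(k_bar, decreasing = TRUE)
k_bar
```

Under these assumptions, "time travel" is counted 3 + 2 = 5 times (one choice and one major usage), while "hubris" and "loneliness" are each counted twice.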

Value

Returns a tibble with at most top_m rows (one per theme) and 6 columns:

theme_name: m-th most frequently occurring theme in the collection
k: Number of collection stories featuring the theme
k_bar: Weighted counts of the theme summed over the collection stories
n: Number of stories in the collection
n_bar: Sum of all weighted counts of collection themes
tp: Theme weighted term proportion (i.e. k_bar/n_bar)
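
To make the column definitions concrete, the following base R sketch computes the six columns for a toy collection. It is a minimal illustration under stated assumptions, not the function's actual code: the usages data frame and its values are hypothetical, a plain data frame stands in for the tibble, and n_bar is assumed to be summed over all theme usages before any min_freq or blacklist filtering, which this page does not specify.

```r
# Hypothetical toy data: one row per theme usage, with its level weight applied.
usages <- data.frame(
  story_id   = c("s1", "s1", "s1", "s2", "s2"),
  theme_name = c("time travel", "loneliness", "hubris", "time travel", "hubris"),
  weight     = c(3, 2, 1, 2, 1),
  stringsAsFactors = FALSE
)

n     <- length(unique(usages$story_id))  # number of stories in the collection
n_bar <- sum(usages$weight)               # sum of all weighted theme counts

# Per-theme statistics.
by_theme <- split(usages, usages$theme_name)
result <- data.frame(
  theme_name = names(by_theme),
  k     = sapply(by_theme, function(d) length(unique(d$story_id))),
  k_bar = sapply(by_theme, function(d) sum(d$weight)),
  n     = n,
  n_bar = n_bar,
  row.names = NULL,
  stringsAsFactors = FALSE
)

# Weighted term proportion, then order from most to least featured.
result$tp <- result$k_bar / result$n_bar
result <- result[order(-result$k_bar), ]
result
```

In this toy collection n = 2 and n_bar = 9, so "time travel" has tp = 5/9, and the tp values sum to 1 because no theme has been filtered out.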

Examples

## Not run: 
# Retrieve the top 10 most featured themes in "The Twilight Zone" franchise
# stories:
set_lto("demo")
result_tbl <- get_featured_themes()
result_tbl

# Retrieve the top 10 most featured themes in "The Twilight Zone" franchise
# stories not including any minor level themes:
set_lto("demo")
result_tbl <- get_featured_themes(weights = list(choice = 1, major = 1, minor = 0))
result_tbl

# Retrieve the top 10 most featured themes in "The Twilight Zone" (1959)
# television series episodes:
collection <- Collection$new(collection_id = "Collection: tvseries: The Twilight Zone (1959)")
result_tbl <- get_featured_themes(collection)
result_tbl

## End(Not run)

stoRy documentation built on July 9, 2023, 7:46 p.m.