calc_assoc_metrics: Calculate Association Metrics for Bigrams


calc_assoc_metrics    R Documentation

Calculate Association Metrics for Bigrams

Description

Calculates three association metrics for the bigrams in a corpus: pointwise mutual information (PMI), Dice's coefficient, and G-score.

Usage

calc_assoc_metrics(
  data,
  doc_index,
  token_index,
  type,
  association = "all",
  verbose = FALSE
)

Arguments

data

A data frame containing the corpus.

doc_index

Column in 'data' that holds the document index.

token_index

Column in 'data' that holds the token index.

type

Column in 'data' that holds the tokens (terms).

association

A character vector specifying which metrics to calculate: any combination of 'pmi', 'dice_coeff', and 'g_score', or 'all' to calculate all three. Default is 'all'.

verbose

A logical value indicating whether to keep the intermediate probability columns. Default is FALSE.

Value

A data frame with one row per bigram and columns for each calculated metric.
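
To make the metrics more concrete, the following base R sketch computes PMI and Dice's coefficient by hand for a toy corpus. It uses common textbook definitions (PMI as the log ratio of the bigram probability to the product of the unigram probabilities; Dice as twice the bigram count over the sum of the two unigram counts). These definitions, the base-2 logarithm, the toy data, and the helper `pair_up` are assumptions made for illustration; the package's internal formulas and estimation choices may differ.

```r
# Toy corpus: one row per token. Column names mirror the arguments above;
# the data itself is made up for illustration.
corpus <- data.frame(
  doc_index   = c(1, 1, 1, 1, 2, 2, 2),
  token_index = c(1, 2, 3, 4, 1, 2, 3),
  type        = c("new", "york", "is", "big", "new", "york", "city"),
  stringsAsFactors = FALSE
)

# Make sure tokens are in reading order within each document.
corpus <- corpus[order(corpus$doc_index, corpus$token_index), ]

# Hypothetical helper: pair each token with its successor.
pair_up <- function(tokens) {
  data.frame(x = head(tokens, -1), y = tail(tokens, -1),
             stringsAsFactors = FALSE)
}

# Build bigrams per document so that pairs never cross document boundaries.
bigrams <- do.call(rbind, lapply(split(corpus$type, corpus$doc_index), pair_up))

# Unigram and bigram counts.
n_type   <- table(corpus$type)
n_bigram <- table(paste(bigrams$x, bigrams$y))

# One row per distinct bigram.
result <- unique(bigrams)
rownames(result) <- NULL
result$n_xy <- as.numeric(n_bigram[paste(result$x, result$y)])
result$n_x  <- as.numeric(n_type[result$x])
result$n_y  <- as.numeric(n_type[result$y])

# Relative-frequency probability estimates (an assumed estimation choice).
result$p_xy <- result$n_xy / nrow(bigrams)
result$p_x  <- result$n_x / nrow(corpus)
result$p_y  <- result$n_y / nrow(corpus)

# Textbook PMI (base 2 here) and Dice's coefficient.
result$pmi        <- log2(result$p_xy / (result$p_x * result$p_y))
result$dice_coeff <- 2 * result$n_xy / (result$n_x + result$n_y)

print(result[, c("x", "y", "pmi", "dice_coeff")])
```

In this toy corpus "new" and "york" only ever occur together, so that bigram gets the highest possible Dice coefficient.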

Examples

data_path <- system.file("extdata", "bigrams_data.rds", package = "qtkit")
data <- readRDS(data_path)

calc_assoc_metrics(data, doc_index, token_index, type)
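
A further sketch, using only the arguments documented above, requests a subset of metrics and keeps the intermediate probability columns. The toy data frame and its column names are assumptions chosen to match the call pattern in the example above; the bundled dataset may be laid out differently. The call is guarded so the snippet still runs when qtkit is not installed.

```r
# Made-up corpus with one row per token (illustrative only).
corpus <- data.frame(
  doc_index   = c(1, 1, 1, 1, 2, 2, 2),
  token_index = c(1, 2, 3, 4, 1, 2, 3),
  type        = c("new", "york", "is", "big", "new", "york", "city"),
  stringsAsFactors = FALSE
)

# Only call the package function if qtkit is available.
if (requireNamespace("qtkit", quietly = TRUE)) {
  metrics <- qtkit::calc_assoc_metrics(
    corpus,
    doc_index,
    token_index,
    type,
    association = c("pmi", "dice_coeff"),  # skip the G-score
    verbose = TRUE                         # keep intermediate probability columns
  )
  print(metrics)
}
```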


qtkit documentation built on Sept. 11, 2024, 5:14 p.m.