calculate_bigram_probabilities: Calculate Probabilities for Bigrams

View source: R/calc_assoc_metrics.R


Calculate Probabilities for Bigrams

Description

A helper function that uses dplyr to build bigrams (pairs of adjacent tokens within each document) from the input data. For each bigram it computes the joint probability, along with the marginal probability of each of its two tokens.
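The bigram-building step can be sketched in base R. This is an illustration under stated assumptions, not the qtkit implementation (which uses dplyr). The helper `make_bigrams()` and the column names `doc_id`, `token_id` and `token` are hypothetical. It assumes that column names are passed as strings and that bigrams never cross a document boundary.

```r
# Hypothetical helper (not part of qtkit): pair each token with the token
# that immediately follows it inside the same document.
make_bigrams <- function(data, doc_index, token_index, type) {
  # Sort rows by document, then by token position within the document
  data <- data[order(data[[doc_index]], data[[token_index]]), , drop = FALSE]
  docs <- data[[doc_index]]
  tokens <- as.character(data[[type]])
  n <- length(tokens)
  if (n < 2) {
    return(data.frame(x = character(0), y = character(0)))
  }
  # A row starts a bigram only if the next row is in the same document
  same_doc <- docs[-n] == docs[-1]
  data.frame(
    x = tokens[-n][same_doc],
    y = tokens[-1][same_doc],
    stringsAsFactors = FALSE
  )
}

# Toy corpus: two documents, one token per row
corpus <- data.frame(
  doc_id   = c(1, 1, 1, 2, 2),
  token_id = c(1, 2, 3, 1, 2),
  token    = c("the", "cat", "sat", "the", "dog"),
  stringsAsFactors = FALSE
)
bigrams <- make_bigrams(corpus, "doc_id", "token_id", "token")
print(bigrams)
```

The pair "sat"/"the" is not produced because those two tokens belong to different documents. The result therefore contains three bigrams: the/cat, cat/sat and the/dog.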

Usage

calculate_bigram_probabilities(data, doc_index, token_index, type)

Arguments

data

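A data frame containing the tokenized corpus, with one token per row.

doc_index

Name of the column holding the document index.

token_index

Name of the column holding each token's position within its document.

type

Name of the column holding the tokens (terms) themselves.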

Value

A data frame with the following columns:

  • x: First token of the bigram

  • y: Second token of the bigram

  • p_xy: Joint probability of the bigram

  • p_x: Marginal probability of the first token

  • p_y: Marginal probability of the second token
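The following is a minimal base R sketch showing how the columns x, y, p_xy, p_x and p_y could be derived from a table of bigrams. It is not the qtkit code. The function name `bigram_probabilities()` is hypothetical. The sketch assumes that every probability is a relative frequency over bigrams:

- p_xy is the share of bigrams equal to (x, y).
- p_x is the share of bigrams with x in first position.
- p_y is the share of bigrams with y in second position.

qtkit may define the marginals differently, for example over all tokens.

```r
# Hypothetical sketch (not the qtkit implementation).
# Assumption: all probabilities are relative frequencies over bigrams.
bigram_probabilities <- function(bigrams) {
  n_total <- nrow(bigrams)
  # Count each (x, y) combination and keep only those that occur
  counts <- as.data.frame(table(x = bigrams$x, y = bigrams$y),
                          stringsAsFactors = FALSE)
  counts <- counts[counts$Freq > 0, ]
  # Joint probability of each observed bigram
  counts$p_xy <- counts$Freq / n_total
  # Marginals: how often a token appears in first / second position
  counts$p_x <- as.numeric(table(bigrams$x)[counts$x]) / n_total
  counts$p_y <- as.numeric(table(bigrams$y)[counts$y]) / n_total
  result <- counts[, c("x", "y", "p_xy", "p_x", "p_y")]
  rownames(result) <- NULL
  result
}

# Toy bigram table: "the cat" occurs twice
bigrams <- data.frame(
  x = c("the", "cat", "the", "the"),
  y = c("cat", "sat", "dog", "cat"),
  stringsAsFactors = FALSE
)
probs <- bigram_probabilities(bigrams)
print(probs)
```

For the toy table, the/cat has p_xy = 0.5, p_x = 0.75 ("the" starts three of the four bigrams) and p_y = 0.5. Under this definition the p_xy values sum to 1.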


qtkit documentation built on April 4, 2025, 4:47 a.m.