calculate_ngram_is: Calculate IS index for n-grams

View source: R/calculate_ngram_is.R


Calculate IS index for n-grams

Description

This function calculates the IS (Absorption Index) from Morrone (1996) for all n-grams in the corpus. Only n-grams that start AND end with lexical words are considered.

Usage

calculate_ngram_is(
  dfTag,
  max_ngram = 5,
  term = "lemma",
  pos = c("NOUN", "ADJ", "ADV", "VERB"),
  min_freq = 1,
  min_IS_norm = 0
)

Arguments

dfTag

A data frame with tagged text data containing columns: doc_id, sentence_id, token_id, lemma/token, upos

max_ngram

Maximum length of n-grams to generate (default: 5)

term

Character string indicating which column to use: "lemma" or "token" (default: "lemma")

pos

Character vector of POS tags considered lexical (default: c("NOUN", "ADJ", "ADV", "VERB"))

min_freq

Minimum frequency threshold for n-grams (default: 1)

min_IS_norm

Minimum normalized IS threshold for n-grams (default: 0)

Details

The IS index is calculated as IS = (sum_i 1/freq_i) × freq_ngram × n_lexical, where freq_i is the frequency of the i-th word of the n-gram, freq_ngram is the frequency of the n-gram itself, and n_lexical is the number of lexical words it contains. The normalized version is IS_norm = IS / L^2, where L is the n-gram length.
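
For illustration, the following R sketch (using made-up frequencies, not values from any real corpus) reproduces the computation for a single 3-gram:

# Hypothetical 3-gram "climate change policy", all three words lexical
word_freq  <- c(climate = 120, change = 300, policy = 90)  # frequency of each word
ngram_freq <- 15   # frequency of the n-gram itself
n_lexical  <- 3    # number of lexical words in the n-gram
L          <- 3    # n-gram length

IS      <- sum(1 / word_freq) * ngram_freq * n_lexical
IS_norm <- IS / L^2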

As an optimization, only n-grams that start and end with a lexical word (as defined by the pos argument) are generated, which significantly reduces computation time.
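
A minimal sketch of this boundary filter on a hypothetical tagged sentence (not the package's internal implementation) is:

# Hypothetical tagged sentence
lemma <- c("the", "new", "climate", "policy", "be", "ambitious")
upos  <- c("DET", "ADJ", "NOUN", "NOUN", "AUX", "ADJ")
pos   <- c("NOUN", "ADJ", "ADV", "VERB")
max_ngram <- 3

ngrams <- character(0)
for (n in 2:max_ngram) {
  for (i in seq_len(length(lemma) - n + 1)) {
    idx <- i:(i + n - 1)
    # keep the candidate only if its first and last word are lexical
    if (upos[idx[1]] %in% pos && upos[idx[n]] %in% pos) {
      ngrams <- c(ngrams, paste(lemma[idx], collapse = " "))
    }
  }
}
ngrams
# "new climate" "climate policy" "new climate policy" "policy be ambitious"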

Value

A tibble with one row per retained n-gram and columns: ngram (the n-gram string), n_length (n-gram length in words), ngram_freq (frequency of the n-gram), n_lexical (number of lexical words), IS and IS_norm

Examples

## Not run: 
IS <- calculate_ngram_is(dfTag, max_ngram = 4, term = "lemma", min_freq = 2)
head(IS)

## End(Not run)
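
The dfTag input is a POS-tagged data frame; a minimal sketch of how one might be built with the udpipe package (any tagger providing doc_id, sentence_id, token_id, lemma, token and upos columns will do) is shown below.

## Not run: 
library(udpipe)
model  <- udpipe_download_model(language = "english")
ud_eng <- udpipe_load_model(model$file_model)
txt    <- c(doc1 = "Climate change policy requires urgent global action.")
dfTag  <- as.data.frame(udpipe_annotate(ud_eng, x = txt, doc_id = names(txt)))

IS <- calculate_ngram_is(dfTag, max_ngram = 3, term = "lemma")
head(IS)

## End(Not run)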
