collocation_frequency: Mapping Collocation Frequency to Source Document

In highlightr: Highlight Conserved Edits Across Versions of a Document

View source: R/collocation_frequency.R

collocation_frequency

R Documentation

Mapping Collocation Frequency to Source Document

Description

This function computes how frequently each collocation in the provided source document appears in the derivative documents (comments or notes), and maps those frequencies back onto the source text.

Usage

collocation_frequency(
  tbl,
  source_row,
  text_column,
  collocate_length = 5,
  fuzzy = FALSE,
  n_bands = 50,
  threshold = 0.7,
  n_gram_width = 4,
  band_width = 8
)

Arguments

`tbl`	data frame containing documents, where each row represents a document
`source_row`	row of `tbl` containing the text to be treated as the source
`text_column`	string giving the name of the column that contains the document text
`collocate_length`	length of the collocation, in words. Default is 5
`fuzzy`	whether to use fuzzy matching in collocation calculations. Default is FALSE
`n_bands`	number of bands used in the MinHash algorithm; passed to `zoomerjoin::jaccard_right_join()`. Default is 50
`threshold`	Jaccard similarity threshold above which two collocations are considered a match; passed to `zoomerjoin::jaccard_right_join()`. Default is 0.7
`n_gram_width`	width of the n-grams used in the Jaccard similarity calculation; passed to `zoomerjoin::jaccard_right_join()`. Default is 4
`band_width`	width of each band used in the MinHash algorithm; passed to `zoomerjoin::jaccard_right_join()`. Default is 8
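
To make `threshold` and `n_gram_width` more concrete, the following is a minimal base R sketch of an exact Jaccard similarity between two strings. It assumes the n-grams are overlapping character n-grams, which this page does not state. The helper names `char_ngrams` and `jaccard_similarity` are hypothetical and are not part of highlightr or zoomerjoin. The fuzzy join itself relies on MinHash (controlled by `n_bands` and `band_width`) rather than an exact pairwise calculation like this one.

```r
# Hypothetical helper: split a string into its set of overlapping
# character n-grams. Strings shorter than n are kept whole.
char_ngrams <- function(x, n = 4) {
  if (nchar(x) < n) return(x)
  starts <- seq_len(nchar(x) - n + 1)
  unique(substring(x, starts, starts + n - 1))
}

# Hypothetical helper: exact Jaccard similarity, i.e. the number of
# shared n-grams divided by the number of distinct n-grams overall.
jaccard_similarity <- function(a, b, n = 4) {
  grams_a <- char_ngrams(a, n)
  grams_b <- char_ngrams(b, n)
  length(intersect(grams_a, grams_b)) / length(union(grams_a, grams_b))
}

# Two near-identical phrases; compare the result with the 0.7 default.
similarity <- jaccard_similarity("the blue bird flies", "the blue bird flew")
similarity >= 0.7
```

Under this reading, a pair of collocations counts as a fuzzy match only when its similarity reaches `threshold`, so raising `threshold` makes matching stricter.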

Details

Collocations are sequences of consecutive words in the source document. For example, the phrase "the blue bird flies" contains one collocation of length 4 ("the blue bird flies"), two collocations of length 3 ("the blue bird" and "blue bird flies"), and three collocations of length 2 ("the blue", "blue bird", and "bird flies"). This function counts how many times each source collocation appears in the derivative documents (the "notes"), then divides that count by the number of times the phrase occurs in the source document (the "transcript"). When fuzzy matching is enabled, indirect matches are also counted, each with a weight of (n*d)/m, where n is the frequency of the fuzzy collocation, d is the Jaccard similarity between the transcript collocation and the note collocation, and m is the number of closest matches for the note collocation.
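
The counting described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: it tokenizes on whitespace after lowercasing, and the helpers `tokenize`, `collocations`, `relative_frequency`, and `fuzzy_weight` are hypothetical names that do not exist in highlightr.

```r
# Assumption: words are separated by whitespace and compared in lowercase.
# The package's own tokenization may differ.
tokenize <- function(text) {
  words <- strsplit(tolower(trimws(text)), "\\s+")[[1]]
  words[nzchar(words)]
}

# Return every run of `len` consecutive words as a single string.
collocations <- function(text, len) {
  words <- tokenize(text)
  if (length(words) < len) return(character(0))
  starts <- seq_len(length(words) - len + 1)
  vapply(starts,
         function(i) paste(words[i:(i + len - 1)], collapse = " "),
         character(1))
}

# For each distinct source collocation: occurrences across all notes,
# divided by occurrences in the source document (exact matches only).
relative_frequency <- function(source, notes, len) {
  source_counts <- table(collocations(source, len))
  note_phrases <- unlist(lapply(notes, collocations, len = len))
  note_counts <- vapply(names(source_counts),
                        function(p) sum(note_phrases == p),
                        numeric(1))
  note_counts / as.vector(source_counts)
}

# Weight of one indirect (fuzzy) match, (n * d) / m, as defined in Details.
fuzzy_weight <- function(n, d, m) (n * d) / m

# The length-2 collocations of the example phrase.
collocations("the blue bird flies", 2)

# Exact-match frequencies against two made-up notes.
relative_frequency("the blue bird flies",
                   c("a blue bird", "the blue bird sings"),
                   len = 2)
```

Because the note count is divided by the source count, a phrase that the source itself repeats does not score higher merely for that repetition.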

Value

A data frame of the source (transcript) document, with collocation frequency values for each word.

Examples

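library(highlightr)
# Locate the row that holds the source document, then map collocation frequencies onto it
src_row <- which(notepad_example$ID == "source")
merged_frequency <- collocation_frequency(notepad_example, src_row, "Text")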

highlightr documentation built on April 11, 2026, 1:06 a.m.
