calc_rf | R Documentation |
A collection of internal helper functions that calculate various dispersion and frequency metrics from term-document matrices. These functions support the main calc_type_metrics function by providing specialized calculations for different statistical measures.
Computes the relative frequency (RF) for each term in a term-document matrix, representing how often each term occurs relative to the total corpus size.
calc_rf(tdm)
tdm | A sparse term-document matrix (Matrix package format) |
The package implements these metrics:
Dispersion measures:
Document Frequency (DF): Count of documents containing each term
Inverse Document Frequency (IDF): Log-scaled inverse of DF, emphasizing rare terms
Deviation of Proportions (DP): Gries' measure of distributional evenness ranging from 0 (perfectly even) to 1 (completely clumped)
Frequency measures:
Relative Frequency (RF): Term frequency normalized by total corpus size
Observed Relative Frequency (ORF): RF expressed as a percentage (RF * 100)
Implementation notes:
All functions expect a sparse term-document matrix input
Matrix operations are optimized using the Matrix package
NA values are handled appropriately for each metric
Results are returned as numeric vectors
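The dispersion measures listed above can be illustrated on a toy matrix. This is only a sketch, not the package's source: it assumes terms are stored in rows and documents in columns, uses an unsmoothed natural-log IDF (the exact log base and smoothing are not stated here), and follows the DP formula from Gries (2008). The toy data and all variable names are hypothetical.

```r
library(Matrix)

# Hypothetical toy term-document matrix (assumed layout: rows = terms, columns = documents)
tdm <- sparseMatrix(
  i = c(1, 1, 2, 3, 3),
  j = c(1, 2, 1, 2, 3),
  x = c(2, 1, 3, 1, 3),
  dims = c(3, 3),
  dimnames = list(c("apple", "banana", "cherry"), c("doc1", "doc2", "doc3"))
)

# Document Frequency: number of documents in which each term occurs at least once
df <- Matrix::rowSums(tdm > 0)

# Inverse Document Frequency (assumption: natural log, no smoothing)
idf <- log(ncol(tdm) / df)

# Deviation of Proportions: half the summed absolute difference between
# each term's share per document and each document's share of the corpus
term_totals <- Matrix::rowSums(tdm)
doc_share   <- Matrix::colSums(tdm) / sum(tdm)
term_share  <- as.matrix(tdm / term_totals)  # dense copy, acceptable only for a toy matrix
dp <- 0.5 * rowSums(abs(sweep(term_share, 2, doc_share)))
```

In this toy corpus "banana" occurs in a single document, so its DP is higher than that of "apple", which is spread over two documents.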
The calculation process:
Sums occurrences of each term across all documents
Divides by total corpus size (sum of all term occurrences)
Returns proportions between 0 and 1
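The steps above can be sketched in a few lines of R. This is an illustrative stand-in rather than the actual body of calc_rf: the function name rf_sketch and the toy data are hypothetical, and the code assumes terms are stored in rows and documents in columns.

```r
library(Matrix)

# Hypothetical toy term-document matrix (assumed layout: rows = terms, columns = documents)
tdm <- sparseMatrix(
  i = c(1, 1, 2, 3, 3),
  j = c(1, 2, 1, 2, 3),
  x = c(2, 1, 3, 1, 3),
  dims = c(3, 3),
  dimnames = list(c("apple", "banana", "cherry"), c("doc1", "doc2", "doc3"))
)

# Illustrative stand-in for calc_rf (not the package's implementation)
rf_sketch <- function(tdm) {
  term_totals <- Matrix::rowSums(tdm)  # occurrences of each term across all documents
  term_totals / sum(term_totals)       # divide by total corpus size
}

rf  <- rf_sketch(tdm)
orf <- rf * 100  # the same values expressed as percentages
```

Because every term count contributes to the denominator, the resulting proportions sum to 1 across all terms.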
A numeric vector where each element represents a term's relative frequency in the corpus (range: 0-1)
Gries, S. T. (2008). Dispersions and adjusted frequencies in corpora. International Journal of Corpus Linguistics, 13(4), 403-437.