View source: R/calc_type_metrics.R
calc_type_metrics | R Documentation |
Calculates frequency and dispersion metrics for the types in tokenized text data.
calc_type_metrics(data, type, document, frequency = NULL, dispersion = NULL)
data |
A data frame containing the tokenized text data |
type |
The variable in |
document |
The variable in |
frequency |
A character vector indicating which
frequency metrics to use. If NULL (default),
only the |
dispersion |
A character vector indicating which
dispersion metrics to use. If NULL (default),
only the |
A data frame with columns:
type
: The unique types from the input data.
n
: The frequency of each type across all documents.
Optionally (based on the frequency
and dispersion
arguments):
rf
: The relative frequency of each type across all documents.
orf
: The observed relative frequency (per 100) of each
type across all documents.
df
: The document frequency of each type.
idf
: The inverse document frequency of each type.
dp
: Gries' Deviation of Proportions of each type.
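The columns described above can be illustrated with a minimal base R sketch. It does not use qtkit and is not the package's implementation; the toy `tokens` data frame is hypothetical, and the formulas are assumptions: `idf` is taken as the natural log of the number of documents over `df`, and `dp` as the unnormalized Deviation of Proportions (half the summed absolute differences between a type's per-document share and each document's share of the corpus). The package may define or scale these differently.

```r
# Hypothetical toy data: one row per token.
tokens <- data.frame(
  document = c("d1", "d1", "d1", "d2", "d2", "d3"),
  type     = c("a",  "b",  "a",  "a",  "c",  "b"),
  stringsAsFactors = FALSE
)

# Type-by-document count matrix (rows: types, columns: documents).
counts <- unclass(table(tokens$type, tokens$document))

n      <- rowSums(counts)        # raw frequency of each type
rf     <- n / sum(n)             # relative frequency
orf    <- rf * 100               # relative frequency per 100 tokens
df     <- rowSums(counts > 0)    # number of documents containing the type
n_docs <- ncol(counts)
idf    <- log(n_docs / df)       # assumed definition: log(N / df)

# Assumed DP: compare each type's distribution over documents with
# the share of the whole corpus that each document accounts for.
doc_share  <- colSums(counts) / sum(counts)
type_share <- sweep(counts, 1, n, "/")
dp <- 0.5 * rowSums(abs(sweep(type_share, 2, doc_share, "-")))

metrics <- data.frame(
  type = names(n),
  n    = as.vector(n),
  rf   = as.vector(rf),
  orf  = as.vector(orf),
  df   = as.vector(df),
  idf  = as.vector(idf),
  dp   = as.vector(dp),
  stringsAsFactors = FALSE
)
print(metrics)
```

Under this definition, a `dp` value near 0 means a type is spread across documents in proportion to their sizes, while larger values mean it is concentrated in fewer documents.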
Gries, Stefan Th. (2023). Statistical Methods in Corpus Linguistics. In Readings in Corpus Linguistics: A Teaching and Research Guide for Scholars in Nigeria and Beyond, pp. 78-114.
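# Load the example types data shipped with the package
data_path <- system.file("extdata", "types_data.rds", package = "qtkit")
data <- readRDS(data_path)

# Request two frequency metrics and two dispersion metrics
calc_type_metrics(
  data = data,
  type = type,
  document = document,
  frequency = c("rf", "orf"),
  dispersion = c("df", "idf")
)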