| keyness_measure_calculator | R Documentation |
This function takes a dataframe with word frequencies in two corpora (target
and reference), typically the output of frequency_table_creator, and calculates
the keyness of each word in the target corpus versus the reference corpus.
It supports Log Likelihood and Bayes Information Criterion as statistical
significance measures and the Effect Size of Log Likelihood (ELL), %DIFF,
Relative Risk, Log Ratio and Odds Ratio as effect size measures of keyness.
The user can specify which measures are calculated. The output data.frame can
be sorted by decreasing or increasing value of any calculated measure.
keyness_measure_calculator(
df,
log_likelihood = TRUE,
ell = TRUE,
bic = TRUE,
perc_diff = TRUE,
relative_risk = TRUE,
log_ratio = TRUE,
odds_ratio = TRUE,
sort = c("none", "decreasing", "increasing"),
sort_by = c("log_likelihood", "perc_diff", "bic", "ell", "relative_risk",
"log_ratio", "odds_ratio")
)
df |
a |
log_likelihood |
logical; if |
ell |
logical; if |
bic |
logical; if |
perc_diff |
logical; if |
relative_risk |
logical; if |
log_ratio |
logical; if |
odds_ratio |
logical; if |
sort |
if |
sort_by |
specifies which measure should be used as a reference for table
sorting. If parameter sort is set to |
Log Likelihood is calculated as specified in Rayson and Garside (2000).
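As a rough illustration of the Rayson and Garside (2000) calculation, the sketch below computes Log Likelihood for a single word from its frequencies and the corpus sizes. The helper name log_likelihood_sketch and the argument names a, b, c and d are assumptions made for this example and are not part of the package; the actual implementation may differ, for instance in how zero frequencies are handled.

```r
# Hypothetical helper for illustration only, not the package's own code.
# a, b: frequency of the word in the target and reference corpus
# c, d: total number of words in the target and reference corpus
log_likelihood_sketch <- function(a, b, c, d) {
  e1 <- c * (a + b) / (c + d)  # expected frequency in the target corpus
  e2 <- d * (a + b) / (c + d)  # expected frequency in the reference corpus
  # A zero observed frequency contributes nothing to the sum
  term <- function(obs, expected) ifelse(obs > 0, obs * log(obs / expected), 0)
  2 * (term(a, e1) + term(b, e2))
}

# A word seen 120 times in the target and 40 times in the reference corpus,
# both corpora containing 100,000 words
log_likelihood_sketch(a = 120, b = 40, c = 100000, d = 100000)
```

When the word is equally frequent (relative to corpus size) in both corpora, the observed and expected values coincide and the result is 0.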
ELL - the Effect Size of Log Likelihood is calculated as per Johnston et al. (2006). This implementation is based on Paul Rayson's formula as per his website.
The Bayes Information Criterion is calculated as proposed in Wilson (2013); its interpretation is discussed in Gabrielatos (2017). This implementation is based on Paul Rayson's formula as per his website.
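A minimal sketch of the BIC approximation for a two-corpus comparison, assuming the one-degree-of-freedom form LL - log(N), where N is the combined size of both corpora. The name bic_sketch and its arguments are hypothetical and chosen for this example; the function expects an already calculated Log Likelihood value.

```r
# Hypothetical helper for illustration only, not the package's own code.
# ll:   Log Likelihood value of the word
# c, d: total number of words in the target and reference corpus
bic_sketch <- function(ll, c, d) {
  n <- c + d      # combined size of both corpora
  ll - log(n)     # assumed form for one degree of freedom
}

bic_sketch(ll = 41.86, c = 100000, d = 100000)
```

Because log(N) grows with corpus size, the same Log Likelihood value yields a lower BIC in larger corpora.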
The %DIFF measure is calculated as per Gabrielatos and Marchi (2012) and Gabrielatos (2018).
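The following sketch shows one way to compute %DIFF from normalised frequencies. The helper perc_diff_sketch, its argument names and the zero_sub default are assumptions for this example: a tiny substitute value is used when the word is absent from the reference corpus so that the division is defined, and the package may handle that case differently.

```r
# Hypothetical helper for illustration only, not the package's own code.
# a, b: frequency of the word in the target and reference corpus
# c, d: total number of words in the target and reference corpus
# zero_sub: assumed stand-in for a zero normalised reference frequency
perc_diff_sketch <- function(a, b, c, d, zero_sub = 1e-18) {
  nf_target <- a / c
  nf_reference <- b / d
  nf_reference <- ifelse(nf_reference == 0, zero_sub, nf_reference)
  (nf_target - nf_reference) * 100 / nf_reference
}

perc_diff_sketch(a = 120, b = 40, c = 100000, d = 100000)
```

In this call the word is three times as frequent in the target corpus, which corresponds to a %DIFF of about 200. Negative values indicate that the word is relatively more frequent in the reference corpus.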
The Relative Risk (also known as the Risk Ratio) is discussed in Kilgarriff (2009). This implementation is based on Paul Rayson's formula as per his website.
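As an illustration, Relative Risk can be sketched as the ratio of the two relative frequencies. The helper relative_risk_sketch and its argument names are hypothetical and not taken from the package.

```r
# Hypothetical helper for illustration only, not the package's own code.
# a, b: frequency of the word in the target and reference corpus
# c, d: total number of words in the target and reference corpus
relative_risk_sketch <- function(a, b, c, d) {
  # Ratio of relative frequencies; evaluates to Inf when b is 0 and a > 0
  (a / c) / (b / d)
}

relative_risk_sketch(a = 120, b = 40, c = 100000, d = 100000)
```

Values above 1 indicate that the word is relatively more frequent in the target corpus, values below 1 that it is relatively more frequent in the reference corpus.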
The Log Ratio measure, the binary log of the ratio of relative frequencies, is closely related to the Relative Risk measure. Implemented as proposed by Hardie (2014).
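A minimal sketch of the Log Ratio calculation, assuming it is the base-2 logarithm of the ratio of relative frequencies. The helper log_ratio_sketch, its argument names and the zero_sub default (a substitute for zero frequencies so that the logarithm stays finite) are assumptions for this example and may not match the package's behaviour.

```r
# Hypothetical helper for illustration only, not the package's own code.
# a, b: frequency of the word in the target and reference corpus
# c, d: total number of words in the target and reference corpus
# zero_sub: assumed replacement for a frequency of zero
log_ratio_sketch <- function(a, b, c, d, zero_sub = 0.5) {
  a <- ifelse(a == 0, zero_sub, a)
  b <- ifelse(b == 0, zero_sub, b)
  log2((a / c) / (b / d))
}

log_ratio_sketch(a = 200, b = 50, c = 100000, d = 100000)
```

Each additional point of Log Ratio corresponds to a doubling of the frequency ratio: a word four times as frequent in the target corpus, as in the call above, gives a value of 2 (up to floating-point rounding), and a word eight times as frequent gives 3.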
The Odds Ratio compares the odds of the word occurring in the target corpus with the odds of it occurring in the reference corpus. This implementation is based on Paul Rayson's formula as per his website.
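The Odds Ratio differs from Relative Risk in that the word's frequency is divided by the count of all other words rather than by the corpus size. The sketch below illustrates this under that assumption; the helper odds_ratio_sketch and its argument names are hypothetical.

```r
# Hypothetical helper for illustration only, not the package's own code.
# a, b: frequency of the word in the target and reference corpus
# c, d: total number of words in the target and reference corpus
odds_ratio_sketch <- function(a, b, c, d) {
  odds_target <- a / (c - a)     # word vs. all other words, target corpus
  odds_reference <- b / (d - b)  # word vs. all other words, reference corpus
  odds_target / odds_reference   # evaluates to Inf when b is 0 and a > 0
}

odds_ratio_sketch(a = 120, b = 40, c = 100000, d = 100000)
```

For words that are rare relative to the corpus size, the result is close to the Relative Risk, since c - a and d - b are then nearly equal to c and d.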
A dataframe with word frequencies in the target and reference corpora and the chosen keyness measures calculated for the target corpus against the reference corpus.
Gabrielatos, C. (2017) Clusters of keyness: A principled approach to selecting key items. In: Corpus Linguistics in the South.
Gabrielatos, C. (2018) Keyness Analysis: nature, metrics and techniques. In: Corpus Approaches to Discourse (pp. 225-258). Routledge.
Gabrielatos, C. and Marchi, A. (2012) Keyness: Appropriate metrics and practical issues. CADS International Conference 2012. Corpus-assisted Discourse Studies: More than the sum of Discourse Analysis and computing?, 13-14 September, University of Bologna, Italy.
Hardie, A (2014) 4. Log ratio – an informal introduction. Post on the website of the ESRC Centre for Corpus Approaches to Social Science CASS. Retrieved from: http://cass.lancs.ac.uk/?p=1133.
Johnston, J.E., Berry, K.J. and Mielke, P.W. (2006) Measures of effect size for chi-squared and likelihood-ratio goodness-of-fit tests. Perceptual and Motor Skills, Volume 103, pp. 412-414.
Kilgarriff, A. (2009) Simple maths for keywords. In: Mahlberg, M., González-Díaz, V. and Smith, C. (eds.) Proceedings of the Corpus Linguistics Conference, CL2009. Liverpool.
Rayson, P. and Garside, R. (2000) Comparing corpora using frequency profiling. In: Proceedings of the workshop on Comparing Corpora, held in conjunction with the 38th annual meeting of the Association for Computational Linguistics (ACL 2000), pp. 1-6.
Rayson, P. (2019) Log-likelihood and effect size calculator. Website. Retrieved from: http://ucrel.lancs.ac.uk/llwizard.html.