Given a frequency table (with texts as rows and words as columns), this function calculates the log-likelihood or the log ratio of one set of rows against the other rows. The return value is a list containing a score for each word. If the method is loglikelihood, the returned scores are unsigned G2 values. To estimate the direction of the keyness, the log ratio is more informative. A nice introduction to log ratio can be found here.
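The difference between the two scores can be made concrete with a small sketch in base R. It follows the formulas published on the UCREL log-likelihood page; the helper names `g2` and `log_ratio`, their arguments, and the default `epsilon` value are hypothetical and chosen for illustration only, so this is not necessarily how the package computes its results.

```r
# Hypothetical helpers for illustration, not part of the package.
# a, b: frequency of one word in the target rows and in the remaining rows
# c, d: total number of tokens in the target rows and in the remaining rows

g2 <- function(a, b, c, d) {
  e1 <- c * (a + b) / (c + d)  # expected frequency in the target rows
  e2 <- d * (a + b) / (c + d)  # expected frequency in the remaining rows
  # By convention, a term with an observed frequency of 0 contributes 0
  term <- function(o, e) ifelse(o > 0, o * log(o / e), 0)
  2 * (term(a, e1) + term(b, e2))
}

log_ratio <- function(a, b, c, d, epsilon = 1e-10) {
  # epsilon is an arbitrary placeholder that keeps the ratio finite
  a <- ifelse(a == 0, epsilon, a)
  b <- ifelse(b == 0, epsilon, b)
  log2((a / c) / (b / d))
}

# Swapping target and reference leaves G2 unchanged ...
g2(30, 10, 1000, 1000)
g2(10, 30, 1000, 1000)

# ... but flips the sign of the log ratio
log_ratio(30, 10, 1000, 1000)
log_ratio(10, 30, 1000, 1000)
```

Because G2 is the same in both directions, it only says that a word is distributed unevenly; the sign of the log ratio says in which set of rows it is overrepresented.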
ft | The frequency table.
categories | A factor or numeric vector that represents an assignment of categories.
epsilon | Zero values are replaced by this value in order to avoid division by zero.
siglevel | Return only the keywords above the significance level. Set to 1 to get all words.
method | Either "logratio" or "loglikelihood" (default).
minimalFrequency | Words less frequent than this value are not considered at all.
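How the thresholds above might interact can be sketched on a toy table in base R. This is an illustration, not the package's implementation: the table `ft_toy` and all variable names are invented, and it is an assumption that `minimalFrequency` is compared against a word's total frequency and that `siglevel` refers to a chi-squared test with one degree of freedom, as on the UCREL page.

```r
# Invented toy frequency table: two texts (rows), three words (columns)
ft_toy <- matrix(c(30, 0, 2,
                   10, 8, 1),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("text1", "text2"),
                                 c("love", "sword", "alas")))

# minimalFrequency: drop words whose total frequency is below the threshold
# (assumption: the threshold applies to the column sum)
minimal_frequency <- 5
ft_kept <- ft_toy[, colSums(ft_toy) >= minimal_frequency, drop = FALSE]

# epsilon: replace zero counts so that ratios stay finite
epsilon <- 1e-10  # arbitrary illustration value
ft_kept[ft_kept == 0] <- epsilon

# siglevel: smallest G2 score that counts as significant
# (assumption: chi-squared distribution with 1 degree of freedom)
critical_g2 <- function(siglevel) qchisq(1 - siglevel, df = 1)
critical_g2(0.01)  # about 6.63
critical_g2(1)     # 0, so every word passes
```

Under these assumptions, a `siglevel` of 1 corresponds to a critical value of 0, which is why that setting returns all words.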
A list of keywords, sorted by their log-likelihood or log ratio value, calculated according to http://ucrel.lancs.ac.uk/llwizard.html.
data("rksp.0")
ft <- frequencytable(rksp.0, byCharacter = TRUE, normalize = FALSE)

# Calculate log ratio for all words
genders <- factor(c("m", "m", "m", "m", "f", "m", "m", "m", "f", "m", "m", "f", "m"))
keywords <- keyness(ft, method = "logratio",
                    categories = genders,
                    minimalFrequency = 5)

# Remove words that are not significantly different
keywords <- keywords[names(keywords) %in% names(keyness(ft, siglevel = 0.01))]