The function freq.analysis analyzes texts regarding frequencies of tokens, word classes etc.
freq.analysis(txt.file, ...)

## S4 method for signature 'kRp.text'
freq.analysis(
  txt.file,
  corp.freq = NULL,
  desc.stat = TRUE,
  corp.rm.class = "nonpunct",
  corp.rm.tag = c()
)
txt.file | An object of class kRp.text.
... | Additional options for the generic.
corp.freq | An object of class kRp.corp.freq.
desc.stat | Logical, whether an updated descriptive statistical analysis should be conducted.
corp.rm.class | A character vector with word classes which should be ignored for frequency analysis. The default value is "nonpunct".
corp.rm.tag | A character vector with POS tags which should be ignored for frequency analysis.
The function adds new columns with frequency information to the tokens data frame of the input data, describing how often each token is used in the additionally provided corpus frequency object.

To get the results, you can use taggedText to get the tokens slot, describe to get the raw descriptive statistics (only updated if desc.stat=TRUE), and corpusFreq to get the data from the added freq feature.
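The lookup described above can be pictured with a small base R sketch. It is only an illustration of the idea, not the package's implementation: the data frames and the column names word and per.million are hypothetical and not taken from koRpus.

```r
# hypothetical tokens data frame of the analyzed text
tokens <- data.frame(
  token = c("the", "cat", "sat", "quietly"),
  stringsAsFactors = FALSE
)

# hypothetical corpus frequency table (occurrences per million words)
corpus <- data.frame(
  word = c("the", "cat", "sat"),
  per.million = c(50000, 40, 25),
  stringsAsFactors = FALSE
)

# look up each token in the corpus table and add the result as a
# new column; tokens unknown to the corpus get NA
tokens$per.million <- corpus$per.million[match(tokens$token, corpus$word)]
```

Using match() keeps the row order of the tokens data frame intact, which matters because each row corresponds to a position in the text.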
If corp.freq provides appropriate idf values for the types in txt.file, the term frequency–inverse document frequency statistic (tf-idf) will also be computed. Missing idf values will result in NA.
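As a rough illustration of why missing idf values lead to NA, here is a base R sketch using one common textbook definition of tf-idf (raw term count multiplied by the log of the inverse document frequency). The toy documents are made up, and the exact weighting koRpus applies may differ from this assumption.

```r
# toy "corpus" of three tokenized documents (hypothetical data)
docs <- list(
  d1 = c("the", "cat", "sat"),
  d2 = c("the", "dog", "sat"),
  d3 = c("the", "bird", "flew")
)

# text to analyze; "fish" does not occur anywhere in the corpus
txt <- c("the", "cat", "the", "fish")

# idf per type: log(number of documents / documents containing the type)
types <- unique(unlist(docs))
doc.count <- sapply(types, function(type) {
  sum(sapply(docs, function(d) type %in% d))
})
idf <- log(length(docs) / doc.count)

# tf: raw count of each type in the analyzed text
tf <- table(txt)

# multiply tf by the idf looked up by name;
# types without an idf value end up as NA
tf.idf <- as.vector(tf) * idf[names(tf)]
names(tf.idf) <- names(tf)
```

In this sketch a type that appears in every document, such as "the", gets an idf of zero, while a type the corpus has never seen has no idf at all and therefore propagates NA.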
An updated object of class kRp.text with the added feature freq, which is a list with information on the word frequencies of the analyzed text. Use corpusFreq to get that slot.
get.kRp.env, kRp.text, kRp.corp.freq
# code is only run when the english language package can be loaded
if(require("koRpus.lang.en", quietly = TRUE)){
  sample_file <- file.path(
    path.package("koRpus"), "examples", "corpus", "Reality_Winner.txt"
  )
  # call freq.analysis() on a tokenized text
  tokenized.obj <- tokenize(
    txt=sample_file,
    lang="en"
  )
  # the token slot before frequency analysis
  head(taggedText(tokenized.obj))
  # instead of data from a larger corpus, we'll
  # use the token frequencies of the text itself
  tokenized.obj <- freq.analysis(
    tokenized.obj,
    corp.freq=read.corp.custom(tokenized.obj)
  )
  # compare the columns after the analysis
  head(taggedText(tokenized.obj))
  # the object now has further statistics in a
  # new feature slot called freq
  hasFeature(tokenized.obj)
  corpusFreq(tokenized.obj)
} else {}