freq.analysis-methods: Analyze word frequencies
In koRpus: Text Analysis with Emphasis on POS Tagging, Readability, and Lexical Diversity

Description

The function freq.analysis analyzes a text with regard to the frequencies of its tokens, word classes, etc.

Usage

freq.analysis(txt.file, ...)

## S4 method for signature 'kRp.text'
freq.analysis(
  txt.file,
  corp.freq = NULL,
  desc.stat = TRUE,
  corp.rm.class = "nonpunct",
  corp.rm.tag = c()
)

Arguments

`txt.file`	An object of class `kRp.text`.
`...`	Additional options for the generic.
`corp.freq`	An object of class `kRp.corp.freq`.
`desc.stat`	Logical, whether an updated descriptive statistical analysis should be conducted.
`corp.rm.class`	A character vector with word classes which should be ignored for frequency analysis. The default value `"nonpunct"` has special meaning and will cause the result of `kRp.POS.tags(lang, tags=c("punct","sentc"), list.classes=TRUE)` to be used.
`corp.rm.tag`	A character vector with POS tags which should be ignored for frequency analysis.
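To see which word classes the default `corp.rm.class = "nonpunct"` stands for, you can run the `kRp.POS.tags()` call quoted above yourself. This is a minimal sketch that assumes the English language package koRpus.lang.en is installed and uses "en" as the language. The returned classes are the punctuation and sentence-ending classes that are ignored by default.

```r
# runs only if the English language support package can be loaded
if (require("koRpus.lang.en", quietly = TRUE)) {
  # word classes of all punctuation and sentence-ending tags for English;
  # these are the classes that corp.rm.class = "nonpunct" excludes
  rm.classes <- kRp.POS.tags("en", tags = c("punct", "sentc"), list.classes = TRUE)
  print(rm.classes)
}
```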

Details

freq.analysis adds new columns with frequency information to the tokens data frame of the input object. These columns describe how often each token occurs in the corpus frequency object provided via corp.freq.
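Conceptually, this amounts to looking up each token in a corpus frequency table. The base R sketch below illustrates only that idea. The table, its column names (`freq`, `pmio`) and the corpus size are hypothetical and do not reproduce the columns koRpus actually adds. Tokens that are missing from the table end up with NA.

```r
# tokens of a toy text, in their original order
tokens <- data.frame(token = c("the", "cat", "sat", "on", "the", "mat"))

# hypothetical corpus frequency table for a reference corpus of corp.size tokens;
# "mat" is deliberately missing from it
corp.size <- 2e6
corp <- data.frame(
  token = c("the", "on", "cat", "sat"),
  freq  = c(60000, 7000, 50, 30)
)
# frequency per million tokens
corp$pmio <- corp$freq / corp.size * 1e6

# look up each token in the corpus table; match() keeps the token order
# and returns NA for tokens not found in the table
idx <- match(tokens$token, corp$token)
tokens$freq <- corp$freq[idx]
tokens$pmio <- corp$pmio[idx]
print(tokens)
```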

To retrieve the results, use taggedText() for the tokens slot, describe() for the raw descriptive statistics (only updated if desc.stat=TRUE), and corpusFreq() for the data of the added freq feature.
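Here is a short sketch of the three accessors. It reuses the sample file and the self-derived corpus frequencies from the package's own example and assumes koRpus.lang.en is installed. The variable names are illustrative only.

```r
# runs only if the English language support package can be loaded
if (require("koRpus.lang.en", quietly = TRUE)) {
  sample_file <- file.path(
    path.package("koRpus"), "examples", "corpus", "Reality_Winner.txt"
  )
  txt <- tokenize(txt = sample_file, lang = "en")
  # desc.stat = TRUE (the default) refreshes the descriptive statistics
  txt <- freq.analysis(txt, corp.freq = read.corp.custom(txt), desc.stat = TRUE)

  tokens.df  <- taggedText(txt)  # tokens slot, now with frequency columns
  desc.stats <- describe(txt)    # raw descriptive statistics
  freq.data  <- corpusFreq(txt)  # contents of the added freq feature
  str(freq.data, max.level = 1)
}
```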

If corp.freq provides appropriate idf values for the types in txt.file, the term frequency–inverse document frequency statistic (tf-idf) will also be computed. Missing idf values will result in NA.
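tf-idf weights how often a type occurs in the text by how rare it is across documents. The sketch below makes three assumptions. It uses raw counts as tf and the common textbook definition idf = log(N / df), and its document counts are invented for illustration. koRpus takes its idf values from corp.freq and may use a different variant. As in koRpus, a type without an idf value gets NA.

```r
# tokens of a toy text (already lower-cased, punctuation removed)
tokens <- c("the", "cat", "sat", "on", "the", "mat")

# term frequency: raw count of each type in the text
tf <- table(tokens)

# hypothetical document frequencies from a reference corpus of n.docs documents;
# "mat" is deliberately missing to show how a missing idf value propagates
n.docs <- 1000
doc.freq <- c(the = 990, cat = 82, sat = 45, on = 818)
idf <- log(n.docs / doc.freq)

# tf-idf per type; looking up a missing name yields NA, so "mat" gets NA
tfidf <- as.vector(tf) * unname(idf[names(tf)])
names(tfidf) <- names(tf)
print(round(tfidf, 3))
```

Frequent function words such as "the" receive weights close to zero, while rarer content words such as "sat" get the highest weights.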

Value

An updated object of class kRp.text with the added feature freq, which is a list with information on the word frequencies of the analyzed text. Use corpusFreq to get that slot.

See Also

get.kRp.env, kRp.text, kRp.corp.freq

Examples

# code is only run when the English language package can be loaded
if(require("koRpus.lang.en", quietly = TRUE)){
  sample_file <- file.path(
    path.package("koRpus"), "examples", "corpus", "Reality_Winner.txt"
  )
  # tokenize the sample text first; freq.analysis() is called on the result
  tokenized.obj <- tokenize(
    txt=sample_file,
    lang="en"
  )
  # the token slot before frequency analysis
  head(taggedText(tokenized.obj))

  # instead of data from a larger corpus, we'll
  # use the token frequencies of the text itself
  tokenized.obj <- freq.analysis(
    tokenized.obj,
    corp.freq=read.corp.custom(tokenized.obj)
  )
  # compare the columns after the analysis
  head(taggedText(tokenized.obj))

  # the object now has further statistics in a
  # new feature slot called freq
  hasFeature(tokenized.obj)
  corpusFreq(tokenized.obj)
} else {}

koRpus documentation built on May 18, 2021, 1:13 a.m.
