kRp.corp.freq-class: S4 Class kRp.corp.freq
In koRpus: Text Analysis with Emphasis on POS Tagging, Readability, and Lexical Diversity

Description Details Slots Constructor function References

This class is used for objects that are returned by read.corp.LCC and read.corp.celex.

The slot meta simply contains all information from the "meta.txt" of the LCC[1] data and remains empty for data from a Celex[2] DB.

meta

Metadata on the corpora (see details).

words

Absolute word frequencies. It has at least the following columns:

num:: Some word ID from the DB, integer
word:: The word itself
lemma:: The lemma of the word
tag:: A part-of-speech tag
wclass:: The word class
lttr:: The number of characters
freq:: The frequency of that word in the corpus DB
pct:: Percentage of appearance in DB
pmio:: Appearance per million words in DB
log10:: Base 10 logarithm of word frequency
rank.avg:: Rank in corpus data, rank ties method "average"
rank.min:: Rank in corpus data, rank ties method "min"
rank.rel.avg:: Relative rank, i.e. percentile of "rank.avg"
rank.rel.min:: Relative rank, i.e. percentile of "rank.min"
inDocs:: The absolute number of documents in the corpus containing the word
idf:: The inverse document frequency

The slot might have additional columns, depending on the input material.
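As a rough illustration of how the derived columns relate to freq, the following base R sketch builds a toy table. It is not koRpus code: the words, counts, and document number are invented, and the exact formulas (log10 taken from the raw frequency, ascending ranks, idf as log10 of total documents over inDocs) are assumptions about one plausible reading of the column descriptions above.

```r
# Toy frequency table; words, counts and document numbers are invented.
words <- data.frame(
  word   = c("the", "cat", "sat", "mat"),
  freq   = c(6, 2, 1, 1),
  inDocs = c(4, 2, 1, 1),
  stringsAsFactors = FALSE
)
n.tokens <- sum(words$freq)  # total running words in the toy corpus
n.docs   <- 4                # assumed total number of documents

# Derived columns, named after the columns of the words slot.
words$lttr     <- nchar(words$word)                 # number of characters
words$pct      <- 100 * words$freq / n.tokens       # percentage of appearance
words$pmio     <- 1e6 * words$freq / n.tokens       # appearance per million words
words$log10    <- log10(words$freq)                 # assumption: log of raw frequency
words$rank.avg <- rank(words$freq, ties.method = "average")
words$rank.min <- rank(words$freq, ties.method = "min")
words$idf      <- log10(n.docs / words$inDocs)      # assumption: a common idf variant

print(words)
```

The two rank columns differ only for tied frequencies: "sat" and "mat" share rank 1.5 under "average" and rank 1 under "min".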

desc

Descriptive information. It contains six numbers from the meta information, for convenient accessibility:

tokens:: Number of running word forms
types:: Number of distinct word forms
words.p.sntc:: Average sentence length in words
chars.p.sntc:: Average sentence length in characters
chars.p.wform:: Average word form length
chars.p.word:: Average running word length

The slot might have additional columns, depending on the input material.
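The difference between word forms (types) and running words (tokens) in these numbers can be sketched in base R. This is only an illustration with an invented token vector, not the way koRpus fills the slot, which takes the values from the corpus meta information.

```r
# Invented toy text, already split into tokens.
tokens <- c("the", "cat", "sat", "on", "the", "mat")
types  <- unique(tokens)

desc <- data.frame(
  tokens        = length(tokens),      # number of running word forms
  types         = length(types),       # number of distinct word forms
  chars.p.wform = mean(nchar(types)),  # average length over distinct forms
  chars.p.word  = mean(nchar(tokens))  # average length over running words
)

print(desc)
```

Because "the" occurs twice, it counts twice toward chars.p.word but only once toward chars.p.wform.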

bigrams

A data.frame listing all tokens that co-occurred next to each other in the corpus:

token1:: The first token
token2:: The second token that appeared right next to the first
freq:: How often the co-occurrence was present
sig:: Log-likelihood significance of the co-occurrence
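To make the shape of such a table concrete, here is a minimal base R sketch that counts adjacent token pairs in an invented token vector. It is an assumption-laden illustration, not koRpus code: it ignores sentence boundaries and does not compute the sig column.

```r
# Invented toy text, already split into tokens.
tokens <- c("the", "cat", "sat", "on", "the", "cat")

# Pair each token with its right-hand neighbour.
pairs <- data.frame(
  token1 = head(tokens, -1),
  token2 = tail(tokens, -1),
  stringsAsFactors = FALSE
)

# Count how often each pair occurs.
bigrams <- aggregate(freq ~ token1 + token2,
                     data = cbind(pairs, freq = 1),
                     FUN = sum)

print(bigrams)
```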

cooccur

Similar to bigrams, but listing co-occurrences anywhere in one sentence:

token1:: The first token
token2:: The second token that appeared in the same sentence
freq:: How often the co-occurrence was present
sig:: Log-likelihood significance of the co-occurrence

caseSens

A single logical value indicating whether the frequency statistics were calculated case-sensitively.

Should you need to manually generate objects of this class (which should rarely be the case), the constructor function kRp_corp_freq(...) can be used instead of new("kRp.corp.freq", ...).
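A minimal sketch of calling the constructor might look as follows. It assumes that koRpus is installed and that the class supplies valid defaults for all slots that are not set explicitly; the guard simply skips the example otherwise.

```r
# Only run if koRpus is available; nothing happens otherwise.
if (requireNamespace("koRpus", quietly = TRUE)) {
  # Assumption: unspecified slots fall back to the class defaults.
  obj <- koRpus::kRp_corp_freq(caseSens = TRUE)

  # Slots of an S4 object can be read with slot() or the @ operator.
  print(methods::slot(obj, "caseSens"))
  print(methods::is(obj, "kRp.corp.freq"))
}
```

In regular use, objects of this class come from read.corp.LCC or read.corp.celex rather than from a manual constructor call.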

[1] https://wortschatz.uni-leipzig.de/en/download/
[2] http://celex.mpi.nl

koRpus documentation built on May 18, 2021, 1:13 a.m.
