koRpus: An R Package for Text Analysis
Version 0.10-2

A set of tools to analyze texts. Includes, among others, functions for automatic language detection, hyphenation, several indices of lexical diversity (e.g., type-token ratio, HD-D/vocd-D, MTLD) and readability (e.g., Flesch, SMOG, LIX, Dale-Chall). Basic import functions for language corpora are also provided to enable frequency analyses (supporting the Celex and Leipzig Corpora Collection file formats) and measures like tf-idf. Support for additional languages can be added on the fly or by plugin packages. Note: For full functionality, a local installation of TreeTagger is recommended. 'koRpus' also includes a plugin for the R GUI and IDE RKWard, providing graphical dialogs for its basic features. The respective R package 'rkward' cannot be installed directly from a repository, as it is part of RKWard. To make full use of this feature, please install RKWard (plugins are detected automatically). Due to some restrictions on CRAN, the full package sources are only available from the project homepage. To ask for help, report bugs, request features, or discuss the development of the package, please subscribe to the koRpus-dev mailing list.
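
To make the notion of a lexical diversity index concrete, the following base R sketch computes a naive type-token ratio (distinct word forms divided by total words). It is only an illustration: the function name naive_ttr and its simple letter-based tokenizer are assumptions made for this example and do not reflect how koRpus tokenizes text or computes its indices.

```r
# Illustrative only: a naive type-token ratio in base R.
# naive_ttr() and its tokenizer are assumptions made for this sketch;
# they are not part of koRpus.
naive_ttr <- function(text) {
  # lower-case, then split on anything that is not a letter or apostrophe
  tokens <- unlist(strsplit(tolower(text), "[^a-z']+"))
  tokens <- tokens[nzchar(tokens)]   # drop empty strings left by the split
  types  <- unique(tokens)           # distinct word forms
  length(types) / length(tokens)     # TTR = number of types / number of tokens
}

# 5 distinct words ("the" occurs twice) out of 6 words in total
naive_ttr("the cat sat on the mat")
```

Because longer texts repeat words more often, a raw ratio like this tends to fall as text length grows, which is one reason the package also lists length-adjusted measures such as MTLD, MATTR and HD-D.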


Author: m.eik michalke [aut, cre], Earl Brown [ctb], Alberto Mirisola [ctb], Alexandre Brulet [ctb], Laura Hauser [ctb]
Date of publication: 2017-04-04 22:04:32 UTC
Maintainer: m.eik michalke <meik.michalke@hhu.de>
License: GPL (>= 3)
Version: 0.10-2
URL: https://reaktanz.de/?c=hacking&s=koRpus
Package repository: CRAN
Installation: Install the latest version of this package by entering the following in R:
install.packages("koRpus")

Man pages

ARI: Readability: Automated Readability Index (ARI)
bormuth: Readability: Bormuth's Mean Cloze and Grade Placement
C.ld: Lexical diversity: Herdan's C
clozeDelete-methods: Transform text into cloze test format
coleman: Readability: Coleman's Formulas
coleman.liau: Readability: Coleman-Liau Index
correct-methods: Methods to correct koRpus objects
cTest-methods: Transform text into C-Test-like format
CTTR: Lexical diversity: Carroll's corrected TTR (CTTR)
dale.chall: Readability: Dale-Chall Readability Formula
danielson.bryan: Readability: Danielson-Bryan
dickes.steiwer: Readability: Dickes-Steiwer Handformel
DRP: Readability: Degrees of Reading Power (DRP)
ELF: Readability: Fang's Easy Listening Formula (ELF)
farr.jenkins.paterson: Readability: Farr-Jenkins-Paterson Index
flesch: Readability: Flesch Readability Ease
flesch.kincaid: Readability: Flesch-Kincaid Grade Level
FOG: Readability: Gunning FOG Index
FORCAST: Readability: FORCAST Index
freq.analysis-methods: Analyze word frequencies
fucks: Readability: Fucks' Stilcharakteristik
get.kRp.env: Get koRpus session environment
guess.lang: Guess language a text is written in
harris.jacobson: Readability: Harris-Jacobson indices
HDD: Lexical diversity: HD-D (vocd-d)
hyphen-methods: Automatic hyphenation
hyph.XX: Hyphenation patterns
jumbleWords: Produce jumbled words
K.ld: Lexical diversity: Yule's K
koRpus-package: The koRpus Package
kRp.analysis-class: S4 Class kRp.analysis
kRp.cluster: Work in (early) progress. Probably don't even look at it....
kRp.corp.freq-class: S4 Class kRp.corp.freq
kRp.filter.wclass: Remove word classes
kRp.hyphen-class: S4 Class kRp.hyphen
kRp.hyph.pat-class: S4 Class kRp.hyph.pat
kRp.lang-class: S4 Class kRp.lang
kRp.POS.tags: Get elaborated word tag definitions
kRp.readability-class: S4 Class kRp.readability
kRp.tagged-class: S4 Class kRp.tagged
kRp.taggedText-methods: Getter/setter methods for koRpus objects
kRp.text.analysis: Analyze texts using TreeTagger and word frequencies
kRp.text.paste: Paste koRpus objects
kRp.text.transform: Letter case transformation
kRp.TTR-class: S4 Class kRp.TTR
kRp.txt.freq-class: S4 Class kRp.txt.freq
kRp.txt.trans-class: S4 Class kRp.txt.trans
lex.div-methods: Analyze lexical diversity
lex.div.num: Calculate lexical diversity
linsear.write: Readability: Linsear Write Index
LIX: Readability: Björnsson's Läsbarhetsindex (LIX)
maas: Lexical diversity: Maas' indices
manage.hyph.pat: Handling hyphenation pattern objects
MATTR: Lexical diversity: Moving-Average Type-Token Ratio (MATTR)
MSTTR: Lexical diversity: Mean Segmental Type-Token Ratio (MSTTR)
MTLD: Lexical diversity: Measure of Textual Lexical Diversity...
nWS: Readability: Neue Wiener Sachtextformeln
plot-methods: Plot method for objects of class kRp.tagged
query-methods: A method to get information out of koRpus objects
readability-methods: Measure readability
readability.num: Calculate readability
read.BAWL: Import BAWL-R data
read.corp.celex: Import Celex data
read.corp.custom-methods: Import custom corpus data
read.corp.LCC: Import LCC data
read.hyph.pat: Reading patgen-compatible hyphenation pattern files
read.tagged: Import already tagged texts
RIX: Readability: Anderson's Readability Index (RIX)
R.ld: Lexical diversity: Guiraud's R
segment.optimizer: A function to optimize MSTTR segment sizes
set.kRp.env: A function to set information on your koRpus environment
set.lang.support: Add support for new languages
show-methods: Show methods for koRpus objects
S.ld: Lexical diversity: Summer's S
SMOG: Readability: Simple Measure of Gobbledygook (SMOG)
spache: Readability: Spache Formula
strain: Readability: Strain Index
summary-methods: Summary methods for koRpus objects
textFeatures: Extract text features for authorship analysis
tokenize: A simple tokenizer
traenkle.bailer: Readability: Traenkle-Bailer Formeln
treetag: A function to call TreeTagger
TRI: Readability: Kuntzsch's Text-Redundanz-Index
TTR: Lexical diversity: Type-Token Ratio
tuldava: Readability: Tuldava's Text Difficulty Formula
types.tokens-methods: Get types and tokens of a given text
U.ld: Lexical diversity: Uber Index (U)
wheeler.smith: Readability: Wheeler-Smith Score
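
Many of the entries above are readability formulas that combine a few text counts into a single score. As a rough illustration of what such a formula looks like, the sketch below applies the standard English Flesch Reading Ease formula to raw counts in base R. The helper flesch_re is hypothetical and written only for this example; it is not a koRpus function and makes no claim about the package's internals or defaults.

```r
# Illustrative only: standard Flesch Reading Ease (English) from raw counts.
# flesch_re() is a hypothetical helper for this sketch, not a koRpus function.
flesch_re <- function(words, sentences, syllables) {
  stopifnot(words > 0, sentences > 0)
  asl <- words / sentences     # average sentence length in words
  asw <- syllables / words     # average number of syllables per word
  206.835 - 1.015 * asl - 84.6 * asw
}

# 100 words in 5 sentences with 150 syllables: 206.835 - 20.3 - 126.9
flesch_re(words = 100, sentences = 5, syllables = 150)
```

Higher scores indicate easier text; longer sentences and longer words both lower the score.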

Functions

ARI Man page Source code
C.ld Man page Source code
CTTR Man page Source code
DRP Man page Source code
ELF Man page Source code
FOG Man page Source code
FORCAST Man page Source code
HDD Man page Source code
K.ld Man page Source code
LIX Man page Source code
MATTR Man page Source code
MATTR.calc Source code
MSTTR Man page Source code
MSTTR.calc Source code
MTLD Man page Source code
MTLD.calc Source code
MTLDMA.calc Source code
R.ld Man page Source code
RIX Man page Source code
S.ld Man page Source code
SMOG Man page Source code
TRI Man page Source code
TTR Man page Source code
TnT Source code
U.ld Man page Source code
basic.tagged.descriptives Source code
basic.text.descriptives Source code
bormuth Man page Source code
cTest Man page
cTest,kRp.tagged-method Man page
cTestify Source code
check.file Source code
check.flavour Source code
check.hyph.cache Source code
checkLangPreset Source code
checkTTOptions Source code
clean.text Source code
clozeDelete Man page
clozeDelete,kRp.taggedText-method Man page
clozify Source code
coleman Man page Source code
coleman.liau Man page Source code
correct.hyph Man page
correct.hyph,kRp.hyphen-method Man page
correct.tag Man page
correct.tag,kRp.taggedText-method Man page
count.sentences Source code
create.corp.freq.object Source code
dale.chall Man page Source code
danielson.bryan Man page Source code
default.params Source code
describe Man page
describe,-methods Man page
describe,kRp.hyphen-method Man page
describe,kRp.taggedText-method Man page
describe<- Man page
describe<-,-methods Man page
describe<-,kRp.hyphen-method Man page
describe<-,kRp.taggedText-method Man page
dickes.steiwer Man page Source code
difficult.words Source code
distrib.from.fixed Source code
distrib.to.fixed Source code
dumpTextToTempfile Source code
explode.letters Source code
explode.word Source code
farr.jenkins.paterson Man page Source code
flesch Man page Source code
flesch.kincaid Man page Source code
freq.analysis Man page
freq.analysis,character-method Man page
freq.analysis,kRp.taggedText-method Man page
frqcy.by.rel Source code
frqcy.of.types Source code
frqcy.summarize Source code
fucks Man page Source code
get.grade.level Source code
get.hyph.cache Source code
get.kRp.env Man page Source code
guess.lang Man page Source code
harris.jacobson Man page Source code
hdd.calc Source code
headLine Source code
hyph.XX Man page
hyph.de Man page
hyph.de.old Man page
hyph.en Man page
hyph.en.us Man page
hyph.es Man page
hyph.fr Man page
hyph.it Man page
hyph.ru Man page
hyphen Man page
hyphen,character-method Man page
hyphen,kRp.taggedText-method Man page
hyphen.word Source code
hyphenText Man page
hyphenText,-methods Man page
hyphenText,kRp.hyphen-method Man page
hyphenText<- Man page
hyphenText<-,-methods Man page
hyphenText<-,kRp.hyphen-method Man page
import.RS Source code
import.RS.desc Source code
import.TQ Source code
import.TQ.desc Source code
is.supported.lang Source code
is.taggedText Man page Source code
jumbleWords Man page Source code
k.calc Source code
kRp.POS.tags Man page Source code
kRp.TTR,-class Man page
kRp.TTR-class Man page
kRp.analysis,-class Man page
kRp.analysis-class Man page
kRp.check.params Source code
kRp.cluster Man page Source code
kRp.corp.custom.analysis Source code
kRp.corp.custom.prepare Source code
kRp.corp.freq,-class Man page
kRp.corp.freq-class Man page
kRp.filter.wclass Man page Source code
kRp.freq.analysis.calc Source code
kRp.hyph.pat,-class Man page
kRp.hyph.pat-class Man page
kRp.hyphen,-class Man page
kRp.hyphen-class Man page
kRp.hyphen.calc Source code
kRp.idf Source code
kRp.lang,-class Man page
kRp.lang-class Man page
kRp.lex.div.formulae Source code
kRp.rdb.formulae Source code
kRp.read.corp.custom.calc Source code
kRp.readability,-class Man page
kRp.readability-class Man page
kRp.tagged,-class Man page
kRp.tagged-class Man page
kRp.text.analysis Man page Source code
kRp.text.paste Man page Source code
kRp.text.transform Man page Source code
kRp.txt.freq,-class Man page
kRp.txt.freq-class Man page
kRp.txt.trans,-class Man page
kRp.txt.trans-class Man page
koRpus-package Man page
language Man page
language,-methods Man page
language,kRp.hyphen-method Man page
language,kRp.taggedText-method Man page
language.setting Source code
language<- Man page
language<-,-methods Man page
language<-,kRp.hyphen-method Man page
language<-,kRp.taggedText-method Man page
lex.div Man page
lex.div,character-method Man page
lex.div,kRp.taggedText-method Man page
lex.div,missing-method Man page
lex.div.num Man page Source code
lex.growth Source code
lgV0.calc Source code
linsear.write Man page Source code
list.add.type Source code
list.drop.type Source code
load.hyph.pattern Source code
long.words Source code
maas Man page Source code
manage.hyph.pat Man page Source code
matching.lang Source code
mtld.sub.calc Source code
mtld.sub.nodata Source code
nWS Man page Source code
noInf.summary Source code
optimize.hyph.pattern Source code
paste.tokenized.text Source code
plot Man page
plot,kRp.tagged,missing-method Man page
query Man page
query,kRp.corp.freq-method Man page
query,kRp.tagged-method Man page
queryList Source code
read.BAWL Man page Source code
read.corp.LCC Man page Source code
read.corp.celex Man page Source code
read.corp.custom Man page
read.corp.custom,character-method Man page
read.corp.custom,kRp.taggedText-method Man page
read.corp.custom,list-method Man page
read.hyph.cache.file Source code
read.hyph.pat Man page Source code
read.tagged Man page Source code
read.udhr Source code
read.word.list Source code
readability Man page
readability,character-method Man page
readability,kRp.taggedText-method Man page
readability,missing-method Man page
readability.num Man page Source code
segment.optimizer Man page Source code
set.hyph.cache Source code
set.kRp.env Man page Source code
set.lang.support Man page Source code
show,-methods Man page
show,kRp.TTR-method Man page
show,kRp.corp.freq-method Man page
show,kRp.hyphen-method Man page
show,kRp.lang-method Man page
show,kRp.readability-method Man page
show,kRp.taggedText-method Man page
spache Man page Source code
stopAndStem Source code
strain Man page Source code
summary,kRp.TTR-method Man page
summary,kRp.hyphen-method Man page
summary,kRp.lang-method Man page
summary,kRp.readability-method Man page
summary,kRp.tagged-method Man page
summary,kRp.txt.freq-method Man page
tag.kRp.txt Source code
tagged.txt.rm.classes Source code
taggedText Man page
taggedText,-methods Man page
taggedText,kRp.taggedText-method Man page
taggedText<- Man page
taggedText<-,-methods Man page
taggedText<-,kRp.taggedText-method Man page
taggz Source code
text.1st.letter Source code
text.analysis Source code
text.freq.analysis Source code
textFeatures Man page Source code
tokenize Man page Source code
tokens Man page
tokens,character-method Man page
tokens,kRp.TTR-method Man page
tokens,kRp.taggedText-method Man page
tokenz Source code
traenkle.bailer Man page Source code
treetag Man page Source code
treetag.com Source code
ttr.calc Source code
ttr.calc.chars Source code
tuldava Man page Source code
txt.compress Source code
type.freq Source code
types Man page
types,character-method Man page
types,kRp.TTR-method Man page
types,kRp.taggedText-method Man page
value.distribs Source code
wClassNoPunct Source code
wheeler.smith Man page Source code
word.freq Source code
write.hyph.cache.file Source code

Files

TODO
inst
inst/CITATION
inst/NEWS.Rd
inst/README.languages
inst/shiny
inst/shiny/demo
inst/shiny/demo/ui.R
inst/shiny/demo/server.R
inst/shiny/demo/maxlength.html
inst/templates
inst/templates/lang.support-xx.R
inst/templates/package_koRpus.lang.xx.R
inst/templates/hyph.xx-data.R
inst/rkward
inst/rkward/po
inst/rkward/po/rkward__TokenizingPOStagging_rkward.pot
inst/rkward/po/rkward__TokenizingPOStagging_rkward.de.po
inst/rkward/po/de
inst/rkward/po/de/LC_MESSAGES
inst/rkward/po/de/LC_MESSAGES/rkward__TokenizingPOStagging_rkward.mo
inst/rkward/koRpus.pluginmap
inst/rkward/plugins
inst/rkward/plugins/Readability.xml
inst/rkward/plugins/LexicalDiversity.xml
inst/rkward/plugins/Readability.js
inst/rkward/plugins/Hyphenation.xml
inst/rkward/plugins/FrequencyAnalysis.xml
inst/rkward/plugins/TokenizingPOStagging.xml
inst/rkward/plugins/FrequencyAnalysis.js
inst/rkward/plugins/Hyphenation.js
inst/rkward/plugins/LexicalDiversity.js
inst/rkward/plugins/TokenizingPOStagging.js
inst/rkward/rkwarddev_koRpus_plugin_script.R
inst/doc
inst/doc/ttr.pdf
inst/doc/koRpus_vignette.R
inst/doc/koRpus_vignette.pdf
inst/doc/koRpus_lit.bib
inst/doc/koRpus_vignette.Rnw
tests
tests/testthat.R
tests/testthat
tests/testthat/README_sample_text.txt
tests/testthat/sample_text_correcthyph_dput.txt
tests/testthat/sample_text_tokenized_dput.txt
tests/testthat/sample_text_TTRChar_dput.txt
tests/testthat/tokenized_single_token_dput.txt
tests/testthat/sample_text.txt
tests/testthat/test_tokenizing_POS_tagging.R
tests/testthat/sample_text_lexdiv_dput.txt
tests/testthat/pseudo_word_list.txt
tests/testthat/sample_text_hyphen_dput.txt
tests/testthat/sample_text_readability_dput.txt
NAMESPACE
data
data/hyph.de.old.rda
data/hyph.en.rda
data/hyph.fr.rda
data/hyph.de.rda
data/hyph.ru.rda
data/hyph.es.rda
data/hyph.it.rda
data/hyph.en.us.rda
R
R/koRpus-internal.import.R
R/guess.lang.R
R/kRp.POS.tags.R
R/02_method_types_tokens.R
R/kRp.text.analysis.R
R/02_method_clozeDelete.R
R/02_method_plot.kRp.tagged.R
R/02_method_summary.kRp.readability.R
R/lex.div.num.R
R/00_environment.R
R/jumbleWords.R
R/02_method_show.kRp.taggedText.R
R/wrapper_functions_lex.div.R
R/read.tagged.R
R/tokenize.R
R/hyph.XX-data.R
R/02_method_summary.kRp.TTR.R
R/02_method_show.kRp.corp.freq.R
R/02_method_summary.kRp.tagged.R
R/lang.support-de.R
R/02_method_show.kRp.hyphen.R
R/02_method_hyphen.R
R/01_class_04_kRp.txt.trans.R
R/kRp.cluster.R
R/get.kRp.env.R
R/koRpus-internal.freq.analysis.R
R/kRp.filter.wclass.R
R/koRpus-internal.R
R/01_class_01_kRp.tagged.R
R/koRpus-internal.rdb.params.grades.R
R/koRpus-package.R
R/02_method_summary.kRp.hyphen.R
R/set.kRp.env.R
R/koRpus-internal.hyphen.R
R/textFeatures.R
R/read.corp.celex.R
R/koRpus-internal.read.corp.custom.R
R/01_class_06_kRp.corp.freq.R
R/01_class_07_kRp.hyph.pat.R
R/01_class_08_kRp.hyphen.R
R/02_method_correct.R
R/readability.num.R
R/treetag.R
R/02_method_freq.analysis.R
R/01_class_02_kRp.TTR.R
R/read.hyph.pat.R
R/read.corp.LCC.R
R/kRp.text.paste.R
R/01_class_05_kRp.analysis.R
R/02_method_show.kRp.lang.R
R/segment.optimizer.R
R/02_method_cTest.R
R/koRpus-internal.lexdiv.formulae.R
R/02_method_query.R
R/02_method_read.corp.custom.R
R/manage.hyph.pat.R
R/01_class_03_kRp.txt.freq.R
R/set.lang.support.R
R/koRpus-internal.roxy.all.R
R/lang.support-it.R
R/02_method_kRp.taggedText.R
R/02_method_lex.div.R
R/lang.support-ru.R
R/02_method_summary.kRp.lang.R
R/lang.support-fr.R
R/02_method_show.kRp.TTR.R
R/01_class_09_kRp.lang.R
R/02_method_readability.R
R/02_method_show.kRp.readability.R
R/01_class_10_kRp.readability.R
R/lang.support-en.R
R/kRp.text.transform.R
R/lang.support-es.R
R/read.BAWL.R
R/koRpus-internal.rdb.formulae.R
R/02_method_summary.kRp.txt.freq.R
R/wrapper_functions_readability.R
vignettes
vignettes/ttr.pdf
vignettes/koRpus_lit.bib
vignettes/koRpus_vignette.Rnw
README.md
MD5
DESCRIPTION
ChangeLog
man
man/kRp.cluster.Rd
man/kRp.tagged-class.Rd
man/summary-methods.Rd
man/lex.div-methods.Rd
man/LIX.Rd
man/SMOG.Rd
man/R.ld.Rd
man/read.corp.custom-methods.Rd
man/plot-methods.Rd
man/hyph.XX.Rd
man/fucks.Rd
man/kRp.analysis-class.Rd
man/CTTR.Rd
man/textFeatures.Rd
man/set.kRp.env.Rd
man/kRp.text.analysis.Rd
man/DRP.Rd
man/kRp.hyphen-class.Rd
man/kRp.lang-class.Rd
man/strain.Rd
man/HDD.Rd
man/readability-methods.Rd
man/freq.analysis-methods.Rd
man/S.ld.Rd
man/MTLD.Rd
man/ARI.Rd
man/maas.Rd
man/linsear.write.Rd
man/read.corp.LCC.Rd
man/K.ld.Rd
man/dickes.steiwer.Rd
man/kRp.text.transform.Rd
man/jumbleWords.Rd
man/read.tagged.Rd
man/kRp.txt.trans-class.Rd
man/correct-methods.Rd
man/clozeDelete-methods.Rd
man/lex.div.num.Rd
man/treetag.Rd
man/kRp.hyph.pat-class.Rd
man/kRp.POS.tags.Rd
man/farr.jenkins.paterson.Rd
man/MSTTR.Rd
man/harris.jacobson.Rd
man/bormuth.Rd
man/read.hyph.pat.Rd
man/tuldava.Rd
man/koRpus-package.Rd
man/readability.num.Rd
man/segment.optimizer.Rd
man/kRp.filter.wclass.Rd
man/danielson.bryan.Rd
man/query-methods.Rd
man/RIX.Rd
man/C.ld.Rd
man/TTR.Rd
man/MATTR.Rd
man/kRp.text.paste.Rd
man/manage.hyph.pat.Rd
man/read.corp.celex.Rd
man/flesch.kincaid.Rd
man/types.tokens-methods.Rd
man/kRp.TTR-class.Rd
man/FORCAST.Rd
man/tokenize.Rd
man/nWS.Rd
man/U.ld.Rd
man/flesch.Rd
man/ELF.Rd
man/hyphen-methods.Rd
man/coleman.Rd
man/guess.lang.Rd
man/set.lang.support.Rd
man/wheeler.smith.Rd
man/kRp.txt.freq-class.Rd
man/traenkle.bailer.Rd
man/kRp.corp.freq-class.Rd
man/dale.chall.Rd
man/kRp.taggedText-methods.Rd
man/coleman.liau.Rd
man/get.kRp.env.Rd
man/kRp.readability-class.Rd
man/spache.Rd
man/TRI.Rd
man/show-methods.Rd
man/cTest-methods.Rd
man/FOG.Rd
man/read.BAWL.Rd
.Rinstignore
koRpus documentation built on May 19, 2017, 11:07 p.m.