koRpus: An R Package for Text Analysis

A set of tools to analyze texts. Includes, amongst others, functions for automatic language detection, hyphenation, several indices of lexical diversity (e.g., type token ratio, HD-D/vocd-D, MTLD) and readability (e.g., Flesch, SMOG, LIX, Dale-Chall). Basic import functions for language corpora are also provided, to enable frequency analyses (supports Celex and Leipzig Corpora Collection file formats) and measures like tf-idf. Support for additional languages can be added on-the-fly or by plugin packages.

Note: For full functionality a local installation of TreeTagger is recommended. 'koRpus' also includes a plugin for the R GUI and IDE RKWard, providing graphical dialogs for its basic features. The respective R package 'rkward' cannot be installed directly from a repository, as it is a part of RKWard. To make full use of this feature, please install RKWard from https://rkward.kde.org (plugins are detected automatically).

Due to some restrictions on CRAN, the full package sources are only available from the project homepage. To ask for help, report bugs, suggest feature improvements, or discuss the global development of the package, please subscribe to the koRpus-dev mailing list (https://ml06.ispgateway.de/mailman/listinfo/korpus-dev_r.reaktanz.de).
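For readers unfamiliar with the type token ratio named in the description above, here is a minimal sketch of the idea: the number of distinct word forms divided by the total number of words. This is an illustration written in Python, not koRpus code. The function name `type_token_ratio` and the naive regular-expression tokenizer are assumptions made for this example, and the package's own tokenization (or TreeTagger's) will generally split text differently.

```python
import re


def type_token_ratio(text: str) -> float:
    """Distinct word forms (types) divided by total words (tokens)."""
    # Naive tokenizer (an assumption for this sketch): lowercase the text
    # and keep runs of ASCII letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        raise ValueError("text contains no word tokens")
    return len(set(tokens)) / len(tokens)


sample = "the cat sat on the mat and the dog sat too"
# 11 tokens, 8 distinct types ("the" occurs three times, "sat" twice)
print(round(type_token_ratio(sample), 3))  # prints 0.727
```

Because longer texts repeat more words, the plain ratio falls as text length grows, which is one reason the package also lists length-corrected measures such as MSTTR, MATTR and MTLD.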

Author: m.eik michalke [aut, cre], Earl Brown [ctb], Alberto Mirisola [ctb], Alexandre Brulet [ctb], Laura Hauser [ctb]
Date of publication: 2016-06-06 06:39:53
Maintainer: m.eik michalke <meik.michalke@hhu.de>
License: GPL (>= 3)
Version: 0.06-5
http://reaktanz.de/?c=hacking&s=koRpus

Man pages

ARI: Readability: Automated Readability Index (ARI)

bormuth: Readability: Bormuth's Mean Cloze and Grade Placement

C.ld: Lexical diversity: Herdan's C

clozeDelete-methods: Transform text into cloze test format

coleman: Readability: Coleman's Formulas

coleman.liau: Readability: Coleman-Liau Index

correct-methods: Methods to correct koRpus objects

cTest-methods: Transform text into C-Test-like format

CTTR: Lexical diversity: Carroll's corrected TTR (CTTR)

dale.chall: Readability: Dale-Chall Readability Formula

danielson.bryan: Readability: Danielson-Bryan

dickes.steiwer: Readability: Dickes-Steiwer Handformel

DRP: Readability: Degrees of Reading Power (DRP)

ELF: Readability: Farr's Easy Listening Formula (ELF)

farr.jenkins.paterson: Readability: Farr-Jenkins-Paterson Index

flesch: Readability: Flesch Reading Ease

flesch.kincaid: Readability: Flesch-Kincaid Grade Level

FOG: Readability: Gunning FOG Index

FORCAST: Readability: FORCAST Index

freq.analysis-methods: Analyze word frequencies

fucks: Readability: Fucks' Stilcharakteristik

get.kRp.env: Get koRpus session environment

guess.lang: Guess language a text is written in

harris.jacobson: Readability: Harris-Jacobson indices

HDD: Lexical diversity: HD-D (vocd-d)

hyphen-methods: Automatic hyphenation

hyph.XX: Hyphenation patterns

jumbleWords: Produce jumbled words

K.ld: Lexical diversity: Yule's K

koRpus-package: The koRpus Package

kRp.analysis-class: S4 Class kRp.analysis

kRp.cluster: Work in (early) progress. Probably don't even look at it....

kRp.corp.freq-class: S4 Class kRp.corp.freq

kRp.filter.wclass: Remove word classes

kRp.hyphen-class: S4 Class kRp.hyphen

kRp.hyph.pat-class: S4 Class kRp.hyph.pat

kRp.lang-class: S4 Class kRp.lang

kRp.POS.tags: Get elaborated word tag definitions

kRp.readability-class: S4 Class kRp.readability

kRp.tagged-class: S4 Class kRp.tagged

kRp.taggedText-methods: Getter/setter methods for koRpus objects

kRp.text.analysis: Analyze texts using TreeTagger and word frequencies

kRp.text.paste: Paste koRpus objects

kRp.text.transform: Letter case transformation

kRp.TTR-class: S4 Class kRp.TTR

kRp.txt.freq-class: S4 Class kRp.txt.freq

kRp.txt.trans-class: S4 Class kRp.txt.trans

lex.div-methods: Analyze lexical diversity

lex.div.num: Calculate lexical diversity

linsear.write: Readability: Linsear Write Index

LIX: Readability: Björnsson's Läsbarhetsindex (LIX)

maas: Lexical diversity: Maas' indices

manage.hyph.pat: Handling hyphenation pattern objects

MATTR: Lexical diversity: Moving-Average Type-Token Ratio (MATTR)

MSTTR: Lexical diversity: Mean Segmental Type-Token Ratio (MSTTR)

MTLD: Lexical diversity: Measure of Textual Lexical Diversity...

nWS: Readability: Neue Wiener Sachtextformeln

plot-methods: Plot method for objects of class kRp.tagged

query-methods: A method to get information out of koRpus objects

readability-methods: Measure readability

readability.num: Calculate readability

read.BAWL: Import BAWL-R data

read.corp.celex: Import Celex data

read.corp.custom-methods: Import custom corpus data

read.corp.LCC: Import LCC data

read.hyph.pat: Reading patgen-compatible hyphenation pattern files

read.tagged: Import already tagged texts

RIX: Readability: Anderson's Readability Index (RIX)

R.ld: Lexical diversity: Guiraud's R

segment.optimizer: A function to optimize MSTTR segment sizes

set.kRp.env: A function to set information on your koRpus environment

set.lang.support: Add support for new languages

show-methods: Show methods for koRpus objects

S.ld: Lexical diversity: Summer's S

SMOG: Readability: Simple Measure of Gobbledygook (SMOG)

spache: Readability: Spache Formula

strain: Readability: Strain Index

summary-methods: Summary methods for koRpus objects

textFeatures: Extract text features for authorship analysis

tokenize: A simple tokenizer

traenkle.bailer: Readability: Traenkle-Bailer Formeln

treetag: A function to call TreeTagger

TRI: Readability: Kuntzsch's Text-Redundanz-Index

TTR: Lexical diversity: Type-Token Ratio

tuldava: Readability: Tuldava's Text Difficulty Formula

U.ld: Lexical diversity: Uber Index (U)

wheeler.smith: Readability: Wheeler-Smith Score

Files in this package

koRpus
koRpus/TODO
koRpus/inst
koRpus/inst/CITATION
koRpus/inst/NEWS.Rd
koRpus/inst/README.languages
koRpus/inst/shiny
koRpus/inst/shiny/demo
koRpus/inst/shiny/demo/ui.R
koRpus/inst/shiny/demo/server.R
koRpus/inst/shiny/demo/maxlength.html
koRpus/inst/templates
koRpus/inst/templates/lang.support-xx.R
koRpus/inst/templates/package_koRpus.lang.xx.R
koRpus/inst/templates/hyph.xx-data.R
koRpus/inst/rkward
koRpus/inst/rkward/po
koRpus/inst/rkward/po/rkward__TokenizingPOStagging_rkward.pot
koRpus/inst/rkward/po/rkward__TokenizingPOStagging_rkward.de.po
koRpus/inst/rkward/po/de
koRpus/inst/rkward/po/de/LC_MESSAGES
koRpus/inst/rkward/po/de/LC_MESSAGES/rkward__TokenizingPOStagging_rkward.mo
koRpus/inst/rkward/koRpus.pluginmap
koRpus/inst/rkward/plugins
koRpus/inst/rkward/plugins/Readability.xml
koRpus/inst/rkward/plugins/LexicalDiversity.xml
koRpus/inst/rkward/plugins/Readability.js
koRpus/inst/rkward/plugins/Hyphenation.xml
koRpus/inst/rkward/plugins/FrequencyAnalysis.xml
koRpus/inst/rkward/plugins/TokenizingPOStagging.xml
koRpus/inst/rkward/plugins/FrequencyAnalysis.js
koRpus/inst/rkward/plugins/Hyphenation.js
koRpus/inst/rkward/plugins/LexicalDiversity.js
koRpus/inst/rkward/plugins/TokenizingPOStagging.js
koRpus/inst/rkward/rkwarddev_koRpus_plugin_script.R
koRpus/inst/doc
koRpus/inst/doc/ttr.pdf
koRpus/inst/doc/koRpus_vignette.pdf
koRpus/inst/doc/koRpus_lit.bib
koRpus/inst/doc/koRpus_vignette.Rnw
koRpus/tests
koRpus/tests/testthat.R
koRpus/tests/testthat
koRpus/tests/testthat/README_sample_text.txt
koRpus/tests/testthat/sample_text_tokenized_dput.txt
koRpus/tests/testthat/sample_text.txt
koRpus/tests/testthat/test_tokenizing_POS_tagging.R
koRpus/tests/testthat/sample_text_lexdiv_dput.txt
koRpus/tests/testthat/pseudo_word_list.txt
koRpus/tests/testthat/sample_text_hyphen_dput.txt
koRpus/tests/testthat/sample_text_readability_dput.txt
koRpus/NAMESPACE
koRpus/data
koRpus/data/hyph.de.old.rda
koRpus/data/hyph.en.rda
koRpus/data/hyph.fr.rda
koRpus/data/hyph.de.rda
koRpus/data/hyph.ru.rda
koRpus/data/hyph.es.rda
koRpus/data/hyph.it.rda
koRpus/data/hyph.en.us.rda
koRpus/R
koRpus/R/00_class_03_kRp.txt.freq.R
koRpus/R/koRpus-internal.import.R
koRpus/R/01_method_cTest.R
koRpus/R/farr.jenkins.paterson.R
koRpus/R/guess.lang.R
koRpus/R/traenkle.bailer.R
koRpus/R/kRp.POS.tags.R
koRpus/R/00_class_06_kRp.corp.freq.R
koRpus/R/maas.R
koRpus/R/kRp.text.analysis.R
koRpus/R/fucks.R
koRpus/R/01_method_show.kRp.lang.R
koRpus/R/00_class_02_kRp.TTR.R
koRpus/R/flesch.R
koRpus/R/lex.div.num.R
koRpus/R/dickes.steiwer.R
koRpus/R/01_method_summary.kRp.lang.R
koRpus/R/ARI.R
koRpus/R/S.ld.R
koRpus/R/01_method_show.kRp.readability.R
koRpus/R/00_class_09_kRp.lang.R
koRpus/R/jumbleWords.R
koRpus/R/01_method_hyphen.R
koRpus/R/01_method_show.kRp.corp.freq.R
koRpus/R/FOG.R
koRpus/R/read.tagged.R
koRpus/R/strain.R
koRpus/R/tokenize.R
koRpus/R/hyph.XX-data.R
koRpus/R/00_class_08_kRp.hyphen.R
koRpus/R/lang.support-de.R
koRpus/R/00_class_05_kRp.analysis.R
koRpus/R/00_class_10_kRp.readability.R
koRpus/R/kRp.cluster.R
koRpus/R/get.kRp.env.R
koRpus/R/01_method_kRp.taggedText.R
koRpus/R/DRP.R
koRpus/R/koRpus-internal.freq.analysis.R
koRpus/R/00_class_01_kRp.tagged.R
koRpus/R/kRp.filter.wclass.R
koRpus/R/ELF.R
koRpus/R/harris.jacobson.R
koRpus/R/koRpus-internal.R
koRpus/R/K.ld.R
koRpus/R/CTTR.R
koRpus/R/koRpus-internal.rdb.params.grades.R
koRpus/R/koRpus-package.R
koRpus/R/R.ld.R
koRpus/R/MSTTR.R
koRpus/R/linsear.write.R
koRpus/R/set.kRp.env.R
koRpus/R/koRpus-internal.hyphen.R
koRpus/R/01_method_clozeDelete.R
koRpus/R/01_method_plot.kRp.tagged.R
koRpus/R/textFeatures.R
koRpus/R/danielson.bryan.R
koRpus/R/read.corp.celex.R
koRpus/R/koRpus-internal.read.corp.custom.R
koRpus/R/01_method_correct.R
koRpus/R/01_method_lex.div.R
koRpus/R/readability.num.R
koRpus/R/01_method_summary.kRp.TTR.R
koRpus/R/treetag.R
koRpus/R/RIX.R
koRpus/R/wheeler.smith.R
koRpus/R/read.hyph.pat.R
koRpus/R/SMOG.R
koRpus/R/TTR.R
koRpus/R/read.corp.LCC.R
koRpus/R/kRp.text.paste.R
koRpus/R/MTLD.R
koRpus/R/01_method_readability.R
koRpus/R/01_method_freq.analysis.R
koRpus/R/segment.optimizer.R
koRpus/R/01_method_read.corp.custom.R
koRpus/R/spache.R
koRpus/R/coleman.liau.R
koRpus/R/U.ld.R
koRpus/R/FORCAST.R
koRpus/R/koRpus-internal.lexdiv.formulae.R
koRpus/R/00_class_04_kRp.txt.trans.R
koRpus/R/01_method_show.kRp.TTR.R
koRpus/R/01_method_summary.kRp.tagged.R
koRpus/R/dale.chall.R
koRpus/R/manage.hyph.pat.R
koRpus/R/tuldava.R
koRpus/R/LIX.R
koRpus/R/set.lang.support.R
koRpus/R/koRpus-internal.roxy.all.R
koRpus/R/lang.support-it.R
koRpus/R/lang.support-ru.R
koRpus/R/00_class_07_kRp.hyph.pat.R
koRpus/R/lang.support-fr.R
koRpus/R/01_method_summary.kRp.readability.R
koRpus/R/HDD.R
koRpus/R/nWS.R
koRpus/R/MATTR.R
koRpus/R/lang.support-en.R
koRpus/R/kRp.text.transform.R
koRpus/R/lang.support-es.R
koRpus/R/C.ld.R
koRpus/R/01_method_summary.kRp.txt.freq.R
koRpus/R/read.BAWL.R
koRpus/R/01_method_query.R
koRpus/R/flesch.kincaid.R
koRpus/R/koRpus-internal.rdb.formulae.R
koRpus/R/bormuth.R
koRpus/R/coleman.R
koRpus/R/TRI.R
koRpus/vignettes
koRpus/vignettes/ttr.pdf
koRpus/vignettes/koRpus_lit.bib
koRpus/vignettes/koRpus_vignette.Rnw
koRpus/README.md
koRpus/MD5
koRpus/DESCRIPTION
koRpus/ChangeLog
koRpus/man
koRpus/man/kRp.cluster.Rd
koRpus/man/kRp.tagged-class.Rd
koRpus/man/summary-methods.Rd
koRpus/man/lex.div-methods.Rd
koRpus/man/LIX.Rd
koRpus/man/SMOG.Rd
koRpus/man/R.ld.Rd
koRpus/man/read.corp.custom-methods.Rd
koRpus/man/plot-methods.Rd
koRpus/man/hyph.XX.Rd
koRpus/man/fucks.Rd
koRpus/man/kRp.analysis-class.Rd
koRpus/man/CTTR.Rd
koRpus/man/textFeatures.Rd
koRpus/man/set.kRp.env.Rd
koRpus/man/kRp.text.analysis.Rd
koRpus/man/DRP.Rd
koRpus/man/kRp.hyphen-class.Rd
koRpus/man/kRp.lang-class.Rd
koRpus/man/strain.Rd
koRpus/man/HDD.Rd
koRpus/man/readability-methods.Rd
koRpus/man/freq.analysis-methods.Rd
koRpus/man/S.ld.Rd
koRpus/man/MTLD.Rd
koRpus/man/ARI.Rd
koRpus/man/maas.Rd
koRpus/man/linsear.write.Rd
koRpus/man/read.corp.LCC.Rd
koRpus/man/K.ld.Rd
koRpus/man/dickes.steiwer.Rd
koRpus/man/kRp.text.transform.Rd
koRpus/man/jumbleWords.Rd
koRpus/man/read.tagged.Rd
koRpus/man/kRp.txt.trans-class.Rd
koRpus/man/correct-methods.Rd
koRpus/man/clozeDelete-methods.Rd
koRpus/man/lex.div.num.Rd
koRpus/man/treetag.Rd
koRpus/man/kRp.hyph.pat-class.Rd
koRpus/man/kRp.POS.tags.Rd
koRpus/man/farr.jenkins.paterson.Rd
koRpus/man/MSTTR.Rd
koRpus/man/harris.jacobson.Rd
koRpus/man/bormuth.Rd
koRpus/man/read.hyph.pat.Rd
koRpus/man/tuldava.Rd
koRpus/man/koRpus-package.Rd
koRpus/man/readability.num.Rd
koRpus/man/segment.optimizer.Rd
koRpus/man/kRp.filter.wclass.Rd
koRpus/man/danielson.bryan.Rd
koRpus/man/query-methods.Rd
koRpus/man/RIX.Rd
koRpus/man/C.ld.Rd
koRpus/man/TTR.Rd
koRpus/man/MATTR.Rd
koRpus/man/kRp.text.paste.Rd
koRpus/man/manage.hyph.pat.Rd
koRpus/man/read.corp.celex.Rd
koRpus/man/flesch.kincaid.Rd
koRpus/man/kRp.TTR-class.Rd
koRpus/man/FORCAST.Rd
koRpus/man/tokenize.Rd
koRpus/man/nWS.Rd
koRpus/man/U.ld.Rd
koRpus/man/flesch.Rd
koRpus/man/ELF.Rd
koRpus/man/hyphen-methods.Rd
koRpus/man/coleman.Rd
koRpus/man/guess.lang.Rd
koRpus/man/set.lang.support.Rd
koRpus/man/wheeler.smith.Rd
koRpus/man/kRp.txt.freq-class.Rd
koRpus/man/traenkle.bailer.Rd
koRpus/man/kRp.corp.freq-class.Rd
koRpus/man/dale.chall.Rd
koRpus/man/kRp.taggedText-methods.Rd
koRpus/man/coleman.liau.Rd
koRpus/man/get.kRp.env.Rd
koRpus/man/kRp.readability-class.Rd
koRpus/man/spache.Rd
koRpus/man/TRI.Rd
koRpus/man/show-methods.Rd
koRpus/man/cTest-methods.Rd
koRpus/man/FOG.Rd
koRpus/man/read.BAWL.Rd
koRpus/.Rinstignore
