koRpus: An R Package for Text Analysis

A set of tools to analyze texts. Includes, among others, functions for automatic language detection, hyphenation, several indices of lexical diversity (e.g., type-token ratio, HD-D/vocd-D, MTLD) and readability (e.g., Flesch, SMOG, LIX, Dale-Chall). Basic import functions for language corpora are also provided to enable frequency analyses (supports Celex and Leipzig Corpora Collection file formats) and measures like tf-idf. Support for additional languages can be added on the fly or by plugin packages. Note: For full functionality, a local installation of TreeTagger is recommended.

'koRpus' also includes a plugin for the R GUI and IDE RKWard, providing graphical dialogs for its basic features. The respective R package 'rkward' cannot be installed directly from a repository, as it is part of RKWard. To make full use of this feature, please install RKWard from https://rkward.kde.org (plugins are detected automatically).

Due to some restrictions on CRAN, the full package sources are only available from the project homepage. To ask for help, report bugs, suggest feature improvements, or discuss the global development of the package, please subscribe to the koRpus-dev mailing list (https://ml06.ispgateway.de/mailman/listinfo/korpus-dev_r.reaktanz.de).
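
To make two of the index families named in the description more concrete, here is a minimal base-R sketch of the plain type-token ratio (types divided by tokens) and the standard English Flesch Reading Ease formula. It is not koRpus code and does not call the package: the helper names simple_tokens(), type_token_ratio() and flesch_reading_ease() are hypothetical, the tokenizer is a naive ASCII-only regular-expression split, and the word, sentence and syllable counts are supplied by hand.

```r
# Illustrative base-R sketch only; none of these helpers are part of koRpus.

# Naive tokenizer (assumption): lowercase the text, then split on any run
# of characters that is not an ASCII letter or an apostrophe.
simple_tokens <- function(text) {
  tokens <- unlist(strsplit(tolower(text), "[^a-z']+"))
  tokens[nzchar(tokens)]
}

# Type-token ratio: number of distinct tokens divided by number of tokens.
type_token_ratio <- function(tokens) {
  length(unique(tokens)) / length(tokens)
}

# Flesch Reading Ease with the standard English coefficients.
# All three counts are passed in by the caller.
flesch_reading_ease <- function(n_words, n_sentences, n_syllables) {
  206.835 - 1.015 * (n_words / n_sentences) - 84.6 * (n_syllables / n_words)
}

tokens <- simple_tokens("The cat sat on the mat. The dog sat too.")
ttr <- type_token_ratio(tokens)   # 7 distinct tokens out of 10
fre <- flesch_reading_ease(n_words = 10, n_sentences = 2, n_syllables = 10)
```

The sample text has ten one-syllable words in two sentences, which is why the counts are easy to enter by hand. The package's own entry points for these measures appear in the man page list below as TTR and flesch.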

Author: m.eik michalke [aut, cre], Earl Brown [ctb], Alberto Mirisola [ctb], Alexandre Brulet [ctb], Laura Hauser [ctb]
Date of publication: 2016-06-06 06:39:53
Maintainer: m.eik michalke <meik.michalke@hhu.de>
License: GPL (>= 3)
Version: 0.06-5
URL: http://reaktanz.de/?c=hacking&s=koRpus

Man pages

ARI: Readability: Automated Readability Index (ARI)

bormuth: Readability: Bormuth's Mean Cloze and Grade Placement

C.ld: Lexical diversity: Herdan's C

clozeDelete-methods: Transform text into cloze test format

coleman: Readability: Coleman's Formulas

coleman.liau: Readability: Coleman-Liau Index

correct-methods: Methods to correct koRpus objects

cTest-methods: Transform text into C-Test-like format

CTTR: Lexical diversity: Carroll's corrected TTR (CTTR)

dale.chall: Readability: Dale-Chall Readability Formula

danielson.bryan: Readability: Danielson-Bryan

dickes.steiwer: Readability: Dickes-Steiwer Handformel

DRP: Readability: Degrees of Reading Power (DRP)

ELF: Readability: Farr's Easy Listening Formula (ELF)

farr.jenkins.paterson: Readability: Farr-Jenkins-Paterson Index

flesch: Readability: Flesch Readability Ease

flesch.kincaid: Readability: Flesch-Kincaid Grade Level

FOG: Readability: Gunning FOG Index

FORCAST: Readability: FORCAST Index

freq.analysis-methods: Analyze word frequencies

fucks: Readability: Fucks' Stilcharakteristik

get.kRp.env: Get koRpus session environment

guess.lang: Guess language a text is written in

harris.jacobson: Readability: Harris-Jacobson indices

HDD: Lexical diversity: HD-D (vocd-d)

hyphen-methods: Automatic hyphenation

hyph.XX: Hyphenation patterns

jumbleWords: Produce jumbled words

K.ld: Lexical diversity: Yule's K

koRpus-package: The koRpus Package

kRp.analysis-class: S4 Class kRp.analysis

kRp.cluster: Work in (early) progress. Probably don't even look at it....

kRp.corp.freq-class: S4 Class kRp.corp.freq

kRp.filter.wclass: Remove word classes

kRp.hyphen-class: S4 Class kRp.hyphen

kRp.hyph.pat-class: S4 Class kRp.hyph.pat

kRp.lang-class: S4 Class kRp.lang

kRp.POS.tags: Get elaborated word tag definitions

kRp.readability-class: S4 Class kRp.readability

kRp.tagged-class: S4 Class kRp.tagged

kRp.taggedText-methods: Getter/setter methods for koRpus objects

kRp.text.analysis: Analyze texts using TreeTagger and word frequencies

kRp.text.paste: Paste koRpus objects

kRp.text.transform: Letter case transformation

kRp.TTR-class: S4 Class kRp.TTR

kRp.txt.freq-class: S4 Class kRp.txt.freq

kRp.txt.trans-class: S4 Class kRp.txt.trans

lex.div-methods: Analyze lexical diversity

lex.div.num: Calculate lexical diversity

linsear.write: Readability: Linsear Write Index

LIX: Readability: Björnsson's Läsbarhetsindex (LIX)

maas: Lexical diversity: Maas' indices

manage.hyph.pat: Handling hyphenation pattern objects

MATTR: Lexical diversity: Moving-Average Type-Token Ratio (MATTR)

MSTTR: Lexical diversity: Mean Segmental Type-Token Ratio (MSTTR)

MTLD: Lexical diversity: Measure of Textual Lexical Diversity...

nWS: Readability: Neue Wiener Sachtextformeln

plot-methods: Plot method for objects of class kRp.tagged

query-methods: A method to get information out of koRpus objects

readability-methods: Measure readability

readability.num: Calculate readability

read.BAWL: Import BAWL-R data

read.corp.celex: Import Celex data

read.corp.custom-methods: Import custom corpus data

read.corp.LCC: Import LCC data

read.hyph.pat: Reading patgen-compatible hyphenation pattern files

read.tagged: Import already tagged texts

RIX: Readability: Anderson's Readability Index (RIX)

R.ld: Lexical diversity: Guiraud's R

segment.optimizer: A function to optimize MSTTR segment sizes

set.kRp.env: A function to set information on your koRpus environment

set.lang.support: Add support for new languages

show-methods: Show methods for koRpus objects

S.ld: Lexical diversity: Summer's S

SMOG: Readability: Simple Measure of Gobbledygook (SMOG)

spache: Readability: Spache Formula

strain: Readability: Strain Index

summary-methods: Summary methods for koRpus objects

textFeatures: Extract text features for authorship analysis

tokenize: A simple tokenizer

traenkle.bailer: Readability: Traenkle-Bailer Formeln

treetag: A function to call TreeTagger

TRI: Readability: Kuntzsch's Text-Redundanz-Index

TTR: Lexical diversity: Type-Token Ratio

tuldava: Readability: Tuldava's Text Difficulty Formula

U.ld: Lexical diversity: Uber Index (U)

wheeler.smith: Readability: Wheeler-Smith Score

Functions

ARI Man page
bormuth Man page
C.ld Man page
clozeDelete Man page
clozeDelete,kRp.taggedText-method Man page
coleman Man page
coleman.liau Man page
correct.hyph Man page
correct.hyph,kRp.hyphen-method Man page
correct.tag Man page
correct.tag,kRp.taggedText-method Man page
cTest Man page
cTest,kRp.tagged-method Man page
CTTR Man page
dale.chall Man page
danielson.bryan Man page
describe Man page
describe<- Man page
describe<-,kRp.hyphen-method Man page
describe,kRp.hyphen-method Man page
describe<-,kRp.taggedText-method Man page
describe,kRp.taggedText-method Man page
describe<-,-methods Man page
describe,-methods Man page
dickes.steiwer Man page
DRP Man page
ELF Man page
farr.jenkins.paterson Man page
flesch Man page
flesch.kincaid Man page
FOG Man page
FORCAST Man page
freq.analysis Man page
freq.analysis,character-method Man page
freq.analysis,kRp.taggedText-method Man page
fucks Man page
get.kRp.env Man page
guess.lang Man page
harris.jacobson Man page
HDD Man page
hyph.de Man page
hyph.de.old Man page
hyphen Man page
hyph.en Man page
hyphen,character-method Man page
hyphen,kRp.taggedText-method Man page
hyphenText Man page
hyphenText<- Man page
hyphenText<-,kRp.hyphen-method Man page
hyphenText,kRp.hyphen-method Man page
hyphenText<-,-methods Man page
hyphenText,-methods Man page
hyph.en.us Man page
hyph.es Man page
hyph.fr Man page
hyph.it Man page
hyph.ru Man page
hyph.XX Man page
is.taggedText Man page
jumbleWords Man page
K.ld Man page
koRpus-package Man page
kRp.analysis-class Man page
kRp.analysis,-class Man page
kRp.cluster Man page
kRp.corp.freq-class Man page
kRp.corp.freq,-class Man page
kRp.filter.wclass Man page
kRp.hyphen-class Man page
kRp.hyphen,-class Man page
kRp.hyph.pat-class Man page
kRp.hyph.pat,-class Man page
kRp.lang-class Man page
kRp.lang,-class Man page
kRp.POS.tags Man page
kRp.readability-class Man page
kRp.readability,-class Man page
kRp.tagged-class Man page
kRp.tagged,-class Man page
kRp.text.analysis Man page
kRp.text.paste Man page
kRp.text.transform Man page
kRp.TTR-class Man page
kRp.TTR,-class Man page
kRp.txt.freq-class Man page
kRp.txt.freq,-class Man page
kRp.txt.trans-class Man page
kRp.txt.trans,-class Man page
language Man page
language<- Man page
language<-,kRp.hyphen-method Man page
language,kRp.hyphen-method Man page
language<-,kRp.taggedText-method Man page
language,kRp.taggedText-method Man page
language<-,-methods Man page
language,-methods Man page
lex.div Man page
lex.div,character-method Man page
lex.div,kRp.taggedText-method Man page
lex.div.num Man page
linsear.write Man page
LIX Man page
maas Man page
manage.hyph.pat Man page
MATTR Man page
MSTTR Man page
MTLD Man page
nWS Man page
plot Man page
plot,kRp.tagged,missing-method Man page
query Man page
query,kRp.corp.freq-method Man page
query,kRp.tagged-method Man page
readability Man page
readability,character-method Man page
readability,kRp.taggedText-method Man page
readability.num Man page
read.BAWL Man page
read.corp.celex Man page
read.corp.custom Man page
read.corp.custom,character-method Man page
read.corp.custom,kRp.taggedText-method Man page
read.corp.custom,list-method Man page
read.corp.LCC Man page
read.hyph.pat Man page
read.tagged Man page
RIX Man page
R.ld Man page
segment.optimizer Man page
set.kRp.env Man page
set.lang.support Man page
show Man page
show,kRp.corp.freq-method Man page
show,kRp.lang-method Man page
show,kRp.readability-method Man page
show,kRp.TTR-method Man page
show,-methods Man page
S.ld Man page
SMOG Man page
spache Man page
strain Man page
summary,kRp.lang-method Man page
summary,kRp.readability-method Man page
summary,kRp.tagged-method Man page
summary,kRp.TTR-method Man page
summary,kRp.txt.freq-method Man page
summary,-methods Man page
taggedText Man page
taggedText<- Man page
taggedText<-,kRp.taggedText-method Man page
taggedText,kRp.taggedText-method Man page
taggedText<-,-methods Man page
taggedText,-methods Man page
textFeatures Man page
tokenize Man page
traenkle.bailer Man page
treetag Man page
TRI Man page
TTR Man page
tuldava Man page
U.ld Man page
wheeler.smith Man page
WSTF Man page

Files

koRpus
koRpus/TODO
koRpus/inst
koRpus/inst/CITATION
koRpus/inst/NEWS.Rd
koRpus/inst/README.languages
koRpus/inst/shiny
koRpus/inst/shiny/demo
koRpus/inst/shiny/demo/ui.R
koRpus/inst/shiny/demo/server.R
koRpus/inst/shiny/demo/maxlength.html
koRpus/inst/templates
koRpus/inst/templates/lang.support-xx.R
koRpus/inst/templates/package_koRpus.lang.xx.R
koRpus/inst/templates/hyph.xx-data.R
koRpus/inst/rkward
koRpus/inst/rkward/po
koRpus/inst/rkward/po/rkward__TokenizingPOStagging_rkward.pot
koRpus/inst/rkward/po/rkward__TokenizingPOStagging_rkward.de.po
koRpus/inst/rkward/po/de
koRpus/inst/rkward/po/de/LC_MESSAGES
koRpus/inst/rkward/po/de/LC_MESSAGES/rkward__TokenizingPOStagging_rkward.mo
koRpus/inst/rkward/koRpus.pluginmap
koRpus/inst/rkward/plugins
koRpus/inst/rkward/plugins/Readability.xml
koRpus/inst/rkward/plugins/LexicalDiversity.xml
koRpus/inst/rkward/plugins/Readability.js
koRpus/inst/rkward/plugins/Hyphenation.xml
koRpus/inst/rkward/plugins/FrequencyAnalysis.xml
koRpus/inst/rkward/plugins/TokenizingPOStagging.xml
koRpus/inst/rkward/plugins/FrequencyAnalysis.js
koRpus/inst/rkward/plugins/Hyphenation.js
koRpus/inst/rkward/plugins/LexicalDiversity.js
koRpus/inst/rkward/plugins/TokenizingPOStagging.js
koRpus/inst/rkward/rkwarddev_koRpus_plugin_script.R
koRpus/inst/doc
koRpus/inst/doc/ttr.pdf
koRpus/inst/doc/koRpus_vignette.pdf
koRpus/inst/doc/koRpus_lit.bib
koRpus/inst/doc/koRpus_vignette.Rnw
koRpus/tests
koRpus/tests/testthat.R
koRpus/tests/testthat
koRpus/tests/testthat/README_sample_text.txt
koRpus/tests/testthat/sample_text_tokenized_dput.txt
koRpus/tests/testthat/sample_text.txt
koRpus/tests/testthat/test_tokenizing_POS_tagging.R
koRpus/tests/testthat/sample_text_lexdiv_dput.txt
koRpus/tests/testthat/pseudo_word_list.txt
koRpus/tests/testthat/sample_text_hyphen_dput.txt
koRpus/tests/testthat/sample_text_readability_dput.txt
koRpus/NAMESPACE
koRpus/data
koRpus/data/hyph.de.old.rda
koRpus/data/hyph.en.rda
koRpus/data/hyph.fr.rda
koRpus/data/hyph.de.rda
koRpus/data/hyph.ru.rda
koRpus/data/hyph.es.rda
koRpus/data/hyph.it.rda
koRpus/data/hyph.en.us.rda
koRpus/R
koRpus/R/00_class_03_kRp.txt.freq.R
koRpus/R/koRpus-internal.import.R
koRpus/R/01_method_cTest.R
koRpus/R/farr.jenkins.paterson.R
koRpus/R/guess.lang.R
koRpus/R/traenkle.bailer.R
koRpus/R/kRp.POS.tags.R
koRpus/R/00_class_06_kRp.corp.freq.R
koRpus/R/maas.R
koRpus/R/kRp.text.analysis.R
koRpus/R/fucks.R
koRpus/R/01_method_show.kRp.lang.R
koRpus/R/00_class_02_kRp.TTR.R
koRpus/R/flesch.R
koRpus/R/lex.div.num.R
koRpus/R/dickes.steiwer.R
koRpus/R/01_method_summary.kRp.lang.R
koRpus/R/ARI.R
koRpus/R/S.ld.R
koRpus/R/01_method_show.kRp.readability.R
koRpus/R/00_class_09_kRp.lang.R
koRpus/R/jumbleWords.R
koRpus/R/01_method_hyphen.R
koRpus/R/01_method_show.kRp.corp.freq.R
koRpus/R/FOG.R
koRpus/R/read.tagged.R
koRpus/R/strain.R
koRpus/R/tokenize.R
koRpus/R/hyph.XX-data.R
koRpus/R/00_class_08_kRp.hyphen.R
koRpus/R/lang.support-de.R
koRpus/R/00_class_05_kRp.analysis.R
koRpus/R/00_class_10_kRp.readability.R
koRpus/R/kRp.cluster.R
koRpus/R/get.kRp.env.R
koRpus/R/01_method_kRp.taggedText.R
koRpus/R/DRP.R
koRpus/R/koRpus-internal.freq.analysis.R
koRpus/R/00_class_01_kRp.tagged.R
koRpus/R/kRp.filter.wclass.R
koRpus/R/ELF.R
koRpus/R/harris.jacobson.R
koRpus/R/koRpus-internal.R
koRpus/R/K.ld.R
koRpus/R/CTTR.R
koRpus/R/koRpus-internal.rdb.params.grades.R
koRpus/R/koRpus-package.R
koRpus/R/R.ld.R
koRpus/R/MSTTR.R
koRpus/R/linsear.write.R
koRpus/R/set.kRp.env.R
koRpus/R/koRpus-internal.hyphen.R
koRpus/R/01_method_clozeDelete.R
koRpus/R/01_method_plot.kRp.tagged.R
koRpus/R/textFeatures.R
koRpus/R/danielson.bryan.R
koRpus/R/read.corp.celex.R
koRpus/R/koRpus-internal.read.corp.custom.R
koRpus/R/01_method_correct.R
koRpus/R/01_method_lex.div.R
koRpus/R/readability.num.R
koRpus/R/01_method_summary.kRp.TTR.R
koRpus/R/treetag.R
koRpus/R/RIX.R
koRpus/R/wheeler.smith.R
koRpus/R/read.hyph.pat.R
koRpus/R/SMOG.R
koRpus/R/TTR.R
koRpus/R/read.corp.LCC.R
koRpus/R/kRp.text.paste.R
koRpus/R/MTLD.R
koRpus/R/01_method_readability.R
koRpus/R/01_method_freq.analysis.R
koRpus/R/segment.optimizer.R
koRpus/R/01_method_read.corp.custom.R
koRpus/R/spache.R
koRpus/R/coleman.liau.R
koRpus/R/U.ld.R
koRpus/R/FORCAST.R
koRpus/R/koRpus-internal.lexdiv.formulae.R
koRpus/R/00_class_04_kRp.txt.trans.R
koRpus/R/01_method_show.kRp.TTR.R
koRpus/R/01_method_summary.kRp.tagged.R
koRpus/R/dale.chall.R
koRpus/R/manage.hyph.pat.R
koRpus/R/tuldava.R
koRpus/R/LIX.R
koRpus/R/set.lang.support.R
koRpus/R/koRpus-internal.roxy.all.R
koRpus/R/lang.support-it.R
koRpus/R/lang.support-ru.R
koRpus/R/00_class_07_kRp.hyph.pat.R
koRpus/R/lang.support-fr.R
koRpus/R/01_method_summary.kRp.readability.R
koRpus/R/HDD.R
koRpus/R/nWS.R
koRpus/R/MATTR.R
koRpus/R/lang.support-en.R
koRpus/R/kRp.text.transform.R
koRpus/R/lang.support-es.R
koRpus/R/C.ld.R
koRpus/R/01_method_summary.kRp.txt.freq.R
koRpus/R/read.BAWL.R
koRpus/R/01_method_query.R
koRpus/R/flesch.kincaid.R
koRpus/R/koRpus-internal.rdb.formulae.R
koRpus/R/bormuth.R
koRpus/R/coleman.R
koRpus/R/TRI.R
koRpus/vignettes
koRpus/vignettes/ttr.pdf
koRpus/vignettes/koRpus_lit.bib
koRpus/vignettes/koRpus_vignette.Rnw
koRpus/README.md
koRpus/MD5
koRpus/DESCRIPTION
koRpus/ChangeLog
koRpus/man
koRpus/man/kRp.cluster.Rd
koRpus/man/kRp.tagged-class.Rd
koRpus/man/summary-methods.Rd
koRpus/man/lex.div-methods.Rd
koRpus/man/LIX.Rd
koRpus/man/SMOG.Rd
koRpus/man/R.ld.Rd
koRpus/man/read.corp.custom-methods.Rd
koRpus/man/plot-methods.Rd
koRpus/man/hyph.XX.Rd
koRpus/man/fucks.Rd
koRpus/man/kRp.analysis-class.Rd
koRpus/man/CTTR.Rd
koRpus/man/textFeatures.Rd
koRpus/man/set.kRp.env.Rd
koRpus/man/kRp.text.analysis.Rd
koRpus/man/DRP.Rd
koRpus/man/kRp.hyphen-class.Rd
koRpus/man/kRp.lang-class.Rd
koRpus/man/strain.Rd
koRpus/man/HDD.Rd
koRpus/man/readability-methods.Rd
koRpus/man/freq.analysis-methods.Rd
koRpus/man/S.ld.Rd
koRpus/man/MTLD.Rd
koRpus/man/ARI.Rd
koRpus/man/maas.Rd
koRpus/man/linsear.write.Rd
koRpus/man/read.corp.LCC.Rd
koRpus/man/K.ld.Rd
koRpus/man/dickes.steiwer.Rd
koRpus/man/kRp.text.transform.Rd
koRpus/man/jumbleWords.Rd
koRpus/man/read.tagged.Rd
koRpus/man/kRp.txt.trans-class.Rd
koRpus/man/correct-methods.Rd
koRpus/man/clozeDelete-methods.Rd
koRpus/man/lex.div.num.Rd
koRpus/man/treetag.Rd
koRpus/man/kRp.hyph.pat-class.Rd
koRpus/man/kRp.POS.tags.Rd
koRpus/man/farr.jenkins.paterson.Rd
koRpus/man/MSTTR.Rd
koRpus/man/harris.jacobson.Rd
koRpus/man/bormuth.Rd
koRpus/man/read.hyph.pat.Rd
koRpus/man/tuldava.Rd
koRpus/man/koRpus-package.Rd
koRpus/man/readability.num.Rd
koRpus/man/segment.optimizer.Rd
koRpus/man/kRp.filter.wclass.Rd
koRpus/man/danielson.bryan.Rd
koRpus/man/query-methods.Rd
koRpus/man/RIX.Rd
koRpus/man/C.ld.Rd
koRpus/man/TTR.Rd
koRpus/man/MATTR.Rd
koRpus/man/kRp.text.paste.Rd
koRpus/man/manage.hyph.pat.Rd
koRpus/man/read.corp.celex.Rd
koRpus/man/flesch.kincaid.Rd
koRpus/man/kRp.TTR-class.Rd
koRpus/man/FORCAST.Rd
koRpus/man/tokenize.Rd
koRpus/man/nWS.Rd
koRpus/man/U.ld.Rd
koRpus/man/flesch.Rd
koRpus/man/ELF.Rd
koRpus/man/hyphen-methods.Rd
koRpus/man/coleman.Rd
koRpus/man/guess.lang.Rd
koRpus/man/set.lang.support.Rd
koRpus/man/wheeler.smith.Rd
koRpus/man/kRp.txt.freq-class.Rd
koRpus/man/traenkle.bailer.Rd
koRpus/man/kRp.corp.freq-class.Rd
koRpus/man/dale.chall.Rd
koRpus/man/kRp.taggedText-methods.Rd
koRpus/man/coleman.liau.Rd
koRpus/man/get.kRp.env.Rd
koRpus/man/kRp.readability-class.Rd
koRpus/man/spache.Rd
koRpus/man/TRI.Rd
koRpus/man/show-methods.Rd
koRpus/man/cTest-methods.Rd
koRpus/man/FOG.Rd
koRpus/man/read.BAWL.Rd
koRpus/.Rinstignore
