koRpus: An R Package for Text Analysis

A set of tools to analyze texts. Includes, amongst others, functions for automatic language detection, hyphenation, several indices of lexical diversity (e.g., type-token ratio, HD-D/vocd-D, MTLD) and readability (e.g., Flesch, SMOG, LIX, Dale-Chall). Basic import functions for language corpora are also provided, to enable frequency analyses (supports Celex and Leipzig Corpora Collection file formats) and measures like tf-idf. Support for additional languages can be added on-the-fly or by plugin packages. Note: For full functionality, a local installation of TreeTagger is recommended. 'koRpus' also includes a plugin for the R GUI and IDE RKWard, providing graphical dialogs for its basic features. The respective R package 'rkward' cannot be installed directly from a repository, as it is a part of RKWard. To make full use of this feature, please install RKWard from https://rkward.kde.org (plugins are detected automatically). Due to some restrictions on CRAN, the full package sources are only available from the project homepage. To ask for help, report bugs, suggest feature improvements, or discuss the global development of the package, please subscribe to the koRpus-dev mailing list (https://ml06.ispgateway.de/mailman/listinfo/korpus-dev_r.reaktanz.de).
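To make the notion of a lexical diversity index concrete, here is a minimal, language-independent sketch of the plain type-token ratio (distinct word forms divided by total word count). It is written in Python purely for illustration and is not koRpus code: the helper names `tokenize` and `type_token_ratio` are hypothetical, and the regular-expression tokenizer is a simplifying assumption that will not match the tokens produced by koRpus or TreeTagger.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())

def type_token_ratio(tokens):
    """Number of distinct tokens (types) divided by the total number of tokens."""
    if not tokens:
        raise ValueError("need at least one token")
    return len(set(tokens)) / len(tokens)

sample = "The cat sat on the mat and the dog sat on the rug"
tokens = tokenize(sample)
ttr = type_token_ratio(tokens)

# 13 tokens, 8 distinct types, ratio 8/13
print(len(tokens), len(set(tokens)), round(ttr, 3))  # prints: 13 8 0.615
```

Because the ratio tends to fall as a text gets longer, length-corrected measures such as MTLD or HD-D are often preferred when comparing texts of different sizes.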

Author
m.eik michalke [aut, cre], Earl Brown [ctb], Alberto Mirisola [ctb], Alexandre Brulet [ctb], Laura Hauser [ctb]
Date of publication
2016-06-06 06:39:53
Maintainer
m.eik michalke <meik.michalke@hhu.de>
License
GPL (>= 3)
Version
0.06-5

Man pages

ARI
Readability: Automated Readability Index (ARI)
bormuth
Readability: Bormuth's Mean Cloze and Grade Placement
C.ld
Lexical diversity: Herdan's C
clozeDelete-methods
Transform text into cloze test format
coleman
Readability: Coleman's Formulas
coleman.liau
Readability: Coleman-Liau Index
correct-methods
Methods to correct koRpus objects
cTest-methods
Transform text into C-Test-like format
CTTR
Lexical diversity: Carroll's corrected TTR (CTTR)
dale.chall
Readability: Dale-Chall Readability Formula
danielson.bryan
Readability: Danielson-Bryan
dickes.steiwer
Readability: Dickes-Steiwer Handformel
DRP
Readability: Degrees of Reading Power (DRP)
ELF
Readability: Farr's Easy Listening Formula (ELF)
farr.jenkins.paterson
Readability: Farr-Jenkins-Paterson Index
flesch
Readability: Flesch Readability Ease
flesch.kincaid
Readability: Flesch-Kincaid Grade Level
FOG
Readability: Gunning FOG Index
FORCAST
Readability: FORCAST Index
freq.analysis-methods
Analyze word frequencies
fucks
Readability: Fucks' Stilcharakteristik
get.kRp.env
Get koRpus session environment
guess.lang
Guess language a text is written in
harris.jacobson
Readability: Harris-Jacobson indices
HDD
Lexical diversity: HD-D (vocd-d)
hyphen-methods
Automatic hyphenation
hyph.XX
Hyphenation patterns
jumbleWords
Produce jumbled words
K.ld
Lexical diversity: Yule's K
koRpus-package
The koRpus Package
kRp.analysis-class
S4 Class kRp.analysis
kRp.cluster
Work in (early) progress. Probably don't even look at it....
kRp.corp.freq-class
S4 Class kRp.corp.freq
kRp.filter.wclass
Remove word classes
kRp.hyphen-class
S4 Class kRp.hyphen
kRp.hyph.pat-class
S4 Class kRp.hyph.pat
kRp.lang-class
S4 Class kRp.lang
kRp.POS.tags
Get elaborated word tag definitions
kRp.readability-class
S4 Class kRp.readability
kRp.tagged-class
S4 Class kRp.tagged
kRp.taggedText-methods
Getter/setter methods for koRpus objects
kRp.text.analysis
Analyze texts using TreeTagger and word frequencies
kRp.text.paste
Paste koRpus objects
kRp.text.transform
Letter case transformation
kRp.TTR-class
S4 Class kRp.TTR
kRp.txt.freq-class
S4 Class kRp.txt.freq
kRp.txt.trans-class
S4 Class kRp.txt.trans
lex.div-methods
Analyze lexical diversity
lex.div.num
Calculate lexical diversity
linsear.write
Readability: Linsear Write Index
LIX
Readability: Björnsson's Läsbarhetsindex (LIX)
maas
Lexical diversity: Maas' indices
manage.hyph.pat
Handling hyphenation pattern objects
MATTR
Lexical diversity: Moving-Average Type-Token Ratio (MATTR)
MSTTR
Lexical diversity: Mean Segmental Type-Token Ratio (MSTTR)
MTLD
Lexical diversity: Measure of Textual Lexical Diversity...
nWS
Readability: Neue Wiener Sachtextformeln
plot-methods
Plot method for objects of class kRp.tagged
query-methods
A method to get information out of koRpus objects
readability-methods
Measure readability
readability.num
Calculate readability
read.BAWL
Import BAWL-R data
read.corp.celex
Import Celex data
read.corp.custom-methods
Import custom corpus data
read.corp.LCC
Import LCC data
read.hyph.pat
Reading patgen-compatible hyphenation pattern files
read.tagged
Import already tagged texts
RIX
Readability: Anderson's Readability Index (RIX)
R.ld
Lexical diversity: Guiraud's R
segment.optimizer
A function to optimize MSTTR segment sizes
set.kRp.env
A function to set information on your koRpus environment
set.lang.support
Add support for new languages
show-methods
Show methods for koRpus objects
S.ld
Lexical diversity: Summer's S
SMOG
Readability: Simple Measure of Gobbledygook (SMOG)
spache
Readability: Spache Formula
strain
Readability: Strain Index
summary-methods
Summary methods for koRpus objects
textFeatures
Extract text features for authorship analysis
tokenize
A simple tokenizer
traenkle.bailer
Readability: Traenkle-Bailer Formeln
treetag
A function to call TreeTagger
TRI
Readability: Kuntzsch's Text-Redundanz-Index
TTR
Lexical diversity: Type-Token Ratio
tuldava
Readability: Tuldava's Text Difficulty Formula
U.ld
Lexical diversity: Uber Index (U)
wheeler.smith
Readability: Wheeler-Smith Score
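As an illustration of what the readability entries above compute, the following sketch evaluates the Automated Readability Index from its commonly cited formula, 4.71 × (characters / words) + 0.5 × (words / sentences) − 21.43. It is a Python approximation for illustration only, not the package's implementation: `text_counts` and `ari` are hypothetical helper names, and the regular-expression rules for counting words and sentences are assumptions that will differ from koRpus's own tokenization.

```python
import re

def text_counts(text):
    """Return (characters, words, sentences) using simple regex heuristics."""
    # Words are runs of letters or digits; punctuation is not counted as characters.
    words = re.findall(r"[A-Za-z0-9]+", text)
    # Sentences are non-empty chunks separated by '.', '!' or '?'.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    characters = sum(len(w) for w in words)
    return characters, len(words), len(sentences)

def ari(text):
    """Automated Readability Index: 4.71*(chars/words) + 0.5*(words/sentences) - 21.43."""
    characters, words, sentences = text_counts(text)
    if words == 0 or sentences == 0:
        raise ValueError("text must contain at least one word and one sentence")
    return 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43

sample = "The quick brown fox jumps over the lazy dog. It was not amused."
counts = text_counts(sample)
score = ari(sample)
print(counts, round(score, 2))
```

Very short, simple samples like this one can yield scores at or below zero; the index is meant to be applied to longer passages.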

Files in this package

koRpus
koRpus/TODO
koRpus/inst
koRpus/inst/CITATION
koRpus/inst/NEWS.Rd
koRpus/inst/README.languages
koRpus/inst/shiny
koRpus/inst/shiny/demo
koRpus/inst/shiny/demo/ui.R
koRpus/inst/shiny/demo/server.R
koRpus/inst/shiny/demo/maxlength.html
koRpus/inst/templates
koRpus/inst/templates/lang.support-xx.R
koRpus/inst/templates/package_koRpus.lang.xx.R
koRpus/inst/templates/hyph.xx-data.R
koRpus/inst/rkward
koRpus/inst/rkward/po
koRpus/inst/rkward/po/rkward__TokenizingPOStagging_rkward.pot
koRpus/inst/rkward/po/rkward__TokenizingPOStagging_rkward.de.po
koRpus/inst/rkward/po/de
koRpus/inst/rkward/po/de/LC_MESSAGES
koRpus/inst/rkward/po/de/LC_MESSAGES/rkward__TokenizingPOStagging_rkward.mo
koRpus/inst/rkward/koRpus.pluginmap
koRpus/inst/rkward/plugins
koRpus/inst/rkward/plugins/Readability.xml
koRpus/inst/rkward/plugins/LexicalDiversity.xml
koRpus/inst/rkward/plugins/Readability.js
koRpus/inst/rkward/plugins/Hyphenation.xml
koRpus/inst/rkward/plugins/FrequencyAnalysis.xml
koRpus/inst/rkward/plugins/TokenizingPOStagging.xml
koRpus/inst/rkward/plugins/FrequencyAnalysis.js
koRpus/inst/rkward/plugins/Hyphenation.js
koRpus/inst/rkward/plugins/LexicalDiversity.js
koRpus/inst/rkward/plugins/TokenizingPOStagging.js
koRpus/inst/rkward/rkwarddev_koRpus_plugin_script.R
koRpus/inst/doc
koRpus/inst/doc/ttr.pdf
koRpus/inst/doc/koRpus_vignette.pdf
koRpus/inst/doc/koRpus_lit.bib
koRpus/inst/doc/koRpus_vignette.Rnw
koRpus/tests
koRpus/tests/testthat.R
koRpus/tests/testthat
koRpus/tests/testthat/README_sample_text.txt
koRpus/tests/testthat/sample_text_tokenized_dput.txt
koRpus/tests/testthat/sample_text.txt
koRpus/tests/testthat/test_tokenizing_POS_tagging.R
koRpus/tests/testthat/sample_text_lexdiv_dput.txt
koRpus/tests/testthat/pseudo_word_list.txt
koRpus/tests/testthat/sample_text_hyphen_dput.txt
koRpus/tests/testthat/sample_text_readability_dput.txt
koRpus/NAMESPACE
koRpus/data
koRpus/data/hyph.de.old.rda
koRpus/data/hyph.en.rda
koRpus/data/hyph.fr.rda
koRpus/data/hyph.de.rda
koRpus/data/hyph.ru.rda
koRpus/data/hyph.es.rda
koRpus/data/hyph.it.rda
koRpus/data/hyph.en.us.rda
koRpus/R
koRpus/R/00_class_03_kRp.txt.freq.R
koRpus/R/koRpus-internal.import.R
koRpus/R/01_method_cTest.R
koRpus/R/farr.jenkins.paterson.R
koRpus/R/guess.lang.R
koRpus/R/traenkle.bailer.R
koRpus/R/kRp.POS.tags.R
koRpus/R/00_class_06_kRp.corp.freq.R
koRpus/R/maas.R
koRpus/R/kRp.text.analysis.R
koRpus/R/fucks.R
koRpus/R/01_method_show.kRp.lang.R
koRpus/R/00_class_02_kRp.TTR.R
koRpus/R/flesch.R
koRpus/R/lex.div.num.R
koRpus/R/dickes.steiwer.R
koRpus/R/01_method_summary.kRp.lang.R
koRpus/R/ARI.R
koRpus/R/S.ld.R
koRpus/R/01_method_show.kRp.readability.R
koRpus/R/00_class_09_kRp.lang.R
koRpus/R/jumbleWords.R
koRpus/R/01_method_hyphen.R
koRpus/R/01_method_show.kRp.corp.freq.R
koRpus/R/FOG.R
koRpus/R/read.tagged.R
koRpus/R/strain.R
koRpus/R/tokenize.R
koRpus/R/hyph.XX-data.R
koRpus/R/00_class_08_kRp.hyphen.R
koRpus/R/lang.support-de.R
koRpus/R/00_class_05_kRp.analysis.R
koRpus/R/00_class_10_kRp.readability.R
koRpus/R/kRp.cluster.R
koRpus/R/get.kRp.env.R
koRpus/R/01_method_kRp.taggedText.R
koRpus/R/DRP.R
koRpus/R/koRpus-internal.freq.analysis.R
koRpus/R/00_class_01_kRp.tagged.R
koRpus/R/kRp.filter.wclass.R
koRpus/R/ELF.R
koRpus/R/harris.jacobson.R
koRpus/R/koRpus-internal.R
koRpus/R/K.ld.R
koRpus/R/CTTR.R
koRpus/R/koRpus-internal.rdb.params.grades.R
koRpus/R/koRpus-package.R
koRpus/R/R.ld.R
koRpus/R/MSTTR.R
koRpus/R/linsear.write.R
koRpus/R/set.kRp.env.R
koRpus/R/koRpus-internal.hyphen.R
koRpus/R/01_method_clozeDelete.R
koRpus/R/01_method_plot.kRp.tagged.R
koRpus/R/textFeatures.R
koRpus/R/danielson.bryan.R
koRpus/R/read.corp.celex.R
koRpus/R/koRpus-internal.read.corp.custom.R
koRpus/R/01_method_correct.R
koRpus/R/01_method_lex.div.R
koRpus/R/readability.num.R
koRpus/R/01_method_summary.kRp.TTR.R
koRpus/R/treetag.R
koRpus/R/RIX.R
koRpus/R/wheeler.smith.R
koRpus/R/read.hyph.pat.R
koRpus/R/SMOG.R
koRpus/R/TTR.R
koRpus/R/read.corp.LCC.R
koRpus/R/kRp.text.paste.R
koRpus/R/MTLD.R
koRpus/R/01_method_readability.R
koRpus/R/01_method_freq.analysis.R
koRpus/R/segment.optimizer.R
koRpus/R/01_method_read.corp.custom.R
koRpus/R/spache.R
koRpus/R/coleman.liau.R
koRpus/R/U.ld.R
koRpus/R/FORCAST.R
koRpus/R/koRpus-internal.lexdiv.formulae.R
koRpus/R/00_class_04_kRp.txt.trans.R
koRpus/R/01_method_show.kRp.TTR.R
koRpus/R/01_method_summary.kRp.tagged.R
koRpus/R/dale.chall.R
koRpus/R/manage.hyph.pat.R
koRpus/R/tuldava.R
koRpus/R/LIX.R
koRpus/R/set.lang.support.R
koRpus/R/koRpus-internal.roxy.all.R
koRpus/R/lang.support-it.R
koRpus/R/lang.support-ru.R
koRpus/R/00_class_07_kRp.hyph.pat.R
koRpus/R/lang.support-fr.R
koRpus/R/01_method_summary.kRp.readability.R
koRpus/R/HDD.R
koRpus/R/nWS.R
koRpus/R/MATTR.R
koRpus/R/lang.support-en.R
koRpus/R/kRp.text.transform.R
koRpus/R/lang.support-es.R
koRpus/R/C.ld.R
koRpus/R/01_method_summary.kRp.txt.freq.R
koRpus/R/read.BAWL.R
koRpus/R/01_method_query.R
koRpus/R/flesch.kincaid.R
koRpus/R/koRpus-internal.rdb.formulae.R
koRpus/R/bormuth.R
koRpus/R/coleman.R
koRpus/R/TRI.R
koRpus/vignettes
koRpus/vignettes/ttr.pdf
koRpus/vignettes/koRpus_lit.bib
koRpus/vignettes/koRpus_vignette.Rnw
koRpus/README.md
koRpus/MD5
koRpus/DESCRIPTION
koRpus/ChangeLog
koRpus/man
koRpus/man/kRp.cluster.Rd
koRpus/man/kRp.tagged-class.Rd
koRpus/man/summary-methods.Rd
koRpus/man/lex.div-methods.Rd
koRpus/man/LIX.Rd
koRpus/man/SMOG.Rd
koRpus/man/R.ld.Rd
koRpus/man/read.corp.custom-methods.Rd
koRpus/man/plot-methods.Rd
koRpus/man/hyph.XX.Rd
koRpus/man/fucks.Rd
koRpus/man/kRp.analysis-class.Rd
koRpus/man/CTTR.Rd
koRpus/man/textFeatures.Rd
koRpus/man/set.kRp.env.Rd
koRpus/man/kRp.text.analysis.Rd
koRpus/man/DRP.Rd
koRpus/man/kRp.hyphen-class.Rd
koRpus/man/kRp.lang-class.Rd
koRpus/man/strain.Rd
koRpus/man/HDD.Rd
koRpus/man/readability-methods.Rd
koRpus/man/freq.analysis-methods.Rd
koRpus/man/S.ld.Rd
koRpus/man/MTLD.Rd
koRpus/man/ARI.Rd
koRpus/man/maas.Rd
koRpus/man/linsear.write.Rd
koRpus/man/read.corp.LCC.Rd
koRpus/man/K.ld.Rd
koRpus/man/dickes.steiwer.Rd
koRpus/man/kRp.text.transform.Rd
koRpus/man/jumbleWords.Rd
koRpus/man/read.tagged.Rd
koRpus/man/kRp.txt.trans-class.Rd
koRpus/man/correct-methods.Rd
koRpus/man/clozeDelete-methods.Rd
koRpus/man/lex.div.num.Rd
koRpus/man/treetag.Rd
koRpus/man/kRp.hyph.pat-class.Rd
koRpus/man/kRp.POS.tags.Rd
koRpus/man/farr.jenkins.paterson.Rd
koRpus/man/MSTTR.Rd
koRpus/man/harris.jacobson.Rd
koRpus/man/bormuth.Rd
koRpus/man/read.hyph.pat.Rd
koRpus/man/tuldava.Rd
koRpus/man/koRpus-package.Rd
koRpus/man/readability.num.Rd
koRpus/man/segment.optimizer.Rd
koRpus/man/kRp.filter.wclass.Rd
koRpus/man/danielson.bryan.Rd
koRpus/man/query-methods.Rd
koRpus/man/RIX.Rd
koRpus/man/C.ld.Rd
koRpus/man/TTR.Rd
koRpus/man/MATTR.Rd
koRpus/man/kRp.text.paste.Rd
koRpus/man/manage.hyph.pat.Rd
koRpus/man/read.corp.celex.Rd
koRpus/man/flesch.kincaid.Rd
koRpus/man/kRp.TTR-class.Rd
koRpus/man/FORCAST.Rd
koRpus/man/tokenize.Rd
koRpus/man/nWS.Rd
koRpus/man/U.ld.Rd
koRpus/man/flesch.Rd
koRpus/man/ELF.Rd
koRpus/man/hyphen-methods.Rd
koRpus/man/coleman.Rd
koRpus/man/guess.lang.Rd
koRpus/man/set.lang.support.Rd
koRpus/man/wheeler.smith.Rd
koRpus/man/kRp.txt.freq-class.Rd
koRpus/man/traenkle.bailer.Rd
koRpus/man/kRp.corp.freq-class.Rd
koRpus/man/dale.chall.Rd
koRpus/man/kRp.taggedText-methods.Rd
koRpus/man/coleman.liau.Rd
koRpus/man/get.kRp.env.Rd
koRpus/man/kRp.readability-class.Rd
koRpus/man/spache.Rd
koRpus/man/TRI.Rd
koRpus/man/show-methods.Rd
koRpus/man/cTest-methods.Rd
koRpus/man/FOG.Rd
koRpus/man/read.BAWL.Rd
koRpus/.Rinstignore