# readability-methods: Measure readability In koRpus: An R Package for Text Analysis

## Description

These methods calculate several readability indices.

## Usage

```r
readability(txt.file, ...)

## S4 method for signature 'kRp.taggedText'
readability(txt.file, hyphen = NULL,
  index = c("ARI", "Bormuth", "Coleman", "Coleman.Liau", "Dale.Chall",
    "Danielson.Bryan", "Dickes.Steiwer", "DRP", "ELF",
    "Farr.Jenkins.Paterson", "Flesch", "Flesch.Kincaid", "FOG",
    "FORCAST", "Fucks", "Harris.Jacobson", "Linsear.Write", "LIX",
    "nWS", "RIX", "SMOG", "Spache", "Strain", "Traenkle.Bailer",
    "TRI", "Tuldava", "Wheeler.Smith"),
  parameters = list(),
  word.lists = list(Bormuth = NULL, Dale.Chall = NULL,
    Harris.Jacobson = NULL, Spache = NULL),
  fileEncoding = "UTF-8", tagger = "kRp.env", force.lang = NULL,
  sentc.tag = "sentc", nonword.class = "nonpunct", nonword.tag = c(),
  quiet = FALSE, ...)

## S4 method for signature 'character'
readability(txt.file, hyphen = NULL,
  index = c("ARI", "Bormuth", "Coleman", "Coleman.Liau", "Dale.Chall",
    "Danielson.Bryan", "Dickes.Steiwer", "DRP", "ELF",
    "Farr.Jenkins.Paterson", "Flesch", "Flesch.Kincaid", "FOG",
    "FORCAST", "Fucks", "Harris.Jacobson", "Linsear.Write", "LIX",
    "nWS", "RIX", "SMOG", "Spache", "Strain", "Traenkle.Bailer",
    "TRI", "Tuldava", "Wheeler.Smith"),
  parameters = list(),
  word.lists = list(Bormuth = NULL, Dale.Chall = NULL,
    Harris.Jacobson = NULL, Spache = NULL),
  fileEncoding = "UTF-8", tagger = "kRp.env", force.lang = NULL,
  sentc.tag = "sentc", nonword.class = "nonpunct", nonword.tag = c(),
  quiet = FALSE, ...)

## S4 method for signature 'missing'
readability(txt.file, index)
```

## Arguments

- txt.file: Either an object of class kRp.tagged-class, kRp.txt.freq-class, kRp.analysis-class or kRp.txt.trans-class, or a character vector which must be a valid path to a file containing the text to be analyzed. In the latter case, force.lang must be set as well, and the language specified must be supported by both treetag and hyphen.
- ...: Additional options for the specified tagger function.
- hyphen: An object of class kRp.hyphen. If NULL, the text will be hyphenated automatically. All syllable handling will be skipped automatically if it's not needed for the selected indices.
- index: A character vector indicating which indices should actually be computed. If set to "all", all available indices will be tried (meaning all variations of all measures). If set to "fast", a subset of the default values is used that is known to compute fast (currently, this only excludes "FOG"). You can also set it to "validation" to get information on the current status of validation.
- parameters: A list with named magic numbers, defining the relevant parameters for each index. If none are given, the default values are used.
- word.lists: A named list providing the word lists for indices which need one. If NULL or missing, those indices will be skipped and a warning is given. Actual word lists can be provided as either a vector (or matrix or data.frame with only one column), or as a file name, in which case the file must contain one word per line. Alternatively, you can directly provide the number of words which are not on the list.
- fileEncoding: A character string naming the encoding of the word list files (if they are files). "ISO_8859-1" or "UTF-8" should work in most cases.
- tagger: A character string pointing to the tokenizer/tagger command you want to use for basic text analysis. Can be omitted if txt.file is already of class kRp.tagged-class. Defaults to tagger="kRp.env" to get the settings from get.kRp.env. Set to "tokenize" to use tokenize.
- force.lang: A character string defining the language to be assumed for the text, by force.
- sentc.tag: A character vector with POS tags which indicate a sentence ending. The default value "sentc" has special meaning and will cause the result of kRp.POS.tags(lang, tags="sentc", list.tags=TRUE) to be used.
- nonword.class: A character vector with word classes which should be ignored for readability analysis. The default value "nonpunct" has special meaning and will cause the result of kRp.POS.tags(lang, c("punct","sentc"), list.classes=TRUE) to be used. Will only be of consequence if hyphen is not set!
- nonword.tag: A character vector with POS tags which should be ignored for readability analysis. Will only be of consequence if hyphen is not set!
- quiet: Logical. If FALSE, short status messages will be shown. TRUE will also suppress all potential warnings regarding the validation status of measures.

## Details

In the following formulae, W stands for the number of words, St for the number of sentences, C for the number of characters (usually meaning letters), Sy for the number of syllables, W_{3Sy} for the number of words with at least three syllables, W_{<3Sy} for the number of words with less than three syllables, W^{1Sy} for words with exactly one syllable, W_{6C} for the number of words with at least six letters, and W_{-WL} for the number of words which are not on a certain word list (explained where needed).

"ARI":

ARI = 0.5 \times \frac{W}{St} + 4.71 \times \frac{C}{W} - 21.43

If parameters is set to ARI="NRI", the revised parameters from the Navy Readability Indexes are used:

ARI_{NRI} = 0.4 \times \frac{W}{St} + 6 \times \frac{C}{W} - 27.4

If parameters is set to ARI="simple", the simplified formula is calculated:

ARI_{simple} = \frac{W}{St} + 9 \times \frac{C}{W}

Wrapper function: ARI
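As a quick sanity check, the default ARI formula can be computed by hand from raw counts. The counts below are made-up illustrative values, not output of koRpus:

```r
# Hand-computed ARI from illustrative counts:
# W = words, St = sentences, C = characters (letters).
W  <- 100
St <- 5
C  <- 450
ARI <- 0.5 * (W / St) + 4.71 * (C / W) - 21.43
round(ARI, 3)  # 9.765
```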

"Bormuth":

Bormuth Mean Cloze & Grade Placement:

B_{MC} = 0.886593 - \left( 0.08364 \times \frac{C}{W} \right) + 0.161911 \times \left( \frac{W_{-WL}}{W} \right)^3

- 0.21401 \times \left( \frac{W}{St} \right) + 0.000577 \times \left( \frac{W}{St} \right)^2

- 0.000005 \times \left( \frac{W}{St} \right)^3

Note: This index needs the long Dale-Chall list of 3000 familiar (English) words to compute W_{-WL}. That is, you must have a copy of this word list and provide it via the word.lists=list(Bormuth=<your.list>) parameter!

B_{GP} = 4.275 + 12.881 \times B_{MC} - (34.934 \times B_{MC}^2) + (20.388 \times B_{MC}^3)

+ (26.194 \times C_{CS}) - (2.046 \times C_{CS}^2) - (11.767 \times C_{CS}^3) - (44.285 \times B_{MC} \times C_{CS})

+ (97.620 \times (B_{MC} \times C_{CS})^2) - (59.538 \times (B_{MC} \times C_{CS})^3)

Where C_{CS} represents the cloze criterion score (35% by default).

Wrapper function: bormuth

"Coleman":

C_1 = 1.29 \times \left( \frac{100 \times W^{1Sy}}{W} \right) - 38.45

C_2 = 1.16 \times \left( \frac{100 \times W^{1Sy}}{W} \right) + 1.48 \times \left( \frac{100 \times St}{W} \right) - 37.95

C_3 = 1.07 \times \left( \frac{100 \times W^{1Sy}}{W} \right) + 1.18 \times \left( \frac{100 \times St}{W} \right) + 0.76 \times \left( \frac{100 \times W_{pron}}{W} \right) - 34.02

C_4 = 1.04 \times \left( \frac{100 \times W^{1Sy}}{W} \right) + 1.06 \times \left( \frac{100 \times St}{W} \right) + 0.56 \times \left( \frac{100 \times W_{pron}}{W} \right) - 0.36 \times \left( \frac{100 \times W_{prep}}{W} \right) - 26.01

Where W_{pron} is the number of pronouns, and W_{prep} the number of prepositions.

Wrapper function: coleman

"Coleman.Liau":

First estimates cloze percentage, then calculates grade equivalent:

CL_{ECP} = 141.8401 - 0.214590 \times \frac{100 \times C}{W} + 1.079812 \times \frac{100 \times St}{W}

CL_{grade} = -27.4004 \times \frac{CL_{ECP}}{100} + 23.06395

The short form is also calculated:

CL_{short} = 5.88 \times \frac{C}{W} - 29.6 \times \frac{St}{W} - 15.8

Wrapper function: coleman.liau
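The two-step Coleman-Liau computation (estimated cloze percentage, then grade) can be traced by hand with illustrative counts (not koRpus output):

```r
# Coleman-Liau from illustrative counts:
# W = words, St = sentences, C = characters (letters).
W  <- 100
St <- 5
C  <- 450
CL.ecp   <- 141.8401 - 0.214590 * (100 * C / W) + 1.079812 * (100 * St / W)
CL.grade <- -27.4004 * CL.ecp / 100 + 23.06395
round(CL.ecp, 3)    # estimated cloze percentage
round(CL.grade, 2)  # grade equivalent
```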

"Dale.Chall":

New Dale-Chall Readability Formula. By default the revised formula (1995) is calculated:

DC_{new} = 64 - 0.95 \times{} \frac{100 \times{} W_{-WL}}{W} - 0.69 \times{} \frac{W}{St}

This will result in a cloze score which is then looked up in a grading table. If parameters is set to Dale.Chall="old", the original formula (1948) is used:

DC_{old} = 0.1579 \times{} \frac{100 \times{} W_{-WL}}{W} + 0.0496 \times{} \frac{W}{St} + 3.6365

If parameters is set to Dale.Chall="PSK", the revised parameters by Powers-Sumner-Kearl (1958) are used:

DC_{PSK} = 0.1155 \times{} \frac{100 \times{} W_{-WL}}{W} + 0.0596 \times{} \frac{W}{St} + 3.2672

Note: This index needs the long Dale-Chall list of 3000 familiar (English) words to compute W_{-WL}. That is, you must have a copy of this word list and provide it via the word.lists=list(Dale.Chall=<your.list>) parameter!

Wrapper function: dale.chall
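The 1995 formula can be verified by hand; W.notlist below stands for the count of words not on the Dale-Chall list, and all values are illustrative, not koRpus output (the commented call sketches how a word list file would be supplied, with a hypothetical file name):

```r
# New Dale-Chall (1995) from illustrative counts:
W <- 100          # words
St <- 5           # sentences
W.notlist <- 20   # words not on the Dale-Chall list (hypothetical)
DC.new <- 64 - 0.95 * (100 * W.notlist / W) - 0.69 * (W / St)
DC.new  # 31.2

# In an actual analysis the list is supplied via word.lists, e.g.:
# readability(tagged.text, index = "Dale.Chall",
#   word.lists = list(Dale.Chall = "dale_chall_words.txt"))
```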

"Danielson.Bryan":

DB_1 = \left( 1.0364 \times \frac{C}{Bl} \right) + \left( 0.0194 \times \frac{C}{St} \right) - 0.6059

DB_2 = 131.059 - \left( 10.364 \times \frac{C}{Bl} \right) - \left( 0.194 \times \frac{C}{St} \right)

Where Bl means blanks between words; these are not actually counted in this implementation, but estimated as W - 1. C is interpreted as literally all characters.

Wrapper function: danielson.bryan

"Dickes.Steiwer":

Dickes-Steiwer Handformel:

DS = 235.95993 - \left( 73.021 \times \frac{C}{W} \right) - \left( 12.56438 \times \frac{W}{St} \right) - \left( 50.03293 \times TTR \right)

Where TTR refers to the type-token ratio, which will be calculated case-insensitive by default.

Wrapper function: dickes.steiwer

"DRP":

Degrees of Reading Power. Uses the Bormuth Mean Cloze Score:

DRP = (1 - B_{MC}) \times 100

This formula itself has no parameters. Note: The Bormuth index needs the long Dale-Chall list of 3000 familiar (English) words to compute W_{-WL}. That is, you must have a copy of this word list and provide it via the word.lists=list(Bormuth=<your.list>) parameter!

Wrapper function: DRP

"ELF":

Fang's Easy Listening Formula:

ELF = \frac{W_{2Sy}}{St}

Wrapper function: ELF

"Farr.Jenkins.Paterson":

A simplified version of Flesch Reading Ease:

-31.517 - 1.015 \times \frac{W}{St} + 1.599 \times \frac{W^{1Sy}}{W}

If parameters is set to Farr.Jenkins.Paterson="PSK", the revised parameters by Powers-Sumner-Kearl (1958) are used:

8.4335 + 0.0923 \times \frac{W}{St} - 0.0648 \times \frac{W^{1Sy}}{W}

Wrapper function: farr.jenkins.paterson

"Flesch":

206.835 - 1.015 \times \frac{W}{St} - 84.6 \times \frac{Sy}{W}

Certain internationalisations of the parameters are also implemented. They can be used by setting the Flesch parameter to one of the following language abbreviations.

"de" (Amstad's Verständlichkeitsindex):

180 - \frac{W}{St} - 58.5 \times \frac{Sy}{W}

"es" (Fernandez-Huerta):

206.835 - 1.02 \times \frac{W}{St} - 60 \times \frac{Sy}{W}

"es-s" (Szigriszt):

206.835 - \frac{W}{St} - 62.3 \times \frac{Sy}{W}

"nl" (Douma):

206.835 - 0.93 \times \frac{W}{St} - 77 \times \frac{Sy}{W}

"nl-b" (Brouwer Leesindex):

195 - 2 \times \frac{W}{St} - 67 \times \frac{Sy}{W}

"fr" (Kandel-Moles):

209 - 1.15 \times \frac{W}{St} - 68 \times \frac{Sy}{W}

If parameters is set to Flesch="PSK", the revised parameters by Powers-Sumner-Kearl (1958) are used to calculate a grade level:

Flesch_{PSK} = 0.0778 \times \frac{W}{St} + 4.55 \times \frac{Sy}{W} - 2.2029

Wrapper function: flesch
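The default (English) Flesch Reading Ease parameters can be checked by hand with illustrative counts (not koRpus output):

```r
# Flesch Reading Ease from illustrative counts:
# W = words, St = sentences, Sy = syllables.
W  <- 100
St <- 5
Sy <- 150
flesch.re <- 206.835 - 1.015 * (W / St) - 84.6 * (Sy / W)
round(flesch.re, 3)  # 59.635 ("fairly difficult" range)
```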

"Flesch.Kincaid":

0.39 \times \frac{W}{St} + 11.8 \times \frac{Sy}{W} - 15.59

Wrapper function: flesch.kincaid

"FOG":

Gunning Frequency of Gobbledygook:

FOG = 0.4 \times \left( \frac{W}{St} + \frac{100 \times W_{3Sy}}{W} \right)

If parameters is set to FOG="PSK", the revised parameters by Powers-Sumner-Kearl (1958) are used:

FOG_{PSK} = 3.0680 + \left( 0.0877 \times \frac{W}{St} \right) + \left( 0.0984 \times \frac{100 \times W_{3Sy}}{W} \right)

If parameters is set to FOG="NRI", the new FOG count from the Navy Readability Indexes is used:

FOG_{new} = \frac{\frac{W_{<3Sy} + (3 \times W_{3Sy})}{\frac{100 \times St}{W}} - 3}{2}

If the text was POS-tagged accordingly, proper nouns and combinations of only easy words will not be counted as hard words, and the syllables of verbs ending in "-ed", "-es" or "-ing" will be counted without these suffixes.

Due to the need to re-hyphenate combined words after splitting them up, this formula takes considerably longer to compute than most others. It will be omitted if you set index="fast" instead of the default.

Wrapper function: FOG

"FORCAST":

FORCAST = 20 - \frac{W^{1Sy} \times \frac{150}{W}}{10}

If parameters is set to FORCAST="RGL", the parameters for the precise reading grade level are used (see Klare, 1975, pp. 84–85):

FORCAST_{RGL} = 20.43 - 0.11 \times W^{1Sy} \times \frac{150}{W}

Wrapper function: FORCAST
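The default FORCAST formula, which only needs the monosyllable count, can be traced with illustrative values (not koRpus output):

```r
# FORCAST from illustrative counts:
# W = words, W.1sy = words with exactly one syllable.
W     <- 100
W.1sy <- 60
FORCAST <- 20 - (W.1sy * (150 / W)) / 10
FORCAST  # 11
```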

"Fucks":

Fucks' Stilcharakteristik:

Fucks = \frac{C}{W} \times \frac{W}{St}

This simple formula has no parameters.

Wrapper function: fucks

"Harris.Jacobson":

HJ_1 = 0.094 \times \frac{100 \times{} W_{-WL}}{W} + 0.168 \times \frac{W}{St} + 0.502

HJ_2 = 0.140 \times \frac{100 \times{} W_{-WL}}{W} + 0.153 \times \frac{W}{St} + 0.560

HJ_3 = 0.158 \times \frac{W}{St} + 0.055 \times \frac{100 \times{} W_{6C}}{W} + 0.355

HJ_4 = 0.070 \times \frac{100 \times{} W_{-WL}}{W} + 0.125 \times \frac{W}{St} + 0.037 \times \frac{100 \times{} W_{6C}}{W} + 0.497

HJ_5 = 0.118 \times \frac{100 \times{} W_{-WL}}{W} + 0.134 \times \frac{W}{St} + 0.032 \times \frac{100 \times{} W_{6C}}{W} + 0.424

Note: This index needs the short Harris-Jacobson word list for grades 1 and 2 (english) to compute W_{-WL}. That is, you must have a copy of this word list and provide it via the word.lists=list(Harris.Jacobson=<your.list>) parameter!

Wrapper function: harris.jacobson

"Linsear.Write" (O'Hayre, undated, see Klare, 1975, p. 85):

LW_{raw} = \frac{100 - \frac{100 \times W_{<3Sy}}{W} + \left( 3 \times \frac{100 \times W_{3Sy}}{W} \right)}{\frac{100 \times St}{W}}

LW(LW_{raw} \leq 20) = \frac{LW_{raw} - 2}{2}

LW(LW_{raw} > 20) = \frac{LW_{raw}}{2}

Wrapper function: linsear.write

"LIX"

Björnsson's Läsbarhetsindex. Originally proposed for Swedish texts, calculated by:

LIX = \frac{W}{St} + \frac{100 \times W_{7C}}{W}

Where W_{7C} is the number of words with at least seven characters (long words).

Texts with a LIX < 25 are considered very easy, around 40 normal, and > 55 very difficult to read.

Wrapper function: LIX
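LIX adds the average sentence length to the percentage of long words (at least seven characters), so it is easy to verify by hand; the counts below are illustrative, not koRpus output:

```r
# LIX from illustrative counts:
# W = words, St = sentences, W.7c = words with >= 7 characters.
W    <- 100
St   <- 5
W.7c <- 30
LIX <- W / St + (100 * W.7c) / W
LIX  # 50, i.e. a difficult text by the scale above
```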

"nWS":

Neue Wiener Sachtextformeln (Bamberger & Vanecek, 1984):

nWS_1 = 19.35 \times \frac{W_{3Sy}}{W} + 0.1672 \times \frac{W}{St} + 12.97 \times \frac{W_{6C}}{W} - 3.27 \times \frac{W^{1Sy}}{W} - 0.875

nWS_2 = 20.07 \times \frac{W_{3Sy}}{W} + 0.1682 \times \frac{W}{St} + 13.73 \times \frac{W_{6C}}{W} - 2.779

nWS_3 = 29.63 \times \frac{W_{3Sy}}{W} + 0.1905 \times \frac{W}{St} - 1.1144

nWS_4 = 27.44 \times \frac{W_{3Sy}}{W} + 0.2656 \times \frac{W}{St} - 1.693

Wrapper function: nWS
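The first Wiener Sachtextformel combines all four word-level ratios; a hand computation with illustrative counts (not koRpus output) looks like this:

```r
# nWS_1 from illustrative counts:
# W = words, St = sentences, W.3sy = words with >= 3 syllables,
# W.6c = words with >= 6 letters, W.1sy = monosyllabic words.
W     <- 100
St    <- 5
W.3sy <- 10
W.6c  <- 30
W.1sy <- 60
nWS1 <- 19.35 * (W.3sy / W) + 0.1672 * (W / St) +
  12.97 * (W.6c / W) - 3.27 * (W.1sy / W) - 0.875
round(nWS1, 3)  # 6.333, roughly grade 6
```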

"RIX"

Anderson's Readability Index. A simplified version of LIX:

RIX = \frac{W_{7C}}{St}

Texts with a RIX < 1.8 are considered very easy, around 3.7 normal, and > 7.2 very difficult to read.

Wrapper function: RIX

"SMOG":

Simple Measure of Gobbledygook. By default calculates formula D by McLaughlin (1969):

SMOG = 1.043 \times \sqrt{W_{3Sy} \times \frac{30}{St}} + 3.1291

If parameters is set to SMOG="C", formula C will be calculated:

SMOG_{C} = 0.9986 \times \sqrt{W_{3Sy} \times \frac{30}{St} + 5} + 2.8795

If parameters is set to SMOG="simple", the simplified formula is used:

SMOG_{simple} = \sqrt{W_{3Sy} \times \frac{30}{St}} + 3

If parameters is set to SMOG="de", the formula adapted to German texts ("Qu", Bamberger & Vanecek, 1984, p. 78) is used:

SMOG_{de} = \sqrt{W_{3Sy} \times \frac{30}{St}} - 2

Wrapper function: SMOG
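McLaughlin's default formula D can be reproduced by hand from the polysyllable and sentence counts; the values below are illustrative, not koRpus output:

```r
# SMOG (formula D) from illustrative counts:
# W.3sy = words with >= 3 syllables, St = sentences.
W.3sy <- 10
St    <- 5
SMOG <- 1.043 * sqrt(W.3sy * (30 / St)) + 3.1291
round(SMOG, 3)  # 11.208
```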

"Spache":

Spache Revised Formula (1974):

Spache = 0.121 \times \frac{W}{St} + 0.082 \times{} \frac{100 \times{} W_{-WL}}{W} + 0.659

If parameters is set to Spache="old", the original parameters (Spache, 1953) are used:

Spache_{old} = 0.141 \times \frac{W}{St} + 0.086 \times{} \frac{100 \times{} W_{-WL}}{W} + 0.839

Note: The revised index needs the revised Spache word list (see Klare, 1975, p. 73), and the old index the short Dale-Chall list of 769 familiar (English) words, to compute W_{-WL}. That is, you must have a copy of the respective word list and provide it via the word.lists=list(Spache=<your.list>) parameter!

Wrapper function: spache

"Strain":

Strain Index. This index was proposed in [1]:

Sy \times{} \frac{1}{St / 3} \times{} \frac{1}{10}

Wrapper function: strain

"Traenkle.Bailer":

Tränkle-Bailer Formeln. These two formulas resulted from a re-examination of the formulas proposed by Dickes-Steiwer. They try to avoid the type-token ratio, which is dependent on text length (Tränkle & Bailer, 1984):

TB1 = 224.6814 - \left( 79.8304 \times \frac{C}{W} \right) - \left( 12.24032 \times \frac{W}{St} \right) - \left( 1.292857 \times \frac{100 \times W_{prep}}{W} \right)

TB2 = 234.1063 - \left( 96.11069 \times \frac{C}{W} \right) - \left( 2.05444 \times \frac{100 \times W_{prep}}{W} \right) - \left( 1.02805 \times \frac{100 \times W_{conj}}{W} \right)

Where W_{prep} refers to the number of prepositions, and W_{conj} to the number of conjunctions.

Wrapper function: traenkle.bailer

"TRI":

Kuntzsch's Text-Redundanz-Index. Intended mainly for German newspaper comments.

TRI = \left( 0.449 \times W^{1Sy} \right) - \left( 2.467 \times Ptn \right) - \left( 0.937 \times Frg \right) - 14.417

Where Ptn is the number of punctuation marks and Frg the number of foreign words.

Wrapper function: TRI

"Tuldava":

Tuldava's Text Difficulty Formula. Supposed to be rather independent of specific languages (Grzybek, 2010).

TD = \frac{Sy}{W} \times \ln\left( \frac{W}{St} \right)

Wrapper function: tuldava

"Wheeler.Smith":

Intended for English texts in primary grades 1–4 (Wheeler & Smith, 1954):

WS = \frac{W}{St} \times \frac{10 \times{} W_{2Sy}}{W}

If parameters is set to Wheeler.Smith="de", the calculation stays the same, but grade placement is done according to Bamberger & Vanecek (1984), i.e., for German texts.

Wrapper function: wheeler.smith

By default, if the text still has to be tagged, the language definition is queried by internally calling get.kRp.env(lang=TRUE). If txt.file has already been tagged, by default the language definition of that tagged object is read and used. Set force.lang=get.kRp.env(lang=TRUE), or to any other valid value, only if you want to forcibly overwrite this default behaviour. See kRp.POS.tags for all supported languages.

## Value

An object of class kRp.readability-class.

## Note

To get a printout of the default parameters as they are set if no other parameters are specified, call readability(parameters="dput"). In case you want to provide different parameters, you must provide either a complete set for an index, or one of the special parameters mentioned in the index descriptions above (e.g., "PSK", where appropriate).

## References

Anderson, J. (1981). Analysing the readability of english and non-english texts in the classroom with Lix. In Annual Meeting of the Australian Reading Association, Darwin, Australia.

Anderson, J. (1983). Lix and Rix: Variations on a little-known readability index. Journal of Reading, 26(6), 490–496.

Bamberger, R. & Vanecek, E. (1984). Lesen–Verstehen–Lernen–Schreiben. Wien: Jugend und Volk.

Coleman, M. & Liau, T.L. (1975). A computer readability formula designed for machine scoring, Journal of Applied Psychology, 60(2), 283–284.

Dickes, P. & Steiwer, L. (1977). Ausarbeitung von Lesbarkeitsformeln für die deutsche Sprache. Zeitschrift für Entwicklungspsychologie und Pädagogische Psychologie, 9(1), 20–28.

DuBay, W.H. (2004). The Principles of Readability. Costa Mesa: Impact Information. WWW: http://www.impact-information.com/impactinfo/readability02.pdf; 22.03.2011.

Farr, J.N., Jenkins, J.J. & Paterson, D.G. (1951). Simplification of Flesch Reading Ease formula. Journal of Applied Psychology, 35(5), 333–337.

Flesch, R. (1948). A new readability yardstick. Journal of Applied Psychology, 32(3), 221–233.

Fucks, W. (1955). Der Unterschied des Prosastils von Dichtern und anderen Schriftstellern. Sprachforum, 1, 233–244.

Grzybek, P. (2010). Text difficulty and the Arens-Altmann law. In Peter Grzybek, Emmerich Kelih, Ján Mačutek (Eds.), Text and Language. Structures – Functions – Interrelations. Quantitative Perspectives. Wien: Praesens, 57–70.

Harris, A.J. & Jacobson, M.D. (1974). Revised Harris-Jacobson readability formulas. In 18th Annual Meeting of the College Reading Association, Bethesda.

Powers, R.D, Sumner, W.A, & Kearl, B.E. (1958). A recalculation of four adult readability formulas, Journal of Educational Psychology, 49(2), 99–105.

Smith, E.A. & Senter, R.J. (1967). Automated readability index. AMRL-TR-66-22. Wright-Paterson AFB, Ohio: Aerospace Medical Division.

Spache, G. (1953). A new readability formula for primary-grade reading materials. The Elementary School Journal, 53, 410–413.

Tränkle, U. & Bailer, H. (1984). Kreuzvalidierung und Neuberechnung von Lesbarkeitsformeln für die deutsche Sprache. Zeitschrift für Entwicklungspsychologie und Pädagogische Psychologie, 16(3), 231–244.

Wheeler, L.R. & Smith, E.H. (1954). A practical readability formula for the classroom teacher in the primary grades. Elementary English, 31, 397–399.

## Examples

```r
## Not run:
readability(tagged.text)
## End(Not run)
```

koRpus documentation built on May 30, 2017, 12:47 a.m.