These methods calculate several readability indices.
readability(txt.file, ...)
## S4 method for signature 'kRp.text'
readability(
txt.file,
hyphen = NULL,
index = c("ARI", "Bormuth", "Coleman", "Coleman.Liau", "Dale.Chall",
"Danielson.Bryan", "Dickes.Steiwer", "DRP", "ELF", "Farr.Jenkins.Paterson", "Flesch",
"Flesch.Kincaid", "FOG", "FORCAST", "Fucks", "Gutierrez", "Harris.Jacobson",
"Linsear.Write", "LIX", "nWS", "RIX", "SMOG", "Spache", "Strain", "Traenkle.Bailer",
"TRI", "Tuldava", "Wheeler.Smith"),
parameters = list(),
word.lists = list(Bormuth = NULL, Dale.Chall = NULL, Harris.Jacobson = NULL, Spache =
NULL),
fileEncoding = "UTF-8",
sentc.tag = "sentc",
nonword.class = "nonpunct",
nonword.tag = c(),
quiet = FALSE,
keep.input = NULL,
as.feature = FALSE
)
## S4 method for signature 'missing'
readability(txt.file, index)
## S4 method for signature 'kRp.readability,ANY,ANY,ANY'
x[i]
## S4 method for signature 'kRp.readability'
x[[i]]
txt.file: An object of class kRp.text.
...: Additional arguments for the generics.
hyphen: An object of class kRp.hyphen, containing hyphenation data for the text. If NULL (the default), hyphenation is done on the fly where needed.
index: A character vector indicating which indices should actually be computed. If set to "all", all available indices are computed.
parameters: A list with named magic numbers, defining the relevant parameters for each index. If none are given, the default values are used.
word.lists: A named list providing the word lists for indices which need one (Bormuth, Dale.Chall, Harris.Jacobson, Spache). Indices whose list is left NULL are skipped.
fileEncoding: A character string defining the character encoding of the word list files, like "UTF-8" (the default).
sentc.tag: A character vector with POS tags which indicate a sentence ending. The default value "sentc" has special meaning and triggers the use of internal tag definitions.
nonword.class: A character vector with word classes which should be ignored for readability analysis. The default value "nonpunct" has special meaning and triggers the use of internal tag definitions.
nonword.tag: A character vector with POS tags which should be ignored for readability analysis. Will only be of consequence if hyphen is not given, because otherwise the given hyphenation data will be used directly.
quiet: Logical. If FALSE, short status messages will be shown.
keep.input: Logical. If FALSE, neither the tokenized text nor the hyphenation data are kept in the result; the default NULL keeps them if they had to be computed anyway.
as.feature: Logical, whether the output should be just the analysis results or the input object with the results added as a feature. Use corpusReadability to get the results from such an aggregated object.
x: An object of class kRp.readability.
i: Defines the row selector ([) or the name to match ([[).
In the following formulae, W stands for the number of words, St for the number of sentences, C for the number of characters (usually meaning letters), Sy for the number of syllables, W_3Sy for the number of words with at least three syllables, W_<3Sy for the number of words with less than three syllables, W^1Sy for words with exactly one syllable, W_2Sy for the number of words with at least two syllables, W_6C for the number of words with at least six letters, W_7C for the number of words with at least seven letters, and W_-WL for the number of words which are not on a certain word list (explained where needed).
"ARI"
:Automated Readability Index:
ARI = 0.5 * W / St + 4.71 * C / W - 21.43
If parameters is set to ARI="NRI", the revised parameters from the Navy Readability Indexes are used:
ARI_NRI = 0.4 * W / St + 6 * C / W - 27.4
If parameters is set to ARI="simple", the simplified formula is calculated:
ARI_simple = W / St + 9 * C / W
Wrapper function: ARI
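As a plain-R illustration of the formula above (the counts are made up for the example; in practice the wrapper derives them from the tagged text):

```r
# Hypothetical counts for a short text (illustration only)
W  <- 100   # words
St <- 5     # sentences
C  <- 450   # characters (letters)

# Automated Readability Index, default parameters
ARI <- 0.5 * W / St + 4.71 * C / W - 21.43
ARI   # 9.765 for these counts
```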
"Bormuth"
:Bormuth Mean Cloze & Grade Placement:
B_MC = 0.886593 - (0.08364 * C / W) + 0.161911 * (W_-WL / W)^3 - 0.21401 * (W / St) + 0.000577 * (W / St)^2 - 0.000005 * (W / St)^3
Note: This index needs the long Dale-Chall list of 3000 familiar (English) words to compute W_-WL. That is, you must have a copy of this word list and provide it via the word.lists=list(Bormuth=<your.list>) parameter!
B_GP = 4.275 + 12.881 * B_MC - (34.934 * B_MC^2) + (20.388 * B_MC^3) + (26.194 * C_CS) - (2.046 * C_CS^2) - (11.767 * C_CS^3) - (44.285 * B_MC * C_CS) + (97.620 * (B_MC * C_CS)^2) - (59.538 * (B_MC * C_CS)^3)
Where C_CS represents the cloze criterion score (35% by default).
Wrapper function: bormuth
"Coleman"
:Coleman's Readability Formulas:
C_1 = 1.29 * (100 * W^1Sy / W) - 38.45
C_2 = 1.16 * (100 * W^1Sy / W) + 1.48 * (100 * St / W) - 37.95
C_3 = 1.07 * (100 * W^1Sy / W) + 1.18 * (100 * St / W) + 0.76 * (100 * W_pron / W) - 34.02
C_4 = 1.04 * (100 * W^1Sy / W) + 1.06 * (100 * St / W) + 0.56 * (100 * W_pron / W) - 0.36 * (100 * W_prep / W) - 26.01
Where W_pron is the number of pronouns, and W_prep the number of prepositions.
Wrapper function: coleman
"Coleman.Liau"
:First estimates cloze percentage, then calculates grade equivalent:
CL_ECP = 141.8401 - 0.214590 * 100 * C / W + 1.079812 * 100 * St / W
CL_grade = -27.4004 * CL_ECP / 100 + 23.06395
The short form is also calculated:
CL_short = 5.88 * C / W - 29.6 * St / W - 15.8
Wrapper function: coleman.liau
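A minimal sketch of these three calculations in plain R, using made-up counts (the wrapper itself computes the counts from tagged text):

```r
# Hypothetical counts (illustration only)
W <- 100; St <- 5; C <- 450

# estimated cloze percentage, grade equivalent, and short form
CL_ECP   <- 141.8401 - 0.214590 * 100 * C / W + 1.079812 * 100 * St / W
CL_grade <- -27.4004 * CL_ECP / 100 + 23.06395
CL_short <- 5.88 * C / W - 29.6 * St / W - 15.8
round(CL_short, 2)   # 9.18 for these counts
```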
"Dale.Chall"
:New Dale-Chall Readability Formula. By default the revised formula (1995) is calculated:
DC_new = 64 - 0.95 * 100 * W_-WL / W - 0.69 * W / St
This will result in a cloze score which is then looked up in a grading table. If parameters is set to Dale.Chall="old", the original formula (1948) is used:
DC_old = 0.1579 * 100 * W_-WL / W + 0.0496 * W / St + 3.6365
If parameters is set to Dale.Chall="PSK", the revised parameters by Powers-Sumner-Kearl (1958) are used:
DC_PSK = 0.1155 * 100 * W_-WL / W + 0.0596 * W / St + 3.2672
Note: This index needs the long Dale-Chall list of 3000 familiar (English) words to compute W_-WL. That is, you must have a copy of this word list and provide it via the word.lists=list(Dale.Chall=<your.list>) parameter!
Wrapper function: dale.chall
"Danielson.Bryan"
:DB_1 = ( 1.0364 * C / Bl ) + ( 0.0194 * C / St ) - 0.6059
DB_2 = 131.059 - ( 10.364 * C / Bl ) - ( 0.194 * C / St )
Where Bl means blanks between words; these are not counted directly in this implementation but estimated as W - 1. C is interpreted as literally all characters.
Wrapper function: danielson.bryan
"Dickes.Steiwer"
:Dickes-Steiwer Handformel:
DS = 235.95993 - (73.021 * C / W) - (12.56438 * W / St) - (50.03293 * TTR)
Where TTR refers to the type-token ratio, which will be calculated case-insensitive by default.
Wrapper function: dickes.steiwer
"DRP"
:Degrees of Reading Power. Uses the Bormuth Mean Cloze Score:
DRP = (1 - B_MC) * 100
This formula itself has no parameters.
Note: The Bormuth index needs the long Dale-Chall list of 3000 familiar (English) words to compute W_-WL. That is, you must have a copy of this word list and provide it via the word.lists=list(Bormuth=<your.list>) parameter!
Wrapper function: DRP
"ELF"
:Fang's Easy Listening Formula:
ELF = W_2Sy / St
Wrapper function: ELF
"Farr.Jenkins.Paterson"
:A simplified version of Flesch Reading Ease:
FJP = -31.517 - 1.015 * W / St + 1.599 * W^1Sy / W
If parameters is set to Farr.Jenkins.Paterson="PSK", the revised parameters by Powers-Sumner-Kearl (1958) are used:
FJP_PSK = 8.4335 + 0.0923 * W / St - 0.0648 * W^1Sy / W
Wrapper function: farr.jenkins.paterson
"Flesch"
:Flesch Reading Ease:
F_EN = 206.835 - 1.015 * W / St - 84.6 * Sy / W
Certain internationalisations of the parameters are also implemented. They can be used by setting the Flesch parameter to one of the following language abbreviations.
"de"
(Amstad's Verständlichkeitsindex):
F_DE = 180 - W / St - 58.5 * Sy / W
"es"
(Fernandez-Huerta):
F_ES = 206.835 - 1.02 * W / St - 60 * Sy / W
"es-s"
(Szigriszt):
F_ES_S = 206.835 - W / St - 62.3 * Sy / W
"nl"
(Douma):
F_NL = 206.835 - 0.93 * W / St - 77 * Sy / W
"nl-b"
(Brouwer Leesindex):
F_NL_B = 195 - 2 * W / St - 67 * Sy / W
"fr"
(Kandel-Moles):
F_FR = 209 - 1.15 * W / St - 68 * Sy / W
If parameters is set to Flesch="PSK", the revised parameters by Powers-Sumner-Kearl (1958) are used to calculate a grade level:
F_PSK = 0.0778 * W / St + 4.55 * Sy / W - 2.2029
Wrapper function: flesch
"Flesch.Kincaid"
:Flesch-Kincaid Grade Level:
FK = 0.39 * W / St + 11.8 * Sy / W - 15.59
Wrapper function: flesch.kincaid
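The grade-level formula can be illustrated directly in plain R with made-up counts (not the package's internal implementation):

```r
# Hypothetical counts (illustration only)
W <- 100; St <- 5; Sy <- 150

# Flesch-Kincaid Grade Level
FK <- 0.39 * W / St + 11.8 * Sy / W - 15.59
FK   # 9.91 for these counts
```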
"FOG"
:Gunning Frequency of Gobbledygook:
FOG = 0.4 * ( W / St + 100 * W_3Sy / W )
If parameters is set to FOG="PSK", the revised parameters by Powers-Sumner-Kearl (1958) are used:
FOG_PSK = 3.0680 + ( 0.0877 * W / St ) + ( 0.0984 * 100 * W_3Sy / W )
If parameters is set to FOG="NRI", the new FOG count from the Navy Readability Indexes is used:
FOG_new = ( ( ( W_<3Sy + ( 3 * W_3Sy ) ) / ( 100 * St / W ) ) - 3 ) / 2
If the text was POS-tagged accordingly, proper nouns and combinations of only easy words will not be counted as hard words, and the syllables of verbs ending in "-ed", "-es" or "-ing" will be counted without these suffixes.
Due to the need to re-hyphenate combined words after splitting them up, this formula takes considerably longer to compute than most others. It will be omitted if you set index="fast" instead of the default.
Wrapper function: FOG
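The default FOG formula boils down to a one-liner in plain R; the counts below are made up for the example (the actual wrapper additionally applies the POS-based exceptions described above):

```r
# Hypothetical counts (illustration only)
W <- 100; St <- 5; W_3Sy <- 10

# Gunning FOG, default parameters
FOG <- 0.4 * ( W / St + 100 * W_3Sy / W )
FOG   # 12 for these counts
```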
"FORCAST"
:FORCAST = 20 - ( W^1Sy * 150 / W ) / 10
If parameters is set to FORCAST="RGL", the parameters for the precise reading grade level are used (see Klare, 1975, pp. 84–85):
FORCAST_RGL = 20.43 - 0.11 * W^1Sy * 150 / W
Wrapper function: FORCAST
"Fucks"
:Fucks' Stilcharakteristik (Fucks, 1955, as cited in Briest, 1974):
Fucks = ( Sy / W ) * ( W / St )
This simple formula has no parameters.
Wrapper function: fucks
"Gutierrez"
:Gutiérrez de Polini's Fórmula de comprensibilidad (Gutiérrez, 1972, as cited in Fernández, 2016) for Spanish:
Gutierrez = 95.2 - 9.7 * C / W - 0.35 * W / St
Wrapper function: gutierrez
"Harris.Jacobson"
:Revised Harris-Jacobson Readability Formulas (Harris & Jacobson, 1974): For primary-grade material:
HJ_1 = 0.094 * 100 * W_-WL / W + 0.168 * W / St + 0.502
For material above third grade:
HJ_2 = 0.140 * 100 * W_-WL / W + 0.153 * W / St + 0.560
For material below fourth grade:
HJ_3 = 0.158 * W / St + 0.055 * 100 * W_6C / W + 0.355
For material below fourth grade:
HJ_4 = 0.070 * 100 * W_-WL / W + 0.125 * W / St + 0.037 * 100 * W_6C / W + 0.497
For material above third grade:
HJ_5 = 0.118 * 100 * W_-WL / W + 0.134 * W / St + 0.032 * 100 * W_6C / W + 0.424
Note: This index needs the short Harris-Jacobson word list for grades 1 and 2 (English) to compute W_-WL. That is, you must have a copy of this word list and provide it via the word.lists=list(Harris.Jacobson=<your.list>) parameter!
Wrapper function: harris.jacobson
"Linsear.Write"
:(O'Hayre, undated; see Klare, 1975, p. 85):
LW_raw = ( 100 - 100 * W_<3Sy / W + ( 3 * 100 * W_3Sy / W ) ) / ( 100 * St / W )
LW(LW_raw <= 20) = ( LW_raw - 2 ) / 2
LW(LW_raw > 20) = LW_raw / 2
Wrapper function: linsear.write
"LIX"
:Björnsson's Läsbarhetsindex. Originally proposed for Swedish texts, calculated by:
LIX = W / St + ( W_7C * 100 ) / W
Texts with a LIX < 25 are considered very easy, around 40 normal, and > 55 very difficult to read.
Wrapper function: LIX
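For illustration, LIX in plain R with made-up counts, interpreted against the thresholds above:

```r
# Hypothetical counts (illustration only)
W <- 100; St <- 5; W_7C <- 25

LIX <- W / St + ( W_7C * 100 ) / W
LIX   # 45 for these counts, i.e. between "normal" and "very difficult"
```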
"nWS"
:Neue Wiener Sachtextformeln (Bamberger & Vanecek, 1984):
nWS_1 = 19.35 * W_3Sy / W + 0.1672 * W / St + 12.97 * W_6C / W - 3.27 * W^1Sy / W - 0.875
nWS_2 = 20.07 * W_3Sy / W + 0.1682 * W / St + 13.73 * W_6C / W - 2.779
nWS_3 = 29.63 * W_3Sy / W + 0.1905 * W / St - 1.1144
nWS_4 = 27.44 * W_3Sy / W + 0.2656 * W / St - 1.693
Wrapper function: nWS
"RIX"
:Anderson's Readability Index. A simplified version of LIX:
RIX = W_7C / St
Texts with a RIX < 1.8 are considered very easy, around 3.7 normal, and > 7.2 very difficult to read.
Wrapper function: RIX
"SMOG"
:Simple Measure of Gobbledygook. By default calculates formula D by McLaughlin (1969):
SMOG = 1.043 * √{W_3Sy * 30 / St} + 3.1291
If parameters is set to SMOG="C", formula C will be calculated:
SMOG_C = 0.9986 * √{W_3Sy * 30 / St + 5} + 2.8795
If parameters is set to SMOG="simple", the simplified formula is used:
SMOG_simple = √{W_3Sy * 30 / St} + 3
If parameters is set to SMOG="de", the formula adapted to German texts ("Qu", Bamberger & Vanecek, 1984, p. 78) is used:
SMOG_de = √{W_3Sy * 30 / St} - 2
Wrapper function: SMOG
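The default and simplified variants can be sketched in plain R with made-up counts (illustration only, not the package internals):

```r
# Hypothetical counts (illustration only)
W_3Sy <- 15   # words with three or more syllables
St    <- 30   # sentences

# formula D (default) and the simplified variant
SMOG        <- 1.043 * sqrt(W_3Sy * 30 / St) + 3.1291
SMOG_simple <- sqrt(W_3Sy * 30 / St) + 3
```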
"Spache"
:Spache Revised Formula (1974):
Spache = 0.121 * W / St + 0.082 * 100 * W_-WL / W + 0.659
If parameters is set to Spache="old", the original parameters (Spache, 1953) are used:
Spache_old = 0.141 * W / St + 0.086 * 100 * W_-WL / W + 0.839
Note: The revised index needs the revised Spache word list (see Klare, 1975, p. 73), and the old index the short Dale-Chall list of 769 familiar (English) words to compute W_-WL. That is, you must have a copy of the respective word list and provide it via the word.lists=list(Spache=<your.list>) parameter!
Wrapper function: spache
"Strain"
:Strain Index. This index was proposed in [1]:
S = Sy * 1 / ( St / 3 ) * 1 / 10
Wrapper function: strain
"Traenkle.Bailer"
:Tränkle-Bailer Formeln. These two formulas were the result of a re-examination of the ones proposed by Dickes-Steiwer. They try to avoid the usage of the type-token ratio, which is dependent on text length (Tränkle & Bailer, 1984):
TB1 = 224.6814 - ( 79.8304 * C / W ) - (12.24032 * W / St ) - (1.292857 * 100 * W_prep / W )
TB2 = 234.1063 - ( 96.11069 * C / W ) - ( 2.05444 * 100 * W_prep / W ) - (1.02805 * 100 * W_conj / W )
Where W_prep refers to the number of prepositions, and W_conj to the number of conjunctions.
Wrapper function: traenkle.bailer
"TRI"
:Kuntzsch's Text-Redundanz-Index. Intended mainly for German newspaper comments.
TRI = ( 0.449 * W^1Sy ) - ( 2.467 * Ptn ) - ( 0.937 * Frg ) - 14.417
Where Ptn is the number of punctuation marks and Frg the number of foreign words.
Wrapper function: TRI
"Tuldava"
:Tuldava's Text Difficulty Formula. Supposed to be rather independent of specific languages (Grzybek, 2010).
TD = Sy / W * ln( W / St )
Wrapper function: tuldava
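As an illustration of the formula in plain R (made-up counts; note that log() in R is the natural logarithm):

```r
# Hypothetical counts (illustration only)
Sy <- 150; W <- 100; St <- 5

# Tuldava's Text Difficulty; log() is the natural logarithm in R
TD <- Sy / W * log(W / St)
round(TD, 2)   # 4.49 for these counts
```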
"Wheeler.Smith"
:Intended for English texts in primary grades 1–4 (Wheeler & Smith, 1954):
WS = W / St * 10 * W_2Sy / W
If parameters is set to Wheeler.Smith="de", the calculation stays the same, but grade placement is done according to Bamberger & Vanecek (1984), that is, for German texts.
Wrapper function: wheeler.smith
By default, if the text still has to be tagged, the language definition is queried internally by calling get.kRp.env(lang=TRUE). If txt.file has already been tagged, the language definition of that tagged object is read and used by default. Set force.lang=get.kRp.env(lang=TRUE), or any other valid value, only if you want to forcibly overwrite this default behaviour. See kRp.POS.tags for all supported languages.
Depending on as.feature, the value is either an object of class kRp.readability, or an object of class kRp.text with the added feature readability containing it.
To get a printout of the default parameters as they are set if no other parameters are specified, call readability(parameters="dput"). In case you want to provide different parameters, you must provide a complete set for an index, or one of the special parameters mentioned in the index descriptions above (e.g., "PSK", if appropriate).
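For instance, a special parameter set such as "PSK" for the Flesch index can be passed like this; the sketch assumes the English language package and the sample corpus file shipped with koRpus, as in the Examples section:

```r
# Sketch: pass a special parameter set to a single index.
# Assumes the "koRpus.lang.en" package is installed.
if(require("koRpus.lang.en", quietly = TRUE)){
  sample_file <- file.path(
    path.package("koRpus"), "examples", "corpus", "Reality_Winner.txt"
  )
  tokenized.obj <- tokenize(txt=sample_file, lang="en")
  # use the Powers-Sumner-Kearl variant of the Flesch formula
  readability(tokenized.obj, index="Flesch", parameters=list(Flesch="PSK"))
}
```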
Anderson, J. (1981). Analysing the readability of English and non-English texts in the classroom with Lix. In Annual Meeting of the Australian Reading Association, Darwin, Australia.
Anderson, J. (1983). Lix and Rix: Variations on a little-known readability index. Journal of Reading, 26(6), 490–496.
Bamberger, R. & Vanecek, E. (1984). Lesen–Verstehen–Lernen–Schreiben. Wien: Jugend und Volk.
Briest, W. (1974). Kann man Verständlichkeit messen? Zeitschrift für Phonetik, Sprachwissenschaft und Kommunikationsforschung, 27, 543–563.
Coleman, M. & Liau, T.L. (1975). A computer readability formula designed for machine scoring. Journal of Applied Psychology, 60(2), 283–284.
Dickes, P. & Steiwer, L. (1977). Ausarbeitung von Lesbarkeitsformeln für die deutsche Sprache. Zeitschrift für Entwicklungspsychologie und Pädagogische Psychologie, 9(1), 20–28.
DuBay, W.H. (2004). The Principles of Readability. Costa Mesa: Impact Information. WWW: http://www.impact-information.com/impactinfo/readability02.pdf; 22.03.2011.
Farr, J.N., Jenkins, J.J. & Paterson, D.G. (1951). Simplification of Flesch Reading Ease formula. Journal of Applied Psychology, 35(5), 333–337.
Fernández, A. M. (2016, November 30). Fórmula de comprensibilidad de Gutiérrez de Polini. https://legible.es/blog/comprensibilidad-gutierrez-de-polini/
Flesch, R. (1948). A new readability yardstick. Journal of Applied Psychology, 32(3), 221–233.
Grzybek, P. (2010). Text difficulty and the Arens-Altmann law. In Peter Grzybek, Emmerich Kelih, Ján Mačutek (Eds.), Text and Language. Structures – Functions – Interrelations. Quantitative Perspectives. Wien: Praesens, 57–70.
Harris, A.J. & Jacobson, M.D. (1974). Revised Harris-Jacobson readability formulas. In 18th Annual Meeting of the College Reading Association, Bethesda.
Klare, G.R. (1975). Assessing readability. Reading Research Quarterly, 10(1), 62–102.
McLaughlin, G.H. (1969). SMOG grading – A new readability formula. Journal of Reading, 12(8), 639–646.
Powers, R.D., Sumner, W.A., & Kearl, B.E. (1958). A recalculation of four adult readability formulas. Journal of Educational Psychology, 49(2), 99–105.
Smith, E.A. & Senter, R.J. (1967). Automated readability index. AMRL-TR-66-22. Wright-Patterson AFB, Ohio: Aerospace Medical Division.
Spache, G. (1953). A new readability formula for primary-grade reading materials. The Elementary School Journal, 53, 410–413.
Tränkle, U. & Bailer, H. (1984). Kreuzvalidierung und Neuberechnung von Lesbarkeitsformeln für die deutsche Sprache. Zeitschrift für Entwicklungspsychologie und Pädagogische Psychologie, 16(3), 231–244.
Wheeler, L.R. & Smith, E.H. (1954). A practical readability formula for the classroom teacher in the primary grades. Elementary English, 31, 397–399.
[1] https://strainindex.wordpress.com/2007/09/25/hello-world/
# code is only run when the English language package can be loaded
if(require("koRpus.lang.en", quietly = TRUE)){
sample_file <- file.path(
path.package("koRpus"), "examples", "corpus", "Reality_Winner.txt"
)
# call readability() on a tokenized text
tokenized.obj <- tokenize(
txt=sample_file,
lang="en"
)
# if you call readability() without arguments,
# you will get its results directly
rdb.results <- readability(tokenized.obj)
# there are [ and [[ methods for these objects
rdb.results[["ARI"]]
# alternatively, you can also store those results as a
# feature in the object itself
tokenized.obj <- readability(
tokenized.obj,
as.feature=TRUE
)
# results are now part of the object
hasFeature(tokenized.obj)
corpusReadability(tokenized.obj)
} else {}