View source: R/textstat_readability.R
| textstat_readability | R Documentation | 
Calculate the readability of text(s) using one of a variety of computed indexes.
textstat_readability(
  x,
  measure = "Flesch",
  remove_hyphens = TRUE,
  min_sentence_length = 1,
  max_sentence_length = 10000,
  intermediate = FALSE,
  ...
)
| x | a character or corpus object containing the texts | 
| measure | character vector defining the readability measure to calculate. Matches are case-insensitive. See other valid measures under Details. | 
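| remove_hyphens | if TRUE, treat the constituent parts of hyphenated words as separate words when computing word lengths, e.g. "decision-making" as two words of 8 and 6 characters rather than as a single word of 15 characters | 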
| min_sentence_length,max_sentence_length | set the minimum and maximum sentence lengths (in tokens, excluding punctuation) to include in the computation of readability. This makes it easy to exclude "sentences" that may not really be sentences, such as section titles, table elements, and other cruft that might be in the texts following conversion. For finer-grained control, consider filtering sentences first, including through pattern-matching, using corpus_trim(). | 
| intermediate | if TRUE, include the intermediate quantities used in computing the measures in the output | 
| ... | not used | 
The following readability formulas have been implemented, where
 Nw = n_{w} = number of words
 Nc = n_{c} = number of characters
 Nst = n_{st} = number of sentences
 Nsy = n_{sy} = number of syllables
 Nwf = n_{wf} = number of words matching the Dale-Chall list of 3000 "familiar words"
 ASL = Average Sentence Length: number of words / number of sentences
 AWL = Average Word Length: number of characters / number of words
 AFW = Average Familiar Words: count of words matching the Dale-Chall list of 3000 "familiar words" / number of all words
 Nwd = n_{wd} = number of "difficult" words not matching the Dale-Chall list of "familiar" words
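As a minimal sketch, the three averaged quantities defined above can be computed from raw counts as follows. The helper name `text_ratios()` and the example counts are hypothetical; how the package tokenises the text, counts characters, or matches words against the Dale-Chall list is not reproduced here.

```r
# Hypothetical helper: derive ASL, AWL and AFW from raw counts
# (definitions as listed above; the counts are supplied by the caller).
text_ratios <- function(n_w, n_c, n_st, n_wf) {
  c(ASL = n_w / n_st,   # average sentence length (words per sentence)
    AWL = n_c / n_w,    # average word length (characters per word)
    AFW = n_wf / n_w)   # share of words on the Dale-Chall familiar list
}

# Example: 100 words, 450 characters, 5 sentences, 80 familiar words
text_ratios(n_w = 100, n_c = 450, n_st = 5, n_wf = 80)
```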
"ARI":Automated Readability Index (Senter and Smith 1967)
0.5 \times ASL + 4.71 \times AWL - 21.34
"ARI.Simple":A simplified version of Senter and Smith's (1967) Automated Readability Index.
ASL + 9 \times AWL
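A short sketch of the two ARI variants, written with the coefficients exactly as documented above and taking ASL and AWL as already computed. The function names are hypothetical. Other descriptions of the ARI give the constant as 21.43; the sketch follows the value documented here.

```r
# Automated Readability Index and its simplified form, using the
# coefficients documented above (ASL and AWL as defined earlier).
ari <- function(ASL, AWL) 0.5 * ASL + 4.71 * AWL - 21.34
ari_simple <- function(ASL, AWL) ASL + 9 * AWL

ari(ASL = 20, AWL = 4.5)         # 10 + 21.195 - 21.34 = 9.855
ari_simple(ASL = 20, AWL = 4.5)  # 20 + 40.5 = 60.5
```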
"Bormuth.MC":Bormuth's (1969) Mean Cloze Formula.
0.886593 - 0.03640 \times AWL + 0.161911 \times AFW  - 0.21401 \times
  ASL - 0.000577 \times ASL^2 - 0.000005 \times ASL^3
"Bormuth.GP":Bormuth's (1969) Grade Placement score.
4.275 + 12.881M - 34.934M^2 + 20.388M^3 + 26.194CCS -
  2.046CCS^2 - 11.767CCS^3 - 42.285(M \times CCS) + 97.620(M \times CCS)^2 -
  59.538(M \times CCS)^3
where M is the Bormuth Mean Cloze Formula as in
"Bormuth.MC" above, and CCS is the Cloze Criterion Score (Bormuth,
1968).
"Coleman":Coleman's (1971) Readability Formula 1.
1.29 \times \frac{100 \times n_{wsy=1}}{n_{w}} - 38.45
where n_{wsy=1} = Nwsy1 = the number of one-syllable words.  The
scaling by 100 in this and the other Coleman-derived measures arises
because the Coleman measures are calculated on a per 100 words basis.
"Coleman.C2":Coleman's (1971) Readability Formula 2.
1.16 \times \frac{100 \times n_{wsy=1}}{n_{w}} + 1.48 \times \frac{100 \times n_{st}}{n_{w}} - 37.95
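The two Coleman formulas can be sketched from counts, rescaling each count to a per-100-words basis. This reads Formula 2 as 1.16 × (one-syllable words per 100 words) + 1.48 × (sentences per 100 words) - 37.95; the function names are hypothetical and syllable counting is assumed to have been done already.

```r
# Coleman's Formula 1: one-syllable words per 100 words, rescaled.
coleman1 <- function(n_wsy1, n_w) {
  1.29 * (100 * n_wsy1 / n_w) - 38.45
}

# Coleman's Formula 2 (reading stated in the lead-in): adds a
# sentences-per-100-words term.
coleman2 <- function(n_wsy1, n_st, n_w) {
  1.16 * (100 * n_wsy1 / n_w) + 1.48 * (100 * n_st / n_w) - 37.95
}

coleman1(n_wsy1 = 70, n_w = 100)            # 90.3 - 38.45 = 51.85
coleman2(n_wsy1 = 70, n_st = 5, n_w = 100)  # 81.2 + 7.4 - 37.95 = 50.65
```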
"Coleman.Liau.ECP":Coleman-Liau Estimated Cloze Percent (ECP) (Coleman and Liau 1975).
141.8401 - 0.214590 \times 100
  \times AWL + 1.079812 \times \frac{n_{st} \times 100}{n_{w}}
"Coleman.Liau.grade":Coleman-Liau Grade Level (Coleman and Liau 1975).
-27.4004 \times \frac{\mathtt{Coleman.Liau.ECP}}{100} + 23.06395
"Coleman.Liau.short":Coleman-Liau Index (Coleman and Liau 1975).
5.88 \times AWL - 29.6 \times \frac{n_{st}}{n_{w}} - 15.8
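A sketch of the three Coleman-Liau variants, assuming ASL-style inputs are already available. It reads the grade level as -27.4004 × ECP / 100 + 23.06395 and gives the sentences-per-word term in the short index a negative sign. With those readings, expanding the grade formula reproduces the short index's coefficients (about 5.88, -29.6 and -15.8) after rounding, which is why the two agree closely below. The function names are hypothetical.

```r
# Estimated Cloze Percent: letters and sentences per 100 words.
coleman_liau_ecp <- function(AWL, n_st, n_w) {
  141.8401 - 0.214590 * 100 * AWL + 1.079812 * (n_st * 100 / n_w)
}

# Grade level as a linear rescaling of ECP / 100 (assumed reading).
coleman_liau_grade <- function(ecp) {
  -27.4004 * ecp / 100 + 23.06395
}

# Short form, with a negative sentences-per-word term (assumed reading).
coleman_liau_short <- function(AWL, n_st, n_w) {
  5.88 * AWL - 29.6 * (n_st / n_w) - 15.8
}

ecp <- coleman_liau_ecp(AWL = 4.5, n_st = 5, n_w = 100)  # about 50.67
coleman_liau_grade(ecp)                                  # about 9.18
coleman_liau_short(AWL = 4.5, n_st = 5, n_w = 100)       # 9.18
```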
"Dale.Chall":The New Dale-Chall Readability formula (Chall and Dale 1995).
64 - (0.95 \times 100 \times \frac{n_{wd}}{n_{w}}) - (0.69 \times ASL)
"Dale.Chall.old":The original Dale-Chall Readability formula (Dale and Chall 1948).
0.1579 \times 100 \times \frac{n_{wd}}{n_{w}} + 0.0496 \times ASL [+ 3.6365]
The additional constant 3.6365 is only added if (Nwd / Nw) > 0.05.
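The conditional constant in the original Dale-Chall formula is easy to misread, so here is a minimal sketch of it. The function name is hypothetical and the counts of difficult words are assumed to be supplied by the caller.

```r
# Original Dale-Chall formula: the constant 3.6365 is added only when
# more than 5% of the words are "difficult" (not on the Dale-Chall list).
dale_chall_old <- function(n_wd, n_w, ASL) {
  raw <- 0.1579 * 100 * n_wd / n_w + 0.0496 * ASL
  raw + ifelse(n_wd / n_w > 0.05, 3.6365, 0)
}

dale_chall_old(n_wd = 4,  n_w = 100, ASL = 20)  # 4% difficult: 1.6236
dale_chall_old(n_wd = 10, n_w = 100, ASL = 20)  # 10% difficult: 6.2075
```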
"Dale.Chall.PSK":The Powers-Sumner-Kearl Variation of the Dale and Chall Readability formula (Powers, Sumner and Kearl, 1958).
(0.1155 \times 100 \times \frac{n_{wd}}{n_{w}}) + (0.0596 \times ASL) + 3.2672
"Danielson.Bryan":Danielson-Bryan's (1963) Readability Measure 1.
  (1.0364 \times \frac{n_{c}}{n_{blank}}) +
  (0.0194 \times \frac{n_{c}}{n_{st}}) -
  0.6059
where n_{blank} = Nblank = the number of blanks.
"Danielson.Bryan2":Danielson-Bryan's (1963) Readability Measure 2.
  131.059- (10.364 \times \frac{n_{c}}{n_{blank}}) + (0.0194
   \times \frac{n_{c}}{n_{st}})
where n_{blank} = Nblank = the number of blanks.
"Dickes.Steiwer":Dickes-Steiwer Index (Dickes and Steiwer 1977).
  235.95993 - (7.3021 \times AWL)  - (12.56438 \times ASL) -
  (50.03293 \times TTR)
where TTR is the Type-Token Ratio (see textstat_lexdiv())
"DRP":Degrees of Reading Power.
(1 - Bormuth.MC) *
  100
where Bormuth.MC refers to Bormuth's (1969) Mean Cloze Formula (documented above).
"ELF":Easy Listening Formula (Fang 1966).
\frac{n_{wsy>=2}}{n_{st}}
where n_{wsy>=2} = Nwmin2sy = the number of words with 2 syllables or more.
"Farr.Jenkins.Paterson":Farr-Jenkins-Paterson's Simplification of Flesch's Reading Ease Score (Farr, Jenkins and Paterson 1951).
   -31.517 - (1.015 \times ASL) + (1.599 \times
  \frac{n_{wsy=1}}{n_{w}})
where n_{wsy=1} = Nwsy1 = the number of one-syllable words.
"Flesch":Flesch's Reading Ease Score (Flesch 1948).
206.835 - (1.015 \times ASL) - (84.6 \times \frac{n_{sy}}{n_{w}})
"Flesch.PSK":The Powers-Sumner-Kearl Variation of Flesch's Reading Ease Score (Powers, Sumner and Kearl, 1958).
 (0.0778 \times
  ASL) + (4.55 \times \frac{n_{sy}}{n_{w}}) -
  2.2029
"Flesch.Kincaid":Flesch-Kincaid Readability Score (Kincaid, Fishburne, Rogers and Chissom 1975).
  0.39 \times ASL + 11.8  \times \frac{n_{sy}}{n_{w}} -
  15.59
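The three Flesch-based scores share the same two inputs (words per sentence and syllables per word), so a minimal sketch can compute them together from counts. The function name is hypothetical, and syllable counting is assumed to have been done already.

```r
# Flesch Reading Ease and two grade-level variants, using the
# coefficients documented above.
flesch_family <- function(n_w, n_st, n_sy) {
  ASL <- n_w / n_st   # words per sentence
  ASW <- n_sy / n_w   # syllables per word
  c(Flesch         = 206.835 - 1.015 * ASL - 84.6 * ASW,
    Flesch.PSK     = 0.0778 * ASL + 4.55 * ASW - 2.2029,
    Flesch.Kincaid = 0.39 * ASL + 11.8 * ASW - 15.59)
}

# 100 words, 5 sentences, 150 syllables: ASL = 20, ASW = 1.5
flesch_family(n_w = 100, n_st = 5, n_sy = 150)
```

Note that Flesch's Reading Ease falls as text gets harder, while the two variants are grade levels and rise.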
"FOG":Gunning's Fog Index (Gunning 1952).
0.4
  \times (ASL + 100 \times \frac{n_{wsy>=3}}{n_{w}})
where n_{wsy>=3} = Nwmin3sy = the number of words with 3 syllables or more.
The scaling by 100 arises because the original FOG index is based on
just a sample of 100 words.
"FOG.PSK":The Powers-Sumner-Kearl Variation of Gunning's Fog Index (Powers, Sumner and Kearl, 1958).
3.0680 + (0.0877 \times ASL) + (0.0984 \times 100 \times \frac{n_{wsy>=3}}{n_{w}})
where n_{wsy>=3} = Nwmin3sy = the number of words with 3 syllables or more.
The scaling by 100 arises because the original FOG index is based on
just a sample of 100 words.
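A sketch of Gunning's Fog Index and its Powers-Sumner-Kearl variant, taking the count of words with three or more syllables as given. It reads the leading 3.0680 in FOG.PSK as an additive constant; the function names are hypothetical.

```r
# Gunning's Fog Index: ASL plus the percentage of 3+ syllable words.
fog <- function(ASL, n_wsy3plus, n_w) {
  0.4 * (ASL + 100 * n_wsy3plus / n_w)
}

# Powers-Sumner-Kearl variant, with 3.0680 as an additive constant
# (assumed reading).
fog_psk <- function(ASL, n_wsy3plus, n_w) {
  3.0680 + 0.0877 * ASL + 0.0984 * 100 * n_wsy3plus / n_w
}

fog(ASL = 20, n_wsy3plus = 10, n_w = 100)      # 0.4 * (20 + 10) = 12
fog_psk(ASL = 20, n_wsy3plus = 10, n_w = 100)  # 3.068 + 1.754 + 0.984 = 5.806
```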
"FOG.NRI":The Navy's Adaptation of Gunning's Fog Index (Kincaid, Fishburne, Rogers and Chissom 1975).
(\frac{n_{wsy<3} + 3 \times n_{wsy=3}}{100 \times \frac{n_{st}}{n_{w}}} - 3) / 2
where n_{wsy<3} = Nwless3sy = the number of words with less than 3 syllables, and
n_{wsy=3} = Nw3sy = the number of 3-syllable words. The scaling by 100
arises because the original FOG index is based on just a sample of 100 words.
"FORCAST":FORCAST (Simplified Version of FORCAST.RGL) (Caylor and Sticht 1973).
20 - \frac{n_{wsy=1} \times 150}{n_{w} \times 10}
where n_{wsy=1} = Nwsy1 = the number of one-syllable words. The scaling by 150
arises because the original FORCAST index is based on just a sample of 150 words.
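The rescaling to a 150-word sample means that FORCAST depends only on the share of one-syllable words, as this minimal sketch shows. The function name is hypothetical.

```r
# FORCAST: one-syllable words rescaled to a 150-word sample, divided by
# 10 and subtracted from 20.
forcast <- function(n_wsy1, n_w) {
  20 - (n_wsy1 * 150) / (n_w * 10)
}

forcast(n_wsy1 = 100, n_w = 150)  # 20 - 100 / 10 = 10
forcast(n_wsy1 = 50,  n_w = 75)   # same proportion, same score: 10
```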
"FORCAST.RGL":FORCAST.RGL (Caylor and Sticht 1973).
20.43 - 0.11 \times \frac{n_{wsy=1} \times 150}{n_{w} \times 10}
where n_{wsy=1} = Nwsy1 = the number of one-syllable words. The scaling by 150 arises
because the original FORCAST index is based on just a sample of 150 words.
"Fucks":Fucks' (1955) Stilcharakteristik (Style Characteristic).
AWL * ASL
"Linsear.Write":Linsear Write (Klare 1975).
\frac{[(100 - (\frac{100 \times n_{wsy<3}}{n_{w}})) +
  (3 \times \frac{100 \times n_{wsy>=3}}{n_{w}})]}{(100 \times
  \frac{n_{st}}{n_{w}})}
where n_{wsy<3} = Nwless3sy = the number of words with less than 3 syllables, and
n_{wsy>=3} = Nwmin3sy = the number of words with 3 syllables or more. The scaling
by 100 arises because the original Linsear.Write measure is based on just a sample of 100 words.
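A sketch of Linsear Write exactly as documented above, with "easy" (fewer than 3 syllables) and "hard" (3 or more syllables) words expressed as percentages and divided by the number of sentences per 100 words. The function name is hypothetical, and the syllable-based counts are assumed to be supplied.

```r
# Linsear Write as documented above.
linsear_write <- function(n_wsy_lt3, n_wsy3plus, n_st, n_w) {
  easy_pct <- 100 * n_wsy_lt3 / n_w   # percentage of words with < 3 syllables
  hard_pct <- 100 * n_wsy3plus / n_w  # percentage of words with >= 3 syllables
  ((100 - easy_pct) + 3 * hard_pct) / (100 * n_st / n_w)
}

# (100 - 90) + 3 * 10 = 40, divided by 5 sentences per 100 words
linsear_write(n_wsy_lt3 = 90, n_wsy3plus = 10, n_st = 5, n_w = 100)  # 8
```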
"LIW":Björnsson's (1968) Läsbarhetsindex (For Swedish Texts).
ASL + \frac{100 \times n_{wsy>=7}}{n_{w}}
where n_{wsy>=7} = Nwmin7sy = the number of words with 7 syllables or more. The scaling
by 100 arises because the Läsbarhetsindex is based on just a sample of 100 words.
"nWS":Neue Wiener Sachtextformeln 1 (Bamberger and Vanecek 1984).
19.35 \times \frac{n_{wsy>=3}}{n_{w}} +
  0.1672 \times ASL + 12.97 \times \frac{n_{wchar>=6}}{n_{w}} - 3.27 \times
   \frac{n_{wsy=1}}{n_{w}} - 0.875
where n_{wsy>=3} = Nwmin3sy = the number of words with 3 syllables or more,
n_{wchar>=6} = Nwmin6char = the number of words with 6 characters or more, and
n_{wsy=1} = Nwsy1 = the number of one-syllable words.
"nWS.2":Neue Wiener Sachtextformeln 2 (Bamberger and Vanecek 1984).
20.07 \times \frac{n_{wsy>=3}}{n_{w}} + 0.1682 \times ASL +
  13.73 \times \frac{n_{wchar>=6}}{n_{w}} - 2.779
where n_{wsy>=3} = Nwmin3sy = the number of words with 3 syllables or more, and
n_{wchar>=6} = Nwmin6char = the number of words with 6 characters or more.
"nWS.3":Neue Wiener Sachtextformeln 3 (Bamberger and Vanecek 1984).
29.63 \times \frac{n_{wsy>=3}}{n_{w}} + 0.1905 \times
  ASL - 1.1144
where n_{wsy>=3} = Nwmin3sy = the number of words with 3 syllables or more.
"nWS.4":Neue Wiener Sachtextformeln 4 (Bamberger and Vanecek 1984).
27.44 \times \frac{n_{wsy>=3}}{n_{w}} + 0.2656 \times
  ASL - 1.693
where n_{wsy>=3} = Nwmin3sy = the number of words with 3 syllables or more.
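The four Neue Wiener Sachtextformeln share their inputs, so a minimal sketch can compute them together. The coefficients are applied to proportions (not percentages), as in the formulas above; the function name and the short variable names `MS`, `IW` and `ES` are hypothetical labels for the three shares.

```r
# Neue Wiener Sachtextformeln 1 to 4, from counts and ASL.
nws <- function(n_w, ASL, n_wsy3plus, n_wchar6plus, n_wsy1) {
  MS <- n_wsy3plus / n_w    # share of words with >= 3 syllables
  IW <- n_wchar6plus / n_w  # share of words with >= 6 characters
  ES <- n_wsy1 / n_w        # share of one-syllable words
  c(nWS   = 19.35 * MS + 0.1672 * ASL + 12.97 * IW - 3.27 * ES - 0.875,
    nWS.2 = 20.07 * MS + 0.1682 * ASL + 13.73 * IW - 2.779,
    nWS.3 = 29.63 * MS + 0.1905 * ASL - 1.1144,
    nWS.4 = 27.44 * MS + 0.2656 * ASL - 1.693)
}

nws(n_w = 100, ASL = 20, n_wsy3plus = 10, n_wchar6plus = 25, n_wsy1 = 60)
```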
"RIX":Anderson's (1983) Readability Index.
  \frac{n_{wsy>=7}}{n_{st}}
where n_{wsy>=7} = Nwmin7sy = the number of words with 7 syllables or more.
"Scrabble":Scrabble Measure.
Mean Scrabble letter values of all words.
Scrabble values are for English. There is no reference for this measure, as we created it experimentally; it is not part of any accepted readability index.
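As an illustration only, the sketch below computes the mean standard English Scrabble tile value over all letters a to z in a text, ignoring spaces, punctuation and other characters. It assumes the measure averages per letter; whether the package averages per letter or per word is not stated above, and the helper name `mean_scrabble()` is hypothetical.

```r
# Standard English Scrabble tile values, indexed by lower-case letter.
scrabble_values <- c(
  a = 1, b = 3, c = 3, d = 2, e = 1, f = 4, g = 2, h = 4, i = 1, j = 8,
  k = 5, l = 1, m = 3, n = 1, o = 1, p = 3, q = 10, r = 1, s = 1, t = 1,
  u = 1, v = 4, w = 4, x = 8, y = 4, z = 10)

# Hypothetical helper: mean tile value over the a-z letters in a text.
mean_scrabble <- function(text) {
  chars <- strsplit(tolower(text), "", fixed = TRUE)[[1]]
  mean(scrabble_values[chars[chars %in% letters]])
}

mean_scrabble("Quiz!")    # (10 + 1 + 1 + 10) / 4 = 5.5
mean_scrabble("the cat")  # (1 + 4 + 1 + 3 + 1 + 1) / 6
```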
"SMOG":Simple Measure of Gobbledygook (SMOG) (McLaughlin 1969).
1.043 \times \sqrt{n_{wsy>=3} \times \frac{30}{n_{st}}} + 3.1291
where n_{wsy>=3} = Nwmin3sy = the number of words with 3 syllables or more.
This measure is regression equation D in McLaughlin's original paper.
"SMOG.C":SMOG (Regression Equation C) (McLaughlin 1969).
0.9986 \times \sqrt{n_{wsy>=3} \times \frac{30}{n_{st}} + 5} + 2.8795
where n_{wsy>=3} = Nwmin3sy = the number of words with 3 syllables or more.
This measure is regression equation C in McLaughlin's original paper.
"SMOG.simple":Simplified Version of McLaughlin's (1969) SMOG Measure.
\sqrt{n_{wsy>=3} \times \frac{30}{n_{st}}} + 3
"SMOG.de":Adaptation of McLaughlin's (1969) SMOG Measure for German Texts.
\sqrt{n_{wsy>=3} \times \frac{30}{n_{st}}} - 2
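All four SMOG variants start from the same quantity, the number of words with three or more syllables rescaled to a 30-sentence sample, so a minimal sketch can compute them together. It places that whole rescaled count under the square root and, for SMOG.de, subtracts 2 after taking the root (assumed reading). The function name is hypothetical.

```r
# SMOG variants from the polysyllable count and the number of sentences.
smog_family <- function(n_wsy3plus, n_st) {
  scaled <- n_wsy3plus * 30 / n_st  # polysyllables per 30 sentences
  c(SMOG        = 1.043 * sqrt(scaled) + 3.1291,
    SMOG.C      = 0.9986 * sqrt(scaled + 5) + 2.8795,
    SMOG.simple = sqrt(scaled) + 3,
    SMOG.de     = sqrt(scaled) - 2)
}

# 16 polysyllables in 30 sentences: sqrt(16) = 4 for the simple forms
smog_family(n_wsy3plus = 16, n_st = 30)
```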
"Spache":Spache's (1952) Readability Measure.
 0.121 \times
  ASL + 0.082 \times \frac{n_{wnotinspache}}{n_{w}}  +
  0.659
where n_{wnotinspache} = Nwnotinspache = number of unique words not in the Spache word list.
"Spache.old":Spache's (1952) Readability Measure (Old).
0.141
  \times ASL + 0.086 \times \frac{n_{wnotinspache}}{n_{w}}  +
  0.839
where n_{wnotinspache} = Nwnotinspache = number of unique words not in the Spache word list.
"Strain":Strain Index (Solomon 2006).
n_{sy} /
  \frac{n_{st}}{3} /10
The scaling by 3 arises because the original Strain index is based on just the first 3 sentences.
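A minimal sketch of the Strain Index from syllable and sentence counts; the function name is hypothetical.

```r
# Strain Index: syllables per 3 sentences, divided by 10.
strain <- function(n_sy, n_st) {
  n_sy / (n_st / 3) / 10
}

strain(n_sy = 150, n_st = 5)  # 150 / (5 / 3) / 10 = 9
```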
"Traenkle.Bailer":Tränkle & Bailer's (1984) Readability Measure 1.
224.6814 - (79.8304 \times AWL) - (12.24032 \times
  ASL) - (1.292857 \times 100 \times \frac{n_{prep}}{n_{w}})
where n_{prep} = Nprep = the number of prepositions. The scaling by 100 arises because the original
Tränkle & Bailer index is based on just a sample of 100 words.
"Traenkle.Bailer2":Tränkle & Bailer's (1984) Readability Measure 2.
234.1063 - (96.11069 \times AWL) - (2.05444 \times 100 \times \frac{n_{prep}}{n_{w}}) -
  (1.02805 \times 100 \times \frac{n_{conj}}{n_{w}})
where n_{prep} = Nprep = the number of prepositions, and
n_{conj} = Nconj = the number of conjunctions.
The scaling by 100 arises because the original Tränkle & Bailer index is based on
just a sample of 100 words.
"Wheeler.Smith":Wheeler & Smith's (1954) Readability Measure.
ASL \times 10 \times \frac{n_{wsy>=2}}{n_{w}}
where n_{wsy>=2} = Nwmin2sy = the number of words with 2 syllables or more.
"meanSentenceLength":Average Sentence Length (ASL).
\frac{n_{w}}{n_{st}}
"meanWordSyllables":Average Word Syllables (syllables per word, not to be confused with AWL).
\frac{n_{sy}}{n_{w}}
textstat_readability returns a data.frame of documents and
their readability scores.
Kenneth Benoit, re-engineered from Meik Michalke's koRpus package.
Anderson, J. (1983). Lix and rix: Variations on a little-known readability index. Journal of Reading, 26(6), 490–496. https://www.jstor.org/stable/40031755
Bamberger, R. & Vanecek, E. (1984). Lesen-Verstehen-Lernen-Schreiben. Wien: Jugend und Volk.
Björnsson, C. H. (1968). Läsbarhet. Stockholm: Liber.
Bormuth, J.R. (1969). Development of Readability Analysis.
Bormuth, J.R. (1968). Cloze test readability: Criterion reference scores. Journal of Educational Measurement, 5(3), 189–196. https://www.jstor.org/stable/1433978
Caylor, J.S. (1973). Methodologies for Determining Reading Requirements of Military Occupational Specialities. https://eric.ed.gov/?id=ED074343
Caylor, J.S. & Sticht, T.G. (1973). Development of a Simple Readability Index for Job Reading Material. https://archive.org/details/ERIC_ED076707
Coleman, E.B. (1971). Developing a technology of written instruction: Some determiners of the complexity of prose. Verbal learning research and the technology of written instruction, 155–204.
Coleman, M. & Liau, T.L. (1975). A Computer Readability Formula Designed for Machine Scoring. Journal of Applied Psychology, 60(2), 283. https://doi.org/10.1037/h0076540
Dale, E. and Chall, J.S. (1948). A Formula for Predicting Readability: Instructions. Educational Research Bulletin, 37-54. https://www.jstor.org/stable/1473169
Chall, J.S. and Dale, E. (1995). Readability Revisited: The New Dale-Chall Readability Formula. Brookline Books.
Dickes, P. & Steiwer, L. (1977). Ausarbeitung von Lesbarkeitsformeln für die Deutsche Sprache. Zeitschrift für Entwicklungspsychologie und Pädagogische Psychologie 9(1), 20–28.
Danielson, W.A., & Bryan, S.D. (1963). Computer Automation of Two Readability Formulas. Journalism Quarterly, 40(2), 201–206. https://doi.org/10.1177/107769906304000207
DuBay, W.H. (2004). The Principles of Readability.
Fang, I. E. (1966). The "Easy listening formula". Journal of Broadcasting & Electronic Media, 11(1), 63–68. https://doi.org/10.1080/08838156609363529
Farr, J. N., Jenkins, J.J., & Paterson, D.G. (1951). Simplification of Flesch Reading Ease Formula. Journal of Applied Psychology, 35(5): 333. https://doi.org/10.1037/h0057532
Flesch, R. (1948). A New Readability Yardstick. Journal of Applied Psychology, 32(3), 221. https://doi.org/10.1037/h0057532
Fucks, W. (1955). Der Unterschied des Prosastils von Dichtern und anderen Schriftstellern. Sprachforum, 1, 233-244.
Gunning, R. (1952). The Technique of Clear Writing. New York: McGraw-Hill.
Klare, G.R. (1975). Assessing Readability. Reading Research Quarterly, 10(1), 62-102. https://doi.org/10.2307/747086
Kincaid, J. P., Fishburne Jr, R.P., Rogers, R.L., & Chissom, B.S. (1975). Derivation of New Readability Formulas (Automated Readability Index, FOG count and Flesch Reading Ease Formula) for Navy Enlisted Personnel.
McLaughlin, G.H. (1969). SMOG Grading: A New Readability Formula. Journal of Reading, 12(8), 639-646.
Michalke, M. (2014). koRpus: An R Package for Text Analysis (Version 0.05-4). Available from https://reaktanz.de/?c=hacking&s=koRpus.
Powers, R.D., Sumner, W.A., and Kearl, B.E. (1958). A Recalculation of Four Adult Readability Formulas. Journal of Educational Psychology, 49(2), 99. https://doi.org/10.1037/h0043254
Senter, R. J., & Smith, E. A. (1967). Automated readability index. Wright-Patterson Air Force Base. Report No. AMRL-TR-6620.
*Solomon, N. W. (2006). Qualitative Analysis of Media Language. India.
Spache, G. (1953). A new readability formula for primary-grade reading materials. The Elementary School Journal, 53, 410–413. https://www.jstor.org/stable/998915
Tränkle, U. & Bailer, H. (1984). Kreuzvalidierung und Neuberechnung von Lesbarkeitsformeln für die deutsche Sprache. Zeitschrift für Entwicklungspsychologie und Pädagogische Psychologie, 16(3), 231–244.
Wheeler, L.R. & Smith, E.H. (1954). A Practical Readability Formula for the Classroom Teacher in the Primary Grades. Elementary English, 31, 397–399. https://www.jstor.org/stable/41384251
*Nimaldasan is the pen name of N. Watson Solomon, Assistant Professor of Journalism, School of Media Studies, SRM University, India.
library("quanteda.textstats")

txt <- c(doc1 = "Readability zero one. Ten, Eleven.",
         doc2 = "The cat in a dilapidated tophat.")
# a single measure
textstat_readability(txt, measure = "Flesch")
# several measures at once
textstat_readability(txt, measure = c("FOG", "FOG.PSK", "FOG.NRI"))
# a subset of the inaugural address corpus shipped with quanteda
textstat_readability(quanteda::data_corpus_inaugural[48:58],
                     measure = c("Flesch.Kincaid", "Dale.Chall.old"))