#' Calculate readability
#'
#' Calculate the readability of text(s) using one of a variety of computed
#' indexes.
#' @details
#' The following readability formulas have been implemented, where
#' \itemize{
#'   \item Nw = \eqn{n_{w}} = number of words
#'   \item Nc = \eqn{n_{c}} = number of characters
#'   \item Nst = \eqn{n_{st}} = number of sentences
#'   \item Nsy = \eqn{n_{sy}} = number of syllables
#'   \item Nwf = \eqn{n_{wf}} = number of words matching the Dale-Chall List
#'         of 3000 "familiar words"
#'   \item ASL = Average Sentence Length: number of words / number of sentences
#'   \item AWL = Average Word Length: number of characters / number of words
#'   \item AFW = Average Familiar Words: count of words matching the Dale-Chall
#'         list of 3000 "familiar words" / number of all words
#'   \item Nwd = \eqn{n_{wd}} = number of "difficult" words not matching the
#'         Dale-Chall list of "familiar" words
#' }
#'
#' \describe{
#'   \item{`"ARI"`:}{Automated Readability Index (Senter and Smith 1967)
#'   \deqn{0.5 ASL  + 4.71 AWL - 21.34}}
#'
#'   \item{`"ARI.Simple"`:}{A simplified version of Senter and Smith's (1967) Automated Readability Index.
#'   \deqn{ASL + 9 AWL}}
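#'
#'   \item{`"ARI.NRI"`:}{The Navy Readability Index (NRI) variant of the
#'   Automated Readability Index.
#'   \deqn{0.4 ASL + 6 AWL - 27.4}}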
#'
#'   \item{`"Bormuth.MC"`:}{Bormuth's (1969) Mean Cloze Formula.
#'   \deqn{0.886593 - 0.03640 \times AWL + 0.161911 \times AFW  - 0.21401 \times
#'   ASL - 0.000577 \times ASL^2 - 0.000005 \times ASL^3}{
#'   0.886593 - 0.03640 * AWL + 0.161911 * AFW  - 0.21401 *
#'   ASL - 0.000577 * ASL^2 - 0.000005 * ASL^3}}
#'
#'   \item{`"Bormuth.GP"`:}{Bormuth's (1969) Grade Placement score.
#'   \deqn{4.275 + 12.881M - 34.934M^2 + 20.388 M^3 + 26.194 CCS -
#'   2.046 CCS^2 - 11.767 CCS^3 - 42.285(M \times CCS) + 97.620(M \times CCS)^2 -
#'   59.538(M \times CCS)^2}{
#'   4.275 + 12.881M - 34.934M^2 + 20.388 M^3 + 26.194 CCS -
#'   2.046 CCS^2 - 11.767 CCS^3 - 42.285(M * CCS) + 97.620(M * CCS)^2 -
#'   59.538(M * CCS)^2}
#'   where \eqn{M} is the Bormuth Mean Cloze Formula as in
#'   `"Bormuth"` above, and \eqn{CCS} is the Cloze Criterion Score (Bormuth,
#'   1968).}
#'
#'   \item{`"Coleman"`:}{Coleman's (1971) Readability Formula 1.
#'   \deqn{1.29 \times \frac{100 \times n_{wsy=1}}{n_{w}} - 38.45}{
#'   1.29 * (100 * Nwsy1 / Nw) - 38.45}
#'
#'   where \eqn{n_{wsy=1}} = Nwsy1 = the number of one-syllable words.  The
#'   scaling by 100 in this and the other Coleman-derived measures arises
#'   because the Coleman measures are calculated on a per 100 words basis.}
#'
#'   \item{`"Coleman.C2"`:}{Coleman's (1971) Readability Formula 2.
#'   \deqn{1.16 \times \frac{100 \times n_{wsy=1}}{
#'   Nw + 1.48 \times \frac{100 \times n_{st}}{n_{w}} - 37.95}}{
#'   1.16 * (100 * Nwsy1 / Nw) + 1.48 * (100 * Nst / Nw) - 37.95}}
#'
#'   \item{`"Coleman.Liau.ECP"`:}{Coleman-Liau Estimated Cloze Percent
#'   (ECP) (Coleman and Liau 1975).
#'   \deqn{141.8401 - 0.214590 \times 100
#'   \times AWL + 1.079812 \times \frac{n_{st} \times 100}{n_{w}}}{
#'   141.8401 - (0.214590 * 100 * AWL) + (1.079812 * Nst * 100 / Nw)}}
#'
#'   \item{`"Coleman.Liau.grade"`:}{Coleman-Liau Grade Level (Coleman
#'   and Liau 1975).
#'   \deqn{-27.4004 \times \frac{\mathtt{Coleman.Liau.ECP}}{100} +
#'   23.06395}{-27.4004 * Coleman.Liau.ECP / 100 + 23.06395}}
#'
#'   \item{`"Coleman.Liau.short"`:}{Coleman-Liau Index (Coleman and Liau 1975).
#'   \deqn{5.88 \times AWL + 29.6 \times \frac{n_{st}}{n_{w}} - 15.8}{
#'   5.88 * AWL + (0.296 * Nst / Nw) - 15.8}}
#'
#'   \item{`"Dale.Chall"`:}{The New Dale-Chall Readability formula (Chall
#'   and Dale 1995).
#'   \deqn{64 - (0.95 \times 100 \times \frac{n_{wd}}{n_{w}}) - (0.69 \times ASL)}{
#'   64 - (0.95 * 100 * Nwd / Nw) - (0.69 * ASL)}}
#'
#'   \item{`"Dale.Chall.Old"`:}{The original Dale-Chall Readability formula
#'   (Dale and Chall (1948).
#'   \deqn{0.1579 \times 100 \times \frac{n_{wd}}{n_{w}} + 0.0496 \times ASL [+ 3.6365]}{
#'   0.1579 * 100 * Nwd / Nw  + 0.0496 * ASL [+ 3.6365]}
#'
#'   The additional constant 3.6365 is only added if (Nwd / Nw) > 0.05.}
#'
#'   \item{`"Dale.Chall.PSK"`:}{The Powers-Sumner-Kearl Variation of the
#'   Dale and Chall Readability formula (Powers, Sumner and Kearl, 1958).
#'   \deqn{(0.1155 \times
#'   100 \times \frac{n_{wd}}{n_{w}}) + (0.0596 \times ASL) + 3.2672}{
#'   (0.1155 * 100 * Nwd / Nw) + (0.0596 * ASL) + 3.2672}}
#'
#'   \item{`"Danielson.Bryan"`:}{Danielson-Bryan's (1963) Readability Measure 1. \deqn{
#'   (1.0364 \times \frac{n_{c}}{n_{blank}}) +
#'   (0.0194 \times \frac{n_{c}}{n_{st}}) -
#'   0.6059}{(1.0364 * Nc / Nblank) +
#'   (0.0194 * Nc / Nst) - 0.6059}
#'
#'   where \eqn{n_{blank}} = Nblank = the number of blanks.}
#'
#'   \item{`"Danielson.Bryan2"`:}{Danielson-Bryan's (1963) Readability Measure 2. \deqn{
#'   131.059- (10.364 \times \frac{n_{c}}{n_{blank}}) + (0.0194
#'    \times \frac{n_{c}}{n_{st}})}{131.059 - (10.364 * Nc /
#'    Nblank) + (0.0194 * Nc / Nst)}
#'
#'    where \eqn{n_{blank}} = Nblank = the number of blanks.}
#'
#'   \item{`"Dickes.Steiwer"`:}{Dickes-Steiwer Index (Dicks and Steiwer 1977). \deqn{
#'   235.95993 - (7.3021 \times AWL)  - (12.56438 \times ASL) -
#'   (50.03293 \times TTR)}{235.95993 - (73.021 *
#'   AWL) - (12.56438 * ASL) - (50.03293 * TTR)}
#'
#'   where TTR is the Type-Token Ratio (see [textstat_lexdiv()])}
#'
#'   \item{`"DRP"`:}{Degrees of Reading Power. \deqn{(1 - Bormuth.MC) *
#'   100}
#'
#'   where Bormuth.MC refers to Bormuth's (1969)  Mean Cloze Formula (documented above)}
#'
#'   \item{`"ELF"`:}{Easy Listening Formula (Fang 1966): \deqn{\frac{n_{wsy>=2}}{n_{st}}}{(Nwmin2sy / Nst)}
#'
#'    where \eqn{n_{wsy>=2}} = Nwmin2sy = the number of words with 2 syllables or more.}
#'
#'   \item{`"Farr.Jenkins.Paterson"`:}{Farr-Jenkins-Paterson's
#'   Simplification of Flesch's Reading Ease Score (Farr, Jenkins and Paterson 1951). \deqn{
#'    -31.517 - (1.015 \times ASL) + (1.599 \times
#'   \frac{n_{wsy=1}}{n_{w}}}{ -31.517
#'   - (1.015 * ASL) + (1.599 * Nwsy1 / Nw)}
#'
#'   where \eqn{n_{wsy=1}} = Nwsy1 = the number of one-syllable words.}
#'
#'   \item{`"Flesch"`:}{Flesch's Reading Ease Score (Flesch 1948).
#'   \deqn{206.835 - (1.015 \times ASL) - (84.6 \times \frac{n_{sy}}{n_{w}})}{
#'   206.835 - (1.015 * ASL) - (84.6 * (Nsy / Nw))}}
#'
#'   \item{`"Flesch.PSK"`:}{The Powers-Sumner-Kearl's Variation of Flesch Reading Ease Score
#'   (Powers, Sumner and Kearl, 1958). \deqn{ (0.0778 \times
#'   ASL) + (4.55 \times \frac{n_{sy}}{n_{w}}) -
#'   2.2029}{(0.0078 * ASL) + (4.55 * Nsy / Nw) - 2.2029}}
#'
#'   \item{`"Flesch.Kincaid"`:}{Flesch-Kincaid Readability Score (Flesch and Kincaid 1975). \deqn{
#'   0.39 \times ASL + 11.8  \times \frac{n_{sy}}{n_{w}} -
#'   15.59}{0.39 * ASL + 11.8  * (NSy /Nw) - 15.59}}
#'
#'   \item{`"FOG"`:}{Gunning's Fog Index (Gunning 1952). \deqn{0.4
#'   \times (ASL + 100 \times \frac{n_{wsy>=3}}{n_{w}})}{0.4 *
#'   (ASL + 100 * (Nwmin3sy / Nw)}
#'
#'   where \eqn{n_{wsy>=3}} = Nwmin3sy = the number of words with 3-syllables or more.
#'   The scaling by 100 arises because the original FOG index is based on
#'   just a sample of 100 words)}
#'
#'   \item{`"FOG.PSK"`:}{The Powers-Sumner-Kearl Variation of Gunning's
#'   Fog Index (Powers, Sumner and Kearl, 1958). \deqn{3.0680 \times
#'   (0.0877 \times ASL) +(0.0984 \times 100 \times \frac{n_{wsy>=3}}{n_{w}})}{
#'   3.0680 * (0.0877 * ASL) +(0.0984 * 100 * (Nwmin3sy / Nw)}
#'
#'   where \eqn{n_{wsy>=3}} = Nwmin3sy = the number of words with 3-syllables or more.
#'   The scaling by 100 arises because the original FOG index is based on
#'   just a sample of 100 words)}
#'
#'   \item{`"FOG.NRI"`:}{The Navy's Adaptation of Gunning's Fog Index (Kincaid, Fishburne, Rogers and Chissom 1975).
#'   \deqn{(\frac{(n_{wsy<3} + 3 \times n_{wsy=3})}{(100 \times \frac{N_{st}}{N_{w}})}  -
#'   3) / 2 }{(((Nwless3sy + 3 * Nw3sy) / (100 * Nst / Nw))-3) / 2}
#'
#'   where \eqn{n_{wsy<3}} = Nwless3sy = the number of words with *less than* 3 syllables, and
#'   \eqn{n_{wsy=3}} = Nw3sy = the number of 3-syllable words. The scaling by 100
#'   arises because the original FOG index is based on just a sample of 100 words)}
#'
#'   \item{`"FORCAST"`:}{FORCAST (Simplified Version of FORCAST.RGL) (Caylor and
#'   Sticht 1973). \deqn{ 20 - \frac{n_{wsy=1} \times
#'   150)}{(n_{w} \times 10)}}{ 20 - (Nwsy1 *
#'   150) / (Nw * 10)}
#'
#'   where \eqn{n_{wsy=1}} = Nwsy1 = the number of one-syllable words. The scaling by 150
#'   arises because the original FORCAST index is based on just a sample of 150 words.}
#'
#'   \item{`"FORCAST.RGL"`:}{FORCAST.RGL (Caylor and Sticht 1973).
#'   \deqn{20.43 - 0.11 \times \frac{n_{wsy=1} \times
#'   150)}{(n_{w} \times 10)}}{ 20.43 - 0.11 * (Nwsy1 *
#'   150) / (Nw * 10)}
#'
#'   where \eqn{n_{wsy=1}} = Nwsy1 = the number of one-syllable words. The scaling by 150 arises
#'   because the original FORCAST index is based on just a sample of 150 words.}
#'
#'   \item{`"Fucks"`:}{Fucks' (1955) Stilcharakteristik (Style
#'   Characteristic). \deqn{AWL * ASL}}
#'
#'   \item{`"Linsear.Write"`:}{Linsear Write (Klare 1975).
#'   \deqn{\frac{[(100 - (\frac{100 \times n_{wsy<3}}{n_{w}})) +
#'   (3 \times \frac{100 \times n_{wsy>=3}}{n_{w}})]}{(100 \times
#'   \frac{n_{st}}{n_{w}})}}{[(100 - (100 * Nwless3sy / Nw))
#'   + (3 * 100 * Nwmin3sy / Nw)] / (100 * Nst / Nw)}
#'
#'   where \eqn{n_{wsy<3}} = Nwless3sy = the number of words with *less than* 3 syllables, and
#'   \eqn{n_{wsy>=3}} = Nwmin3sy = the number of words with 3-syllables or more. The scaling
#'   by 100 arises because the original Linsear.Write measure is based on just a sample of 100 words.}
#'
#'
#'   \item{`"LIW"`:}{Björnsson's (1968) Läsbarhetsindex (For Swedish
#'   Texts). \deqn{ASL + \frac{100 \times n_{wsy>=7}}{n_{w}}}{ ASL + (100 *
#'   Nwmin7sy / Nw)}
#'
#'   where \eqn{n_{wsy>=7}} = Nwmin7sy = the number of words with 7-syllables or more. The scaling
#'   by 100 arises because the Läsbarhetsindex index is based on just a sample of 100 words)}
#'
#'   \item{`"nWS"`:}{Neue Wiener Sachtextformeln 1 (Bamberger and
#'   Vanecek 1984). \deqn{19.35 \times \frac{n_{wsy>=3}}{n_{w}} +
#'   0.1672 \times ASL + 12.97 \times \frac{n_{wchar>=6}}{n_{w}} - 3.27 \times
#'   \frac{n_{wsy=1}}{n_{w}} - 0.875}{(19.35 * Nwmin3sy / Nw) +
#'   (0.1672 * ASL) + (12.97 * Nwmin6char / Nw) - (3.27 * Nwsy1 / Nw) - 0.875}
#'
#'   where \eqn{n_{wsy>=3}} = Nwmin3sy = the number of words with 3 syllables or more,
#'   \eqn{n_{wchar>=6}} = Nwmin6char = the number of words with 6 characters or more, and
#'   \eqn{n_{wsy=1}} = Nwsy1 = the number of one-syllable words.}
#'
#'   \item{`"nWS.2"`:}{Neue Wiener Sachtextformeln 2 (Bamberger and
#'   Vanecek 1984). \deqn{20.07 \times \frac{n_{wsy>=3}}{n_{w}} + 0.1682 \times ASL +
#'   13.73 \times \frac{n_{wchar>=6}}{n_{w}} - 2.779}{ (20.07 * Nwmin3sy / Nw) + (0.1682 * ASL) +
#'   (13.73 * Nwmin6char / Nw) - 2.779}
#'
#'   where \eqn{n_{wsy>=3}} = Nwmin3sy = the number of words with 3 syllables or more, and
#'   \eqn{n_{wchar>=6}} = Nwmin6char = the number of words with 6 characters or more.}
#'
#'   \item{`"nWS.3"`:}{Neue Wiener Sachtextformeln 3 (Bamberger and
#'   Vanecek 1984). \deqn{29.63 \times \frac{n_{wsy>=3}}{n_{w}} + 0.1905 \times
#'   ASL - 1.1144}{(29.63 * Nwmin3sy / Nw) + (0.1905 * ASL) - 1.1144}
#'
#'   where \eqn{n_{wsy>=3}} = Nwmin3sy = the number of words with 3 syllables or more.}
#'
#'   \item{`"nWS.4"`:}{Neue Wiener Sachtextformeln 4 (Bamberger and
#'   Vanecek 1984). \deqn{27.44 \times \frac{n_{wsy>=3}}{n_{w}} + 0.2656 \times
#'   ASL - 1.693}{ (27.44 * Nwmin3sy / Nw) + (0.2656 * ASL) - 1.693}
#'
#'   where \eqn{n_{wsy>=3}} = Nwmin3sy = the number of words with 3 syllables or more.}
#'
#'   \item{`"RIX"`:}{Anderson's (1983) Readability Index. \deqn{
#'   \frac{n_{wsy>=7}}{n_{st}}}{ Nwmin7sy / Nst}
#'
#'   where \eqn{n_{wsy>=7}} = Nwmin7sy = the number of words with 7-syllables or more.}
#'
#'   \item{`"Scrabble"`:}{Scrabble Measure. \deqn{Mean
#'   Scrabble Letter Values of All Words}.
#'   Scrabble values are for English.  There is no reference for this, as we
#'   created it experimentally.  It's not part of any accepted readability
#'   index!}
#'
#'   \item{`"SMOG"`:}{Simple Measure of Gobbledygook (SMOG) (McLaughlin 1969). \deqn{ 1.043
#'    \times \sqrt{n_{wsy>=3}} \times \frac{30}{n_{st}} + 3.1291}{1.043 * sqrt(Nwmin3sy
#'    * 30 / Nst) + 3.1291}
#'
#'   where \eqn{n_{wsy>=3}} = Nwmin3sy = the number of words with 3 syllables or more.
#'   This measure is regression equation D in McLaughlin's original paper.}
#'
#'   \item{`"SMOG.C"`:}{SMOG (Regression Equation C) (McLaughlin's 1969) \deqn{0.9986 \times
#'   \sqrt{Nwmin3sy \times \frac{30}{n_{st}} +
#'   5} +  2.8795}{0.9986 * sqrt(Nwmin3sy * (30 / Nst) +
#'   5) + 2.8795}
#'
#'   where \eqn{n_{wsy>=3}} = Nwmin3sy = the number of words with 3 syllables or more.
#'   This measure is regression equation C in McLaughlin's original paper.}
#'
#'   \item{`"SMOG.simple"`:}{Simplified Version of McLaughlin's (1969) SMOG Measure. \deqn{
#'   \sqrt{Nwmin3sy \times \frac{30}{n_{st}}} +
#'   3}{sqrt(Nwmin3sy * 30 / Nst) + 3}}
#'
#'   \item{`"SMOG.de"`:}{Adaptation of McLaughlin's (1969) SMOG Measure for German Texts.
#'   \deqn{ \sqrt{Nwmin3sy \times \frac{30}{n_{st}}-2}}{
#'   sqrt(Nwmin3sy * 30 / Nst) - 2 }}
#'
#'   \item{`"Spache"`:}{Spache's (1952) Readability Measure. \deqn{ 0.121 \times
#'   ASL + 0.082 \times \frac{n_{wnotinspache}}{n_{w}}  +
#'   0.659}{ 0.121 * ASL + 0.082 * Nwnotinspache / Nw) + 0.659}
#'
#'   where \eqn{n_{wnotinspache}} = Nwnotinspache = number of unique words not in the Spache word list.}
#'
#'   \item{`"Spache.old"`:}{Spache's (1952) Readability Measure (Old). \deqn{0.141
#'   \times ASL + 0.086 \times \frac{n_{wnotinspache}}{n_{w}}  +
#'   0.839}{0.141 * ASL + 0.086 * (Nwnotinspache/ Nw) + 0.839}
#'
#'   where \eqn{n_{wnotinspache}} = Nwnotinspache = number of unique words not in the Spache word list.}
#'
#'   \item{`"Strain"`:}{Strain Index (Solomon 2006). \deqn{n_{sy} /
#'   \frac{n_{st}}{3} /10}{Nsy / (Nst / 3) / 10 }
#'
#'   The scaling by 3 arises because the original Strain index is based on just the first 3 sentences.}
#'
#'   \item{`"Traenkle.Bailer"`:}{Tränkle & Bailer's (1984) Readability Measure 1.
#'   \deqn{224.6814 - (79.8304 \times AWL) - (12.24032 \times
#'   ASL) - (1.292857 \times 100 \times \frac{n_{prep}}{n_{w}}}{ 224.6814 - (79.8304 * AWL) + (12.24032 * ASL) -
#'   (1.292857 * 100 * Nprep / Nw)}
#'
#'   where \eqn{n_{prep}} = Nprep = the number of prepositions. The scaling by 100 arises because the original
#'   Tränkle & Bailer index is based on just a sample of 100 words.}
#'
#'   \item{`"Traenkle.Bailer2"`:}{Tränkle & Bailer's (1984) Readability Measure 2.
#'   \deqn{Tränkle.Bailer2 =  234.1063 - (96.11069 \times AWL
#'   ) - (2.05444 \times 100 \times \frac{n_{prep}}{n_{w}}) -
#'   (1.02805 \times 100 \times \frac{n_{conj}}{n_{w}}}{
#'   234.1063 - 96.11069 * AWL  - 2.05444 * 100 * (Nprep / Nw) - 1.02805 * 100 * (Nconj / Nw).}
#'
#'   where \eqn{n_{prep}} = Nprep = the number of prepositions,
#'   \eqn{n_{conj}} = Nconj = the number of conjunctions,
#'   The scaling by 100 arises because the original Tränkle & Bailer index is based on
#'   just a sample of 100 words)}
#'
#'   \item{`"Wheeler.Smith"`:}{Wheeler & Smith's (1954) Readability Measure.
#'   \deqn{ASL \times 10 \times \frac{n_{wsy>=2}}{n_{w}}}{ASL * 10 * (Nwmin2sy / Nw)}
#'
#'    where \eqn{n_{wsy>=2}} = Nwmin2sy = the number of words with 2 syllables or more.}
#'
#'   \item{`"meanSentenceLength"`:}{Average Sentence Length (ASL).
#'   \deqn{\frac{n_{w}}{n_{st}}}{ Nw / Nst }}
#'
#'   \item{`"meanWordSyllables"`:}{Average Word Syllables (AWL).
#'   \deqn{\frac{n_{sy}}{n_{w}}}{ Nsy / Nw}}
#'
#' }
#'
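#' As a worked example for the default measure (a hand computation, not
#' package output): a text with 100 words, 5 sentences, and 140 syllables has
#' \eqn{ASL = 20} and \eqn{n_{sy}/n_{w} = 1.4}, giving a Flesch Reading Ease
#' score of \eqn{206.835 - 1.015 \times 20 - 84.6 \times 1.4 = 68.095}.
#'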
#' @param x a character or [corpus] object containing the texts
#' @param measure character vector defining the readability measure to calculate.
#'   Matches are case-insensitive.  See other valid measures under Details.
#' @param remove_hyphens if `TRUE`, treat constituent words in hyphenated words
#'   as separate terms for purposes of computing word lengths, e.g.
#'   "decision-making" as two terms of lengths 8 and 6 characters respectively,
#'   rather than as a single word of 15 characters
#' @param min_sentence_length,max_sentence_length set the minimum and maximum
#'   sentence lengths (in tokens, excluding punctuation) to include in the
#'   computation of readability.  This makes it easy to exclude "sentences" that
#'   may not really be sentences, such as section titles, table elements, and
#'   other cruft that might be in the texts following conversion.
#'
#'   For finer-grained control, consider filtering sentences first, including
#'   through pattern matching, using [corpus_trim()].
#' @param intermediate if `TRUE`, include intermediate quantities in the output
#' @param ... not used
#' @importFrom quanteda texts char_trim nsentence char_tolower tokens_remove dfm
#' @importFrom nsyllable nsyllable
#' @author Kenneth Benoit, re-engineered from Meik Michalke's \pkg{koRpus}
#'   package.
#' @return `textstat_readability` returns a data.frame of documents and
#'   their readability scores.
#' @export
#' @examples
#' txt <- c(doc1 = "Readability zero one. Ten, Eleven.",
#'          doc2 = "The cat in a dilapidated tophat.")
#' textstat_readability(txt, measure = "Flesch")
#' textstat_readability(txt, measure = c("FOG", "FOG.PSK", "FOG.NRI"))
#'
#' textstat_readability(quanteda::data_corpus_inaugural[48:58],
#'                      measure = c("Flesch.Kincaid", "Dale.Chall.old"))
#' @references
#'   Anderson, J. (1983). Lix and rix: Variations on a little-known readability
#'   index. *Journal of Reading*, 26(6),
#'   490--496.  `https://www.jstor.org/stable/40031755`
#'
#'   Bamberger, R. & Vanecek, E. (1984). *Lesen-Verstehen-Lernen-Schreiben*.
#'   Wien: Jugend und Volk.
#'
#'   Björnsson, C. H. (1968). *Läsbarhet*. Stockholm: Liber.
#'
#'   Bormuth, J.R. (1969). [Development of Readability
#'   Analysis](https://files.eric.ed.gov/fulltext/ED029166.pdf).
#'
#'   Bormuth, J.R. (1968). Cloze Test Readability: Criterion Reference
#'   Scores. *Journal of Educational Measurement*, 5(3), 189--196.
#'   `https://www.jstor.org/stable/1433978`
#'
#'   Caylor, J.S. (1973). *Methodologies for Determining Reading Requirements
#'   of Military Occupational Specialties*. `https://eric.ed.gov/?id=ED074343`
#'
#'   Caylor, J.S. & Sticht, T.G. (1973). *Development of a Simple Readability
#'   Index for Job Reading Material*.
#'   `https://archive.org/details/ERIC_ED076707`
#'
#'   Coleman, E.B. (1971). Developing a technology of written instruction: Some
#'   determiners of the complexity of prose. *Verbal learning research and the
#'   technology of written instruction*, 155--204.
#'
#'   Coleman, M. & Liau, T.L. (1975). A Computer Readability Formula Designed
#'   for Machine Scoring. *Journal of Applied Psychology*, 60(2), 283.
#'   \doi{10.1037/h0076540}
#'
#'   Dale, E. and Chall, J.S. (1948). A Formula for Predicting Readability:
#'   Instructions. *Educational Research Bulletin*, 27(2), 37--54.
#'   `https://www.jstor.org/stable/1473169`
#'
#'   Chall, J.S. and Dale, E. (1995). *Readability Revisited: The New Dale-Chall
#'   Readability Formula*. Brookline Books.
#'
#'   Dickes, P. & Steiwer, L. (1977). Ausarbeitung von Lesbarkeitsformeln für
#'   die Deutsche Sprache. *Zeitschrift für Entwicklungspsychologie und
#'   Pädagogische Psychologie* 9(1), 20--28.
#'
#'   Danielson, W.A., & Bryan, S.D. (1963). Computer Automation of Two
#'   Readability Formulas. *Journalism Quarterly*, 40(2), 201--206.
#'   \doi{10.1177/107769906304000207}
#'
#'   DuBay, W.H. (2004). [*The Principles of
#'   Readability*](https://files.eric.ed.gov/fulltext/ED490073.pdf).
#'
#'   Fang, I. E. (1966). The "Easy listening formula". *Journal of Broadcasting
#'   & Electronic Media*, 11(1), 63--68.  \doi{10.1080/08838156609363529}
#'
#'   Farr, J. N., Jenkins, J.J., & Paterson, D.G. (1951). Simplification of
#'   Flesch Reading Ease Formula. *Journal of Applied Psychology*, 35(5): 333.
#'   \doi{10.1037/h0057532}
#'
#'   Flesch, R. (1948). A New Readability Yardstick. *Journal of Applied
#'   Psychology*, 32(3), 221.  \doi{10.1037/h0057532}
#'
#'   Fucks, W. (1955). Der Unterschied des Prosastils von Dichtern und anderen
#'   Schriftstellern. *Sprachforum*, 1, 233--244.
#'
#'   Gunning, R. (1952). *The Technique of Clear Writing*.  New York:
#'   McGraw-Hill.
#'
#'   Klare, G.R. (1975). Assessing Readability.  *Reading Research Quarterly*,
#'   10(1), 62-102.  \doi{10.2307/747086}
#'
#'   Kincaid, J. P., Fishburne Jr, R.P., Rogers, R.L., & Chissom, B.S. (1975).
#'   [Derivation of New Readability Formulas (Automated Readability Index, FOG
#'   count and Flesch Reading Ease Formula) for Navy Enlisted
#'   Personnel](https://stars.library.ucf.edu/istlibrary/56/).
#'
#'   McLaughlin, G.H. (1969). [SMOG Grading: A New Readability
#'   Formula.](https://ogg.osu.edu/media/documents/health_lit/WRRSMOG_Readability_Formula_G._Harry_McLaughlin__1969_.pdf)
#'   *Journal of Reading*, 12(8), 639--646.
#'
#'   Michalke, M. (2014). *koRpus: An R Package for Text Analysis (Version 0.05-4)*.
#'   Available from <https://reaktanz.de/?c=hacking&s=koRpus>.
#'
#'   Powers, R.D., Sumner, W.A., and Kearl, B.E. (1958). A Recalculation of
#'   Four Adult Readability Formulas. *Journal of Educational Psychology*,
#'   49(2), 99.  \doi{10.1037/h0043254}
#'
#'   Senter, R. J., & Smith, E. A. (1967). [Automated readability
#'   index.](https://apps.dtic.mil/sti/pdfs/AD0667273.pdf)
#'   Wright-Patterson Air Force Base. Report No. AMRL-TR-66-220.
#'
#'   Solomon, N.W.* (2006). *Qualitative Analysis of Media Language*. India.
#'
#'   Spache, G. (1953). A New Readability Formula for Primary-Grade Reading
#'   Materials. *The Elementary School Journal*, 53, 410--413.
#'   `https://www.jstor.org/stable/998915`
#'
#'   Tränkle, U. & Bailer, H. (1984). Kreuzvalidierung und Neuberechnung von
#'   Lesbarkeitsformeln für die deutsche Sprache. *Zeitschrift für
#'   Entwicklungspsychologie und Pädagogische Psychologie*, 16(3), 231--244.
#'
#'   Wheeler, L.R. & Smith, E.H. (1954). A Practical Readability Formula for the
#'   Classroom Teacher in the Primary Grades. *Elementary English*, 31,
#'   397--399.  `https://www.jstor.org/stable/41384251`
#'
#'   *Nimaldasan is the pen name of N. Watson Solomon, Assistant Professor of
#'   Journalism, School of Media Studies, SRM University, India.
#'
textstat_readability <- function(x,
                                 measure = "Flesch",
                                 remove_hyphens = TRUE,
                                 min_sentence_length = 1,
                                 max_sentence_length = 10000,
                                 intermediate = FALSE, ...) {
    UseMethod("textstat_readability")
}

#' @export
textstat_readability.default <- function(x,
                                        measure = "Flesch",
                                        remove_hyphens = TRUE,
                                        min_sentence_length = 1,
                                        max_sentence_length = 10000,
                                        intermediate = FALSE, ...) {
    stop(friendly_class_undefined_message(class(x), "textstat_readability"))
}

#' @importFrom stringi stri_length
#' @export
textstat_readability.corpus <- function(x,
                                        measure = "Flesch",
                                        remove_hyphens = TRUE,
                                        min_sentence_length = 1,
                                        max_sentence_length = 10000,
                                        intermediate = FALSE, ...) {

    check_dots(...)

    measure_option <- c("ARI", "ARI.simple", "ARI.NRI",
                        "Bormuth", "Bormuth.MC", "Bormuth.GP",
                        "Coleman", "Coleman.C2",
                        "Coleman.Liau.ECP", "Coleman.Liau.grade", "Coleman.Liau.short",
                        "Dale.Chall", "Dale.Chall.old", "Dale.Chall.PSK",
                        "Danielson.Bryan", "Danielson.Bryan.2",
                        "Dickes.Steiwer", "DRP", "ELF", "Farr.Jenkins.Paterson",
                        "Flesch", "Flesch.PSK", "Flesch.Kincaid",
                        "FOG", "FOG.PSK", "FOG.NRI", "FORCAST", "FORCAST.RGL",
                        "Fucks", "Linsear.Write", "LIW",
                        "nWS", "nWS.2", "nWS.3", "nWS.4", "RIX",
                        "Scrabble",
                        "SMOG", "SMOG.C", "SMOG.simple", "SMOG.de",
                        "Spache", "Spache.old", "Strain",
                        "Traenkle.Bailer", "Traenkle.Bailer.2",
                        "Wheeler.Smith",
                        "meanSentenceLength",
                        "meanWordSyllables")
    accepted_measures <- c(measure_option, "Bormuth", "Coleman.Liau")

    if (measure[1] == "all") {
        measure <- measure_option
    } else {
        is_valid <- measure %in% accepted_measures
        if (!all(is_valid))
            stop("Invalid measure(s): ", measure[!is_valid])
    }

    if ("Bormuth" %in% measure) {
        measure[measure == "Bormuth"] <- "Bormuth.MC"
        measure <- unique(measure)
    }

    if ("Coleman.Liau" %in% measure) {
        measure[measure == "Coleman.Liau"] <- "Coleman.Liau.ECP"
        measure <- unique(measure)
    }

    x <- as.character(x)
    if (!is.null(min_sentence_length) || !is.null(max_sentence_length)) {
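        # keep only sentences within the length bounds; documents left with no
        # qualifying sentences become empty strings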
        temp <- char_trim(x, "sentences",
                          min_ntoken = min_sentence_length,
                          max_ntoken = max_sentence_length)
        x[names(temp)] <- temp
        x[!names(x) %in% names(temp)] <- ""
    }

    # get sentence lengths - BEFORE lower-casing
    n_sent <- ntoken(tokens(x, what = "sentence"))

    # get the word length and syllable info for use in computing quantities
    x <- char_tolower(x)
    toks <- tokens(x, remove_punct = TRUE, split_hyphens = remove_hyphens)

    # number of syllables
    n_syll <- nsyllable(toks)
    # replace any NAs with a single count (most of these will be numbers)
    n_syll <- lapply(n_syll, function(y) ifelse(is.na(y), 1, y))

    # lengths in characters of the words
    len_token <- lapply(toks, stri_length)

    # common statistics required by (nearly all) indexes
    W <- lengths(toks)  # number of words
    St <- n_sent        # number of sentences
    C <-  vapply(len_token, sum, numeric(1))                      # number of characters (letters)
    Sy <- vapply(n_syll, sum, numeric(1))                         # number of syllables
    W3Sy <- vapply(n_syll, function(x) sum(x >= 3), numeric(1))   # number words with >= 3 syllables
    W2Sy <- vapply(n_syll, function(x) sum(x >= 2), numeric(1))   # number words with >= 2 syllables
    W_1Sy <- vapply(n_syll, function(x) sum(x == 1), numeric(1))  # number words with 1 syllable
    W6C <- vapply(len_token, function(x) sum(x >= 6), numeric(1)) # number of words with at least 6 letters
    W7C <- vapply(len_token, function(x) sum(x >= 7), numeric(1)) # number of words with at least 7 letters
    Wlt3Sy <- W - W3Sy  # number of words with less than three syllables

    result <- data.frame(document = names(x), row.names = NULL, stringsAsFactors = FALSE)

    # count words not on the Dale-Chall familiar-word list ("difficult" words), if needed
    if (any(c("Dale.Chall", "Dale.Chall.old", "Dale.Chall.PSK", "Bormuth.MC", "Bormuth.GP", "DRP") %in% measure)) {
        W_wl.Dale.Chall <- lengths(tokens_remove(toks,
                                                 pattern = quanteda.textstats::data_char_wordlists$dalechall,
                                                 valuetype = "fixed",
                                                 case_insensitive = TRUE))
    }

    if ("ARI" %in% measure)
        result[["ARI"]] <- 0.5 * W / St + 4.71 * C / W - 21.43

    if ("ARI.NRI" %in% measure)
        result[["ARI.NRI"]] <- 0.4 * W / St + 6 * C / W - 27.4

    if ("ARI.simple" %in% measure)
        result[["ARI.simple"]] <- W / St + 9 * C / W

    if ("Bormuth.MC" %in% measure) {
        result[["Bormuth.MC"]] <- 0.886593 - (0.08364 * C / W) + 0.161911 * (W_wl.Dale.Chall / W) ^ 3 -
            0.21401 * (W / St) + 0.000577 * (W / St) ^ 2 - 0.000005 * (W / St) ^ 3
    }
    if ("Bormuth.GP" %in% measure) {
        CCS <- 35 # Cloze criterion score, percent as integer
        Bormuth.MC.Temp <- 0.886593 - (0.08364 * C / W) + 0.161911 *
                 (W_wl.Dale.Chall / W) ^ 3 - 0.21401 * (W / St) + 0.000577 *
                 (W / St) ^ 2 - 0.000005 * (W / St) ^ 3
        result[["Bormuth.GP"]] <- 4.275 +
                 12.881 * Bormuth.MC.Temp -
                 (34.934 * Bormuth.MC.Temp^2) +
                 (20.388 * Bormuth.MC.Temp^3) +
                 (26.194 * CCS) - (2.046 * CCS ^ 2) - (11.767 * CCS ^ 3) -
                 (44.285 * Bormuth.MC.Temp * CCS) +
                 (97.620 * (Bormuth.MC.Temp * CCS)^2) -
                 (59.538 * (Bormuth.MC.Temp * CCS)^3)
    }

    if ("Coleman" %in% measure)
        result[["Coleman"]] <- 1.29 * (100 * W_1Sy / W) - 38.45

    if ("Coleman.C2" %in% measure)
        result[["Coleman.C2"]] <- 1.16 * (100 * W_1Sy / W) + 1.48 * (100 * St / W) - 37.95

    ## cannot compute Coleman.C3, Coleman.C4 without knowing the number of pronouns or prepositions

    if ("Coleman.Liau.ECP" %in% measure)
        result[["Coleman.Liau.ECP"]] <- 141.8401 - 0.214590 * (100 * C / W) + 1.079812 * (100 * St / W)

    if ("Coleman.Liau.grade" %in% measure) {
        Coleman.Liau.ECP.Temp <- 141.8401 - 0.214590 * (100 * C / W) + 1.079812 * (100 * St / W)
        result[["Coleman.Liau.grade"]] <- -27.4004 * Coleman.Liau.ECP.Temp / 100 + 23.06395
    }

    if ("Coleman.Liau.short" %in% measure)
        result[["Coleman.Liau.short"]] <- 5.88 * C / W - 29.6 * St / W - 15.8

    if ("Dale.Chall" %in% measure) {
        result[["Dale.Chall"]] <- 64 - 0.95 * 100 * W_wl.Dale.Chall / W - 0.69 * W / St
    }

    if ("Dale.Chall.old" %in% measure) {
        DC_constant <- NULL
        DC_constant <- ((W_wl.Dale.Chall / W) > .05) * 3.6365
        result[["Dale.Chall.old"]] <- 0.1579 * 100 * W_wl.Dale.Chall / W + 0.0496 * W / St + DC_constant
    }

    # Powers-Sumner-Kearl (1958) variation
    if ("Dale.Chall.PSK" %in% measure)
        result[["Dale.Chall.PSK"]] <- 0.1155 * 100 * W_wl.Dale.Chall / W + 0.0596 * W / St + 3.2672

    if ("Danielson.Bryan" %in% measure) {
        Bl <- W - 1  # approximates the number of blanks; counting spaces would be more accurate
        result[["Danielson.Bryan"]] <- (1.0364 * C / Bl) + (0.0194 * C / St) - 0.6059
    }

    if ("Danielson.Bryan.2" %in% measure) {
        Bl <- W - 1  # approximates the number of blanks; counting spaces would be more accurate
        result[["Danielson.Bryan.2"]] <- 131.059 - (10.364 * C / Bl) + (0.0194 * C / St)
    }

    if ("Dickes.Steiwer" %in% measure) {
        TTR <- textstat_lexdiv(dfm(tokens(x), verbose = FALSE), measure = "TTR")$TTR
        result[["Dickes.Steiwer"]] <- 235.95993 - (73.021 * C / W) - (12.56438 * W / St) - (50.03293 * TTR)
    }

    if ("DRP" %in% measure) {
        Bormuth.MC.Temp <- 0.886593 - (0.08364 * C / W) +
                 0.161911 * (W_wl.Dale.Chall / W) ^ 3 -
                 0.21401 * (W / St) +
                 0.000577 * (W / St) ^ 2 - 0.000005 * (W / St) ^ 3
        result[["DRP"]] <- (1 - Bormuth.MC.Temp) * 100
    }

    if ("ELF" %in% measure)
        result[["ELF"]] <- W2Sy / St

    if ("Farr.Jenkins.Paterson" %in% measure)
        result[["Farr.Jenkins.Paterson"]] <- -31.517 - 1.015 * W / St + 1.599 * W_1Sy / W * 100

    if ("Flesch" %in% measure)
        result[["Flesch"]] <- 206.835 - 1.015 * W / St - 84.6 * Sy / W

    if ("Flesch.PSK" %in% measure)
        result[["Flesch.PSK"]] <- 0.0778 * W / St + 4.55 * Sy / W - 2.2029

    if ("Flesch.Kincaid" %in% measure)
        result[["Flesch.Kincaid"]] <- 0.39 * W / St + 11.8 * Sy / W - 15.59

    if ("meanSentenceLength" %in% measure)
        result[["meanSentenceLength"]] <- W / St

    if ("meanWordSyllables" %in% measure)
        result[["meanWordSyllables"]] <- Sy / W

    if ("FOG" %in% measure)
        result[["FOG"]] <- 0.4 * (W / St + 100 * W3Sy / W)
    # note: unlike POS-tagged implementations (e.g. koRpus), no adjustment is made
    # here for proper nouns, combinations of easy words, or the suffixes "-ed",
    # "-es", and "-ing" when counting hard words and syllables

    if ("FOG.PSK" %in% measure)
        result[["FOG.PSK"]] <- 3.0680 * ( 0.0877 * W / St ) + (0.0984 * 100 * W3Sy / W )

    if ("FOG.NRI" %in% measure)
        result[["FOG.NRI"]] <- ((( Wlt3Sy + 3 * W3Sy ) / (100 * St / W)) - 3) / 2

    if ("FORCAST" %in% measure)
        result[["FORCAST"]] <- 20 - (W_1Sy * 150 / W) / 10

    if ("FORCAST.RGL" %in% measure)
        result[["FORCAST.RGL"]] <- 20.43 - 0.11 * W_1Sy * 150 / W

    if ("Fucks" %in% measure)
        result[["Fucks"]] <- C / W * W / St

    if ("Linsear.Write" %in% measure)
        result[["Linsear.Write"]] <- ((100 - (100 * Wlt3Sy) / W) + (3 * 100 * W3Sy / W)) / (100 * St / W)

    if ("LIW" %in% measure)
        result[["LIW"]] <- (W / St) + (100 * W7C) / W

    if ("nWS" %in% measure)
        result[["nWS"]] <- 19.35 * W3Sy / W + 0.1672 * W / St + 12.97 * W6C / W - 3.27 * W_1Sy / W - 0.875

    if ("nWS.2" %in% measure)
        result[["nWS.2"]] <- 20.07 * W3Sy / W + 0.1682 * W / St + 13.73 * W6C / W - 2.779

    if ("nWS.3" %in% measure)
        result[["nWS.3"]] <- 29.63 * W3Sy / W + 0.1905 * W / St - 1.1144

    if ("nWS.4" %in% measure)
        result[["nWS.4"]] <- 27.44 * W3Sy / W + 0.2656 * W / St - 1.693

    if ("RIX" %in% measure)
        result[["RIX"]] <- W7C / St

    if ("SMOG" %in% measure)
        result[["SMOG"]] <- 1.043 * sqrt(W3Sy * 30 / St) + 3.1291

    if ("SMOG.C" %in% measure)
        result[["SMOG.C"]] <- 0.9986 * sqrt(W3Sy * 30 / St + 5) + 2.8795

    if ("SMOG.simple" %in% measure)
        result[["SMOG.simple"]] <- sqrt(W3Sy * 30 / St) + 3

    if ("SMOG.de" %in% measure)
        result[["SMOG.de"]] <- sqrt(W3Sy * 30 / St) - 2

    if (any(c("Spache", "Spache.old") %in% measure)) {
        # number of words which are not in the Spache word list
        W_wl.Spache <- lengths(tokens_remove(toks,
                                             pattern = quanteda.textstats::data_char_wordlists$spache,
                                             valuetype = "fixed",
                                             case_insensitive = TRUE))
    }

    if ("Spache" %in% measure)
        result[["Spache"]] <- 0.121 * W / St + 0.082 * (100 * W_wl.Spache / W) + 0.659

    if ("Spache.old" %in% measure)
        result[["Spache.old"]] <- 0.141 * W / St + 0.086 * (100 * W_wl.Spache / W) + 0.839

    if ("Strain" %in% measure)
        result[["Strain"]] <- Sy * 1 / (St / 3) / 10

    if ("Traenkle.Bailer" %in% measure) {
        Wprep <- vapply(toks, function(x) sum(x %in% prepositions), numeric(1))  # English prepositions
        Wconj <- vapply(toks, function(x) sum(x %in% conjunctions), numeric(1))  # English conjunctions
        result[["Traenkle.Bailer"]] <- 224.6814 - (79.8304 * C / W) - (12.24032 * W / St) - (1.292857 * 100 * Wprep / W)
    }

    if ("Traenkle.Bailer.2" %in% measure) {
        Wprep <- vapply(toks, function(x) sum(x %in% prepositions), numeric(1))  # English prepositions
        Wconj <- vapply(toks, function(x) sum(x %in% conjunctions), numeric(1))  # English conjunctions
        result[["Traenkle.Bailer.2"]] <- 234.1063 - (96.11069 * C / W) - (2.05444 * 100 * Wprep / W) - (1.02805 * 100 * Wconj / W)
    }

    #     if ("TRI" %in% measure) {
    #         Ptn <- lengths(tokens(x, remove_punct = FALSE)) - lengths(toks)
    #         Frg <- NA  # foreign words -- cannot compute without a dictionary
    #         result[["TRI <- (0.449 * W_1Sy) - (2.467 * Ptn) - (0.937 * Frg) - 14.417]
    #     }

    if ("Wheeler.Smith" %in% measure)
        result[["Wheeler.Smith"]] <- W / St * (10 * W2Sy) / W

    if ("Scrabble" %in% measure)
        result[["Scrabble"]] <- nscrabble(x, mean)


    result <- result[, c("document", measure)]

    # if intermediate is desired, add intermediate quantities to output
    if (intermediate) {
        result <- cbind(result,
                        data.frame(W, St, C, Sy, W3Sy, W2Sy, W_1Sy, W6C, W7C, Wlt3Sy))
        if (exists("W_wl.Dale.Chall")) result <- cbind(result, W_wl.Dale.Chall)
        if (exists("W_wl.Spache")) result <- cbind(result, W_wl.Spache)
    }

    # convert any NaN (e.g. from zero denominators) to NA (for #1976)
    result[is.na(result)] <- NA

    class(result) <- c("readability", "textstat", "data.frame")
    rownames(result) <- NULL
    return(result)
}


#' @noRd
#' @export
textstat_readability.character <- function(x,
                                           measure = "Flesch",
                                           remove_hyphens = TRUE,
                                           min_sentence_length = 1,
                                           max_sentence_length = 10000,
                                           intermediate = FALSE, ...) {

    textstat_readability(corpus(x), measure, remove_hyphens,
                         min_sentence_length, max_sentence_length,
                         intermediate, ...)
}

conjunctions <- c("for", "and", "nor", "but", "or", "yet", "so")
prepositions <- c("a", "abaft", "abeam", "aboard", "about", "above", "absent",
                  "across", "afore", "after", "against", "along", "alongside",
                  "amid", "amidst", "among", "amongst", "an", "anenst", "apropos",
                  "apud", "around", "as", "aside", "astride", "at", "athwart", "atop",
                  "barring", "before", "behind", "below", "beneath", "beside", "besides",
                  "between", "beyond", "but", "by", "chez", "circa", "ca", "c",
                  "concerning", "despite", "down", "during", "except",
                  "excluding", "failing", "following", "for", "forenenst", "from",
                  "given", "in", "including", "inside", "into",
                  "like", "mid", "midst", "minus", "modulo", "near", "next",
                  "notwithstanding", "o'", "of", "off", "on", "onto",
                  "opposite", "out", "outside", "over", "pace", "past", "per", "plus",
                  "pro", "qua", "regarding", "round", "sans", "save", "since", "than",
                  "through", "thru", "throughout", "thruout", "times", "to", "toward",
                  "towards", "under", "underneath", "unlike", "until", "unto", "up",
                  "upon", "versus", "vs", "v", "via", "vis-a-vis", "with", "within",
                  "without", "worth")

#' Word lists for readability statistics
#'
#' `data_char_wordlists` provides word lists used in some readability indexes;
#' it is a named list of character vectors where each list element
#' corresponds to a different readability index.
#'
#' @format
#' A list of length two:
#' \describe{
#' \item{`dalechall`}{The long Dale-Chall list of 3,000 familiar (English)
#' words needed to compute the Dale-Chall Readability Formula.}
#' \item{`spache`}{The revised Spache word list (see Klare 1975, 73; Spache
#' 1974) needed to compute the Spache Revised Formula of readability (Spache
#' 1953).}
#' }
#' @references
#' Chall, J.S., & Dale, E. (1995). *Readability Revisited: The New
#' Dale-Chall Readability Formula*. Brookline Books.
#'
#' Dale, E. & Chall, J.S. (1948). A Formula for Predicting
#' Readability. *Educational Research Bulletin*, 27(1): 11--20.
#'
#' Dale, E. & Chall, J.S. (1948). A Formula for Predicting Readability:
#' Instructions. *Educational Research Bulletin*, 27(2): 37--54.
#'
#' Klare, G.R. (1975). Assessing Readability. *Reading Research Quarterly*
#' 10(1), 62--102.
#'
#' Spache, G. (1953). A New Readability Formula for Primary-Grade Reading
#' Materials. *The Elementary School Journal*, 53, 410--413.
#'
#' Spache, G. (1974). *Good Reading for Poor Readers*. (Rvd. 9th ed.)
#' Champaign, IL: Garrard.
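#' @examples
#' # a small usage sketch: the element names below are those used internally by
#' # textstat_readability()
#' lengths(data_char_wordlists)
#' head(data_char_wordlists$dalechall)
#' head(data_char_wordlists$spache)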
"data_char_wordlists"
