detectRareWords: Looking up word frequencies
In userfriendlyscience: Quantitative Analysis Made Accessible


This function checks, for each word in a text, how frequently it occurs in a given language. This is useful for identifying and eliminating rare words to make a text more accessible to an audience with a limited vocabulary. If the input is an HTML file, htmlParse and xpathSApply from the XML package are used to process it. textToWords is a helper function that simply splits a character vector into a vector of words.

detectRareWords(textFile = NULL,
                wordFrequencyFile = "Dutch",
                output = c("file", "show", "return"),
                outputFile = NULL,
                wordCol = "Word", freqCol = "FREQlemma",
                textToWordsFunction = "textToWords",
                encoding = "ASCII",
                xPathSelector = "/text()",
                silent = FALSE)
textToWords(characterVector)

`textFile`	If NULL, a dialog is shown that lets the user select a file. If not NULL, this has to be either a filename or a character vector. An HTML file can also be provided; it is parsed using htmlParse from the XML package.
`wordFrequencyFile`	The file with word frequencies to use. If 'Dutch' or 'Polish', files from the Center for Reading Research (http://crr.ugent.be/) are downloaded.
`output`	How to provide the output, as a character vector. If `file`, the filename to write to should be provided in `outputFile`. If `show`, the output is shown; and if `return`, the output is returned invisibly.
`outputFile`	The name of the file to store the output in.
`wordCol`	The name of the column in the `wordFrequencyFile` that contains the words.
`freqCol`	The name of the column in the `wordFrequencyFile` that contains the frequency with which each word occurs.
`textToWordsFunction`	The function to use to split a character vector, where each element contains one or more words, into a vector where each element is a word.
`encoding`	The encoding used to read and write files.
`xPathSelector`	If the file provided is an HTML file, `xpathSApply` is used to extract the content. `xPathSelector` specifies which content to extract (the default value extracts all text content).
`silent`	Whether to suppress detailed feedback about the process.
`characterVector`	A character vector, the elements of which are to be broken down into words.
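
To illustrate how `wordCol` and `freqCol` relate to the lookup, the following is a minimal base R sketch, not the package's implementation. The toy frequency table, its counts, and the helper name `lookupFrequencies` are hypothetical; only the column names `Word` and `FREQlemma` are taken from the defaults above.

```r
# Toy frequency table; the counts are invented for illustration only.
freqTable <- data.frame(Word = c("dit", "is", "een", "tekst"),
                        FREQlemma = c(5000, 9000, 12000, 300),
                        stringsAsFactors = FALSE)

# Hypothetical helper: look up each word's frequency by matching on wordCol.
lookupFrequencies <- function(words, freqTable,
                              wordCol = "Word", freqCol = "FREQlemma") {
  # Case-insensitive match of each word against the word column.
  idx <- match(tolower(words), tolower(freqTable[[wordCol]]))
  data.frame(word = words,
             frequency = freqTable[[freqCol]][idx],
             stringsAsFactors = FALSE)
}

result <- lookupFrequencies(c("Dit", "is", "een", "zeldzaam", "tekst"),
                            freqTable)

# Words absent from the table get NA; list those first, then the
# remaining words from least to most frequent.
result[order(result$frequency, na.last = FALSE), ]
```

In this sketch a word that is missing from the frequency table is treated as the rarest case, which seems a reasonable reading of "rare" but is an assumption rather than documented behaviour.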

detectRareWords returns a data frame (invisibly) if `output` contains `return`. Otherwise, NULL is returned (invisibly), and the output is printed and/or written to a file, depending on the value of `output`.

textToWords returns a vector of words.
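
As a rough picture of what splitting a character vector into words involves, here is a base R sketch. It is a hypothetical stand-in, not the package's textToWords; the name `splitIntoWords` and the choice to split on whitespace and punctuation are assumptions made for this example.

```r
# Hypothetical stand-in for a word splitter; not the package's textToWords.
splitIntoWords <- function(characterVector) {
  # Split every element on runs of whitespace and punctuation.
  words <- unlist(strsplit(characterVector, "[[:space:][:punct:]]+"))
  # Drop empty strings that a leading separator would leave behind.
  words[nzchar(words)]
}

words <- splitIntoWords(c("Dit is een tekst,",
                          "om de werking te demonstreren."))
```

Each element of the input may contain several words; the result is a single flat character vector with one word per element.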

Gjalt-Jorn Peters

Maintainer: Gjalt-Jorn Peters <gjalt-jorn@userfriendlyscience.com>

## Not run: 
detectRareWords(paste('Dit is een tekst om de',
                      'werking van de detectRareWords',
                      'functie te demonstreren.'),
                output = 'show')

## End(Not run)