count_words_in_rmd_file: Count words in an Rmarkdown file
In tjmahr/tjmisc: TJ's Miscellany

Description

These functions strip away code and non-prose elements before counting words.

Usage

count_words_in_rmd_file(path)

count_words_in_rmd_lines(lines)

simplify_rmd_lines(lines)

Arguments

`path`	path to an Rmarkdown file
`lines`	a character vector of text (from an Rmarkdown file)

Details

The helper function simplify_rmd_lines() strips down an Rmarkdown file so that dubious things do not contribute to the word count. It does the following.

Remove all lines that fall between a pair of ```` lines. (These are sometimes used to show verbatim text that itself contains blocks fenced with three tick marks.)
Remove all lines that fall between a pair of ``` lines.
Merge lines that end with `r with the following line.
Replace inline code spans with a single word (`code`).
Delete single-line HTML comments.
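
The steps above can be sketched in base R roughly as follows. This is a hypothetical re-implementation for illustration only, not the tjmisc source: the names drop_fenced(), merge_split_inline(), and simplify_sketch() are made up, and the sketch assumes that the fence lines themselves are dropped along with their contents, that inline code becomes the bare word code, and that only the comment text (not the whole line) is deleted.

```r
# Hypothetical sketch of the simplification steps; not the tjmisc source.

# Drop every line between a pair of fence lines (and the fence lines too,
# which is an assumption of this sketch).
drop_fenced <- function(lines, fence) {
  is_fence <- grepl(paste0("^\\s*", fence), lines)
  # After an odd number of fence lines we are inside a block
  inside <- cumsum(is_fence) %% 2 == 1
  lines[!(inside | is_fence)]
}

# Merge any line that ends with `r into the line that follows it
merge_split_inline <- function(lines) {
  out <- character(0)
  for (line in lines) {
    n <- length(out)
    if (n > 0 && grepl("`r$", out[n])) {
      out[n] <- paste(out[n], line)
    } else {
      out <- c(out, line)
    }
  }
  out
}

simplify_sketch <- function(lines) {
  fence3 <- strrep("`", 3)
  fence4 <- strrep("`", 4)
  lines <- drop_fenced(lines, fence4)  # four-tick blocks first
  lines <- drop_fenced(lines, fence3)  # then ordinary code chunks
  lines <- merge_split_inline(lines)
  lines <- gsub("`[^`]+`", "code", lines)                # inline code spans
  gsub("\\s*<!--.*?-->", "", lines, perl = TRUE)        # one-line comments
}

fence3 <- strrep("`", 3)
fence4 <- strrep("`", 4)
rmd <- c(
  "Some prose here.",
  paste0(fence3, "{r}"), "x <- 1:10", fence3,
  "The mean was `r",
  "mean(x)`. <!-- check this -->",
  fence4, paste0(fence3, "{r}"), "verbatim <- TRUE", fence3, fence4,
  "Done."
)
simplified <- simplify_sketch(rmd)
print(simplified)
```

Handling the four-tick blocks before the three-tick ones matters here: if the order were reversed, the three-tick fences nested inside a verbatim block would be paired up on their own and the outer four-tick lines would be handled inconsistently.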

These steps are very ad hoc, updated and expanded as I run into new things that need to be excluded from my word counts. Let's not pretend that this thing is at all comprehensive.

The word count is computed by stringi::stri_stats_latex().
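
As a rough illustration of the counting step, stringi::stri_stats_latex() returns a named integer vector whose elements include Words, CharsWord, and CharsWhite. The sketch below assumes the stringi package is installed, and the reshaping into a one-row data frame is only a guess at how the counts might be packaged, not necessarily what tjmisc does.

```r
# Assumes stringi is installed; the data-frame reshaping is illustrative only
if (requireNamespace("stringi", quietly = TRUE)) {
  prose <- c("Lorem ipsum dolor", "sit amet")
  stats <- stringi::stri_stats_latex(prose)
  # Keep the three counts described on this page
  counts <- as.data.frame(as.list(stats[c("Words", "CharsWord", "CharsWhite")]))
  print(counts)
}
```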

Value

A data frame with counts of words, characters in words, and whitespace characters. simplify_rmd_lines() instead returns a character vector of simplified Rmarkdown lines.

tjmahr/tjmisc documentation built on Feb. 8, 2023, 12:21 p.m.