count_words_in_rmd_file: Count words in an Rmarkdown file


count_words_in_rmd_file    R Documentation

Count words in an Rmarkdown file

Description

These functions strip away code and other non-prose elements before counting words.

Usage

count_words_in_rmd_file(path)

count_words_in_rmd_lines(lines)

simplify_rmd_lines(lines)

Arguments

path

path to an Rmarkdown file

lines

a character vector of text (from an Rmarkdown file)

Details

The helper function simplify_rmd_lines() strips down the lines of an Rmarkdown file so that code and other non-prose content do not contribute to the word count. It does the following:

  1. Remove all lines that fall between a pair of ```` lines. (Four-backtick fences are sometimes used to show blocks fenced with three backticks verbatim.)

  2. Remove all lines that fall between a pair of ``` lines.

  3. Lines that end with `r are merged with the following line.

  4. Inline code spans are replaced with a single word (`code`).

  5. Single-line HTML comments are deleted.
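
The five steps above can be approximated in base R as shown below. This is only an illustrative sketch, not the package's source: the name sketch_simplify_rmd_lines() and every regular expression in it are assumptions, and the real simplify_rmd_lines() may differ (for example, in whether the fence lines themselves are dropped, or in the exact replacement used for inline code).

```r
# Hypothetical sketch of the five steps; NOT the tjmisc implementation.
sketch_simplify_rmd_lines <- function(lines) {
  # Drop fenced blocks. Assumption: the fence lines go too.
  drop_fenced <- function(x, fence_pattern) {
    is_fence <- grepl(fence_pattern, x)
    # An odd running count of fences means the line is inside a block.
    inside <- cumsum(is_fence) %% 2 == 1
    x[!(inside | is_fence)]
  }

  # Step 1: blocks delimited by four backticks (removed first, so the
  # three-backtick lines they contain are never seen by step 2).
  lines <- drop_fenced(lines, "^\\s*````")

  # Step 2: blocks delimited by three backticks.
  lines <- drop_fenced(lines, "^\\s*```")

  # Step 3: merge a line ending in `r with the line that follows it.
  text <- paste(lines, collapse = "\n")
  text <- gsub("`r[ \t]*\n", "`r ", text)
  lines <- strsplit(text, "\n", fixed = TRUE)[[1]]

  # Step 4: replace each inline code span with the single word "code".
  lines <- gsub("`[^`]+`", "code", lines)

  # Step 5: delete HTML comments that open and close on the same line.
  lines <- gsub("<!--.*?-->", "", lines, perl = TRUE)

  lines
}

# Made-up input that exercises steps 2 to 5.
rmd <- c(
  "The mean is `r",
  "mean(x)` units.",
  "<!-- a note to self -->",
  "```{r}",
  "x <- 1",
  "```",
  "Closing sentence."
)
sketch_simplify_rmd_lines(rmd)
```

The order matters in this sketch: merging the `r lines before step 4 lets an inline expression that was split across two lines be matched as one code span.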

These steps are very ad hoc, updated and expanded as I run into new things that need to be excluded from my word counts. Let's not pretend that this thing is at all comprehensive.

The word count is computed by stringi::stri_stats_latex().
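
For orientation, here is a small sketch of calling stringi::stri_stats_latex() directly on a few lines of prose. The input text is made up, and reshaping the result into a one-row data frame is an assumption about how the counts might be packaged, not a copy of what count_words_in_rmd_lines() does.

```r
# Requires the stringi package to be installed.
prose <- c("The mean is code units.", "", "Closing sentence.")

# stri_stats_latex() returns a named integer vector of counts.
stats <- stringi::stri_stats_latex(prose)

# Keep the three counts named in the Value section: words, characters
# in words, and whitespace characters.
keep <- c("Words", "CharsWord", "CharsWhite")

# Assumed packaging step: turn the named vector into a one-row data frame.
counts <- as.data.frame(as.list(stats[keep]))
counts
```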

Value

A data frame with the counts of words, characters in words, and whitespace characters. simplify_rmd_lines() returns a character vector of simplified Rmarkdown lines.


tjmahr/tjmisc documentation built on Feb. 8, 2023, 12:21 p.m.