rmdcount: Word, character and non-whitespace characters count
In rmdwc: Count Words and Characters in R Markdown and Jupyter Notebooks

rmdcount

R Documentation

Description

rmdcount counts lines, words, bytes, characters and non-whitespace characters in R Markdown files, excluding code chunks. txtcount counts the same quantities in plain text files.
Note that the counts may differ slightly from unix wc and LibreOffice, because the result depends on how a line, a word and a character are defined.

Usage

rmdcount(
  files = NULL,
  space = "[[:space:]]",
  word = "[[:space:]]+",
  line = "\n",
  exclude = "```\\{.*?```"
)

txtcount(
  files = NULL,
  space = "[[:space:]]",
  word = "[[:space:]]+",
  line = "\n"
)

Arguments

`files`	character: file name(s)
`space`	character: pattern to split a text at spaces (default: `'[[:space:]]'`)
`word`	character: pattern to split a text at word boundaries (default: `'[[:space:]]+'`)
`line`	character: pattern to split lines (default: `'\n'`)
`exclude`	character: pattern to exclude text parts, e.g. code chunks (default: '```\\{.*?```')

Details

We define:

Line: the number of lines. It may differ from unix wc -l, since wc counts the number of newline characters.
Word: one or more characters delimited by white space. However, a "word" is in general a fuzzy concept; for example, is "3.141593" a word? Different programs may therefore count differently. For more details see the discussion of the LibreOffice bug "Word count gives wrong results - Another Example", Comment 5.
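
The two definitions can be tried on a small in-memory string. This is only a sketch in base R with a made-up text; it illustrates the definitions and is not taken from the package source.

```r
# Illustrative text without a trailing newline
txt <- "pi is 3.141593\nsecond line"

# Lines: pieces obtained by splitting at the newline separator
n_lines <- length(strsplit(txt, "\n")[[1]])

# Newline characters: the quantity that wc -l reports
n_newlines <- lengths(regmatches(txt, gregexpr("\n", txt)))

# Words: pieces between runs of white space, so "3.141593" counts once
n_words <- length(strsplit(txt, "[[:space:]]+")[[1]])

print(c(lines = n_lines, newlines = n_newlines, words = n_words))
```

Because the text does not end with a newline, the line count is one higher than the newline count. A tool that treats "." as a word boundary would also report a different word count for the same text.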

The following approach is used to detect lines, words, characters and non-whitespace characters.

lines: strsplit(rmd, line)[[1]] with line='\n'
bytes: charToRaw(rmd)
words: strsplit(rmd, word)[[1]] with word='[[:space:]]+'
characters: strsplit(rmd, '')[[1]]
non-whitespace characters: strsplit(gsub(space, '', rmd), '')[[1]] with space='[[:space:]]'
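
The expressions above can be combined into one function. The sketch below is an assumption about how they fit together, not the package's actual code; count_text is a hypothetical helper name, and it works on a string rather than on a file.

```r
# Hypothetical helper (not part of rmdwc): applies the expressions listed above
# to a single character string and returns the five counts.
count_text <- function(txt,
                       space = "[[:space:]]",
                       word  = "[[:space:]]+",
                       line  = "\n") {
  c(
    lines = length(strsplit(txt, line)[[1]]),
    words = length(strsplit(txt, word)[[1]]),
    bytes = length(charToRaw(txt)),
    chars = length(strsplit(txt, "")[[1]]),
    nonws = length(strsplit(gsub(space, "", txt), "")[[1]])
  )
}

res <- count_text("Hello world\nfoo")
print(res)
```

For this ASCII input the byte and character counts coincide; they would differ for text containing multi-byte characters.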

If rmdcount is used, code chunks are deleted with gsub('```\\{.*?```', '', rmd) before counting. txtcount skips this step.
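
The code-chunk removal can be tried on a string as well. The following is a minimal sketch using the default exclude pattern from the Usage section; the R Markdown content is made up for illustration.

```r
# Made-up R Markdown content held in a string
rmd <- "Intro text.\n```{r}\nx <- 1\n```\nClosing text."

# Default exclude pattern: lazily match from an opening fence followed by
# a brace up to the next closing fence, and delete the match
stripped <- gsub("```\\{.*?```", "", rmd)

# Word counts with and without the code chunk
words_all  <- length(strsplit(rmd, "[[:space:]]+")[[1]])
words_text <- length(strsplit(stripped, "[[:space:]]+")[[1]])
print(c(with_chunk = words_all, without_chunk = words_text))
```

The pattern relies on R's default regular expression engine, in which "." also matches a newline, so a chunk spanning several lines is removed as a whole.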

Value

A data frame with the following elements:

file: basename of file
lines: number of lines
words: number of words
bytes: number of bytes
chars: number of characters
nonws: number of non-whitespace characters
path: path of file
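
Since the result is an ordinary data frame, totals across several files can be obtained with base R. The sketch below uses a mock data frame with the documented columns so that it runs without the package; the file names and numbers are invented.

```r
# Mock result with the documented columns; all values are made up
res <- data.frame(
  file  = c("a.Rmd", "b.Rmd"),
  lines = c(10L, 20L),
  words = c(100L, 250L),
  bytes = c(600L, 1500L),
  chars = c(590L, 1480L),
  nonws = c(500L, 1250L),
  path  = c("docs", "docs"),
  stringsAsFactors = FALSE
)

# Sum the numeric columns over all files
totals <- colSums(res[, c("lines", "words", "bytes", "chars", "nonws")])
print(totals)
```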

Examples

# count excluding code chunks
files <- system.file('rmarkdown/rstudio_pdf.Rmd', package="rmdwc")
rmdcount(files)
# count including code chunks
txtcount(files) # or rmdcount(files, exclude='')
# count for a set of R Markdown docs
files <- list.files(path=system.file('rmarkdown', package="rmdwc"),
                    pattern="\\.Rmd$", full.names=TRUE) # pattern is a regular expression, not a glob
rmdcount(files)
# use of rmdcount() in an R Markdown document
if (interactive()) {
  files <- system.file('rmarkdown/rstudio_pdf.Rmd', package="rmdwc")
  file.edit(files) # SAVE(!) the file and knit it 
}

rmdwc documentation built on June 8, 2025, 10:44 a.m.