ipynbcount: Count text elements in Jupyter Notebook files

View source: R/ipynbcount.R

ipynbcount R Documentation

Count text elements in Jupyter Notebook files

Description

This function extracts the text of selected cell types (e.g., markdown) from one or more .ipynb files and counts its characters, words, and lines. Text matching the exclude pattern (e.g., fenced code chunks) can optionally be left out of the counts. The counting itself is delegated to rmdcount().

Usage

ipynbcount(
  files,
  celltype = c("markdown"),
  space = "[[:space:]]",
  word = "[[:space:]]+",
  line = "\n",
  exclude = "```\\{.*?```"
)

Arguments

files

character: vector of paths to .ipynb (Jupyter Notebook) files.

celltype

character: vector indicating which cell types to include (default is 'markdown'). Valid values include 'markdown' and 'code'.

space

character: pattern used to split text at spaces (default: '[[:space:]]')

word

character: pattern used to split text at word boundaries (default: '[[:space:]]+')

line

character: pattern to split lines (default: '\n')

exclude

character: pattern to exclude text parts, e.g. code chunks (default: '```\\{.*?```')
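To make the roles of the four patterns concrete, here is a minimal base R sketch that applies the documented defaults to a short string. It only illustrates what each pattern matches. How rmdcount() applies them internally is not described on this page, so the counting steps below are an assumption.

```r
# Sketch only: applies the documented default patterns with base R functions.
# The exact counting rules used inside rmdcount() may differ.
text <- "Intro line\n```{r}\nx <- 1\n```\nTwo words"

exclude <- "```\\{.*?```"   # fenced chunks that open with ```{
space   <- "[[:space:]]"    # a single whitespace character
word    <- "[[:space:]]+"   # a run of whitespace between words
line    <- "\n"             # line separator

# Drop everything matched by 'exclude' (here: the fenced R chunk)
cleaned <- gsub(exclude, "", text)

# Characters that are not whitespace
n_chars <- nchar(gsub(space, "", cleaned))

# Words: split at whitespace runs and ignore empty pieces
words   <- strsplit(cleaned, word)[[1]]
n_words <- sum(nzchar(words))

# Lines: split at the line separator
n_lines <- length(strsplit(cleaned, line)[[1]])

print(c(chars = n_chars, words = n_words, lines = n_lines))
```

The default exclude pattern only matches fences that open with a brace, as in R Markdown chunks, so a plain fence without a brace would stay in the text under this sketch.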

Details

This function assumes that the notebook files are valid JSON and contain a list of cells under the cells field. It temporarily writes the extracted content to a file to reuse the rmdcount() logic.
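The steps described above can be sketched roughly as follows. This is not the package's implementation: the use of jsonlite, the helper name extract_cells(), and the tiny inline notebook are all assumptions made for illustration.

```r
# Sketch only, not the rmdwc source. jsonlite and extract_cells() are
# assumptions; the notebook below is a made-up two-cell example.
library(jsonlite)

nb_json <- '{
  "cells": [
    {"cell_type": "markdown", "metadata": {},
     "source": ["# Title\\n", "Some text here"]},
    {"cell_type": "code", "metadata": {}, "outputs": [],
     "execution_count": null, "source": ["x <- 1"]}
  ],
  "metadata": {}, "nbformat": 4, "nbformat_minor": 5
}'
nb_file <- tempfile(fileext = ".ipynb")
writeLines(nb_json, nb_file)

# Hypothetical helper: collect the source text of the requested cell types
extract_cells <- function(path, celltype = "markdown") {
  nb   <- fromJSON(path, simplifyVector = FALSE)  # keep 'cells' as a list
  keep <- Filter(function(cell) cell$cell_type %in% celltype, nb$cells)
  # 'source' may be one string or a list of strings; concatenate either way
  parts <- vapply(keep,
                  function(cell) paste(unlist(cell$source), collapse = ""),
                  character(1))
  paste(parts, collapse = "\n")
}

md_text  <- extract_cells(nb_file)                        # markdown only
all_text <- extract_cells(nb_file, c("markdown", "code")) # markdown and code

# Write the extracted text to a temporary file, which a file-based
# counter such as rmdcount() could then read
tmp <- tempfile(fileext = ".Rmd")
writeLines(md_text, tmp)
```

Writing to a temporary file keeps the counting logic in one place, at the cost of an extra round trip through the file system.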

Value

A data frame with counts of characters, words, and lines for each file. Additional columns include file (base name) and path (directory).

Examples

library("rmdwc")
# example notebook shipped with the package
file <- system.file("ipynb/example_data_analysis.ipynb", package = "rmdwc")
ipynbcount(file)                                     # markdown cells only
ipynbcount(file, celltype = c("markdown", "code"))   # markdown and code cells


rmdwc documentation built on June 8, 2025, 10:44 a.m.