autocodebook: Automatic Codebook and Tracking for 'Spark' and 'dplyr' Pipelines

Wraps 'dplyr' verbs (mutate, summarise, filter) to automatically capture variable metadata (type, source columns, categories, and source code), producing a codebook and an eligibility tracking table without manual documentation. Works with both 'sparklyr' (tbl_spark) and local data frames. Adds big-data optimizations (caching, assume-unique counting, checkpointing) and a standardized report module with an eligibility flowchart, editable codebook export (HTML, DOCX, XLSX), and cross-sectional or longitudinal variable inspection. The eligibility flowchart follows the CONSORT statement (Schulz, Altman and Moher (2010) <doi:10.1136/bmj.c332>), and the reporting of observational cohort studies follows the STROBE recommendations (von Elm and others (2007) <doi:10.1371/journal.pmed.0040296>).
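To make the idea of automatic metadata capture concrete, here is a minimal base-R sketch that assumes nothing about autocodebook's actual interface. The helpers `new_tracker()`, `track_mutate()` and `track_filter()` are hypothetical and are not part of the package. The sketch records each derived variable's name, type, source columns and source code, and logs row counts before and after each filter step, which is the raw material for a codebook and an eligibility flowchart.

```r
# Hypothetical sketch, NOT the autocodebook API: base R only, local data frames.

# A tracker is an environment, so the helpers can append to it by reference.
new_tracker <- function() {
  tracker <- new.env()
  tracker$codebook <- data.frame(variable = character(), type = character(),
                                 sources = character(), code = character(),
                                 stringsAsFactors = FALSE)
  tracker$flow <- data.frame(step = character(), n_before = integer(),
                             n_after = integer(), stringsAsFactors = FALSE)
  tracker
}

# mutate-like step: evaluate named expressions in order and log their metadata.
track_mutate <- function(data, tracker, ...) {
  exprs <- as.list(substitute(list(...)))[-1]
  for (nm in names(exprs)) {
    ex <- exprs[[nm]]
    # Source columns: variables used in the expression that exist in the data
    src <- intersect(all.vars(ex), names(data))
    data[[nm]] <- eval(ex, data, parent.frame())
    tracker$codebook <- rbind(tracker$codebook, data.frame(
      variable = nm,
      type = class(data[[nm]])[1],
      sources = paste(src, collapse = ", "),
      code = paste(deparse(ex), collapse = " "),
      stringsAsFactors = FALSE
    ))
  }
  data
}

# filter-like step: keep rows where the condition is TRUE (NA rows are dropped)
# and log row counts before and after, one flowchart box per step.
track_filter <- function(data, tracker, condition, label) {
  keep <- eval(substitute(condition), data, parent.frame())
  keep <- !is.na(keep) & keep
  tracker$flow <- rbind(tracker$flow, data.frame(
    step = label, n_before = nrow(data), n_after = sum(keep),
    stringsAsFactors = FALSE
  ))
  data[keep, , drop = FALSE]
}

# Usage on a toy data frame
tr <- new_tracker()
df <- data.frame(age = c(25, 70, NA, 45), sex = c("F", "M", "F", "M"))
df <- track_mutate(df, tr, elderly = age >= 65, age_months = age * 12)
df <- track_filter(df, tr, !is.na(age), label = "Non-missing age")
print(tr$codebook)
print(tr$flow)
```

The package itself targets 'dplyr' and 'sparklyr' pipelines; this sketch sticks to base R data frames so that it runs without extra dependencies, and it omits categories, summarise steps and any Spark-specific optimizations.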

Package details

Author: Patricia Fortes C. de Macedo [aut, cre]
Maintainer: Patricia Fortes C. de Macedo <macedopatriciafortes@gmail.com>
License: MIT + file LICENSE
Version: 0.1.0
URL: https://github.com/patriciafortesm/autocodebook
Package repository: CRAN
Installation: Install the latest version of this package by entering the following in R:
install.packages("autocodebook")

autocodebook documentation built on June 9, 2026, 1:09 a.m.