generate_report: Generate a standardized report from the current session

View source: R/09_report_generate.R

generate_report    R Documentation

Description

Produces an HTML report combining the eligibility flowchart, the codebook, and a per-variable inspection panel. Supports two inspection modes, cross_sectional and longitudinal, selected with the type argument and described under Details.

Usage

generate_report(
  data,
  type = c("cross_sectional", "longitudinal"),
  id_var = NULL,
  time_var = NULL,
  variables = NULL,
  labels = NULL,
  treat_as_categorical = NULL,
  output_html,
  output_dir = NULL,
  export_codebook_editable = TRUE,
  cache_data = TRUE,
  title = NULL,
  n_bins = 30,
  top_n_cat = 20
)

Arguments

data

A Spark DataFrame (tbl_spark) or local data frame.

type

One of "cross_sectional" or "longitudinal".

id_var

Character. Name of the ID column. Required when type = "longitudinal". When type = "cross_sectional", it is optional and is used only to exclude the ID column from inspection. Default: NULL.

time_var

Character or NULL. Name of the time/wave column. Used when type = "longitudinal" to compute missingness over time. Default: NULL.

variables

Optional character vector. If provided, inspects only these variables. Default: NULL (all except id_var/time_var).

labels

Optional named list (variable -> label). If NULL, uses labels from the codebook when available.

treat_as_categorical

Character vector of variable names to treat as categorical even when their R class is numeric or integer. Useful for coded variables (e.g. cod_sexo stored as 1L/2L, cod_raca stored as integer). For these variables, the report uses bar charts and proportion-by-time stacked plots instead of histograms / median+IQR. Default: NULL.
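
As a rough illustration of how candidates for this argument might be found, the sketch below flags numeric or integer columns with few distinct values. The helper suggest_categorical() and its max_levels threshold are hypothetical and are not part of autocodebook; the result still needs manual review before being passed to treat_as_categorical.

```r
# Hypothetical helper (not part of autocodebook): flag numeric/integer columns
# with at most `max_levels` distinct non-missing values.
suggest_categorical <- function(df, max_levels = 10L) {
  is_candidate <- vapply(df, function(col) {
    is.numeric(col) && length(unique(col[!is.na(col)])) <= max_levels
  }, logical(1))
  names(df)[is_candidate]
}

# Toy data mirroring the column names used in the Examples section.
demo <- data.frame(
  id_indiv = sprintf("ID%03d", 1:6),
  cod_sexo = c(1L, 2L, 2L, 1L, NA, 2L),
  idade    = c(18L, 25L, 31L, 47L, 52L, 80L)
)

candidates <- suggest_categorical(demo, max_levels = 3L)
print(candidates)
# [1] "cod_sexo"
```

A low threshold keeps genuinely continuous variables such as idade out of the list; small ordinal scores would also be flagged, so the output is only a starting point.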

output_html

File path for the HTML output. There is no default: the destination must be supplied explicitly (e.g. a file under tempdir() or a directory chosen by the user).

output_dir

Optional directory for ancillary files (codebook.xlsx, codebook.docx, etc.). If NULL, derived from output_html.

export_codebook_editable

Logical. Also export codebook as .docx and .xlsx in output_dir. Default: TRUE.

cache_data

Logical. If TRUE and data is a tbl_spark, persists the dataset once before the report aggregations, then releases it on exit. No-op for local data frames. Default: TRUE.

title

Optional title for the report.

n_bins

Number of bins for numeric histograms. Default: 30.
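
To make the role of n_bins concrete, here is a minimal base R sketch of equal-width binning, assuming the histogram splits the observed range into n_bins intervals. This is an assumption for illustration; the binning rule the package actually applies is not stated here.

```r
n_bins <- 30
x <- c(18, 20, 35, 35, 50, 80, NA)   # illustrative values only

x <- x[!is.na(x)]                    # missing values are not binned
lo <- min(x)
hi <- max(x)
width <- (hi - lo) / n_bins          # assumes hi > lo (non-constant column)

# Zero-based bin index; the maximum is folded into the last bin.
bin <- pmin(floor((x - lo) / width), n_bins - 1)

# One count per bin, including empty bins.
counts <- tabulate(bin + 1, nbins = n_bins)
```

The result is a vector of n_bins counts, which is small regardless of how many rows the data has.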

top_n_cat

Max categories shown in categorical plots. Default: 20.
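
The effect of top_n_cat can be pictured with the base R sketch below, which keeps the most frequent categories and counts what is left out. How the report itself treats the remaining categories (dropped or pooled) is not specified here, so that part is only an assumption for illustration.

```r
top_n_cat <- 2
x <- c("a", "b", "a", "c", "d", "a", "b")   # illustrative values only

# Frequencies sorted from most to least common.
freq <- sort(table(x), decreasing = TRUE)

# Categories that would be shown, and how many observations fall outside them.
shown <- head(freq, top_n_cat)
n_not_shown <- sum(freq) - sum(shown)
```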

Details

  • cross_sectional: one plot per variable (histogram / bar / time).

  • longitudinal: three plots per variable (global distribution, intra-ID variation, missingness by time) plus a meta plot of observations per ID.

All aggregations happen in Spark/dplyr; only small summaries are collected.
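
The aggregate-first pattern can be sketched with dplyr on a local data frame, as below. The column names (id_indiv, wave, idade) and the exact summaries are illustrative assumptions, not the package's internal code. On a tbl_spark the same verbs would be translated to Spark SQL, possibly with small adjustments, and only the resulting summary tables would be collected.

```r
library(dplyr)

# Toy long-format data; all names are illustrative.
long_df <- data.frame(
  id_indiv = c("A", "A", "A", "B", "B", "C"),
  wave     = c(1L, 2L, 3L, 1L, 2L, 1L),
  idade    = c(30, NA, 32, 45, NA, 60)
)

# Missingness by time: one row per wave, however many IDs there are.
miss_by_wave <- long_df %>%
  group_by(wave) %>%
  summarise(n = n(), prop_missing = mean(is.na(idade)), .groups = "drop")

# Observations per ID, reduced to the distribution of those counts.
obs_per_id <- long_df %>%
  count(id_indiv, name = "n_obs") %>%
  count(n_obs, name = "n_ids")
```

Both results have one row per wave or per distinct observation count, so their size does not grow with the number of rows in the data.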

Value

Invisible list with paths to all generated files.

Examples


# Rendering the HTML report needs rmarkdown + pandoc and a few plotting
# packages (all in Suggests); it also takes more than 5 seconds, so the
# example is wrapped in \donttest and writes only to tempdir().
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    requireNamespace("knitr", quietly = TRUE) &&
    requireNamespace("ggplot2", quietly = TRUE) &&
    requireNamespace("patchwork", quietly = TRUE) &&
    requireNamespace("scales", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {

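  set.seed(1)  # make the simulated columns reproducible
  cb_init(id_col = "id_indiv")
  df_baseline <- data.frame(
    id_indiv = sprintf("ID%03d", 1:50),
    cod_sexo = sample(c(1L, 2L), 50, replace = TRUE),
    idade    = sample(18:80, 50, replace = TRUE)
  )

  # Write to a dedicated subdir of tempdir() and clean everything up after:
  out_dir <- file.path(tempdir(), "autocodebook_report_demo")
  # Create the directory up front; this is harmless if it already exists.
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)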
  generate_report(df_baseline, type = "cross_sectional",
                  id_var = "id_indiv",
                  treat_as_categorical = "cod_sexo",
                  output_html = file.path(out_dir, "report_baseline.html"))
  unlink(out_dir, recursive = TRUE)
}


autocodebook documentation built on June 9, 2026, 1:09 a.m.