report_n: Report number of distinct values in a column across data...


report_n    R Documentation

Report the number of distinct values in a column across data frames


This function is intended to mimic dplyr::n_distinct() for multiple inputs. It is useful for reporting the number of clients throughout a series of inclusion or exclusion steps. A use case could be getting the Ns for the sample definition flowchart in an epidemiological study. It is also useful for inline reporting of Ns in an R Markdown document.
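The idea of counting distinct values across several inclusion or exclusion steps can be sketched in base R. This is only an illustration, not the healthdb implementation: the helper `count_distinct_sketch()` and the toy `visits` data are hypothetical, and the sketch handles local data frames only.

```r
# Hypothetical helper, NOT the healthdb implementation: counts the distinct
# values of one column in each data frame of a list, using base R only.
count_distinct_sketch <- function(dfs, col) {
  vapply(dfs, function(d) length(unique(d[[col]])), integer(1))
}

# Toy data (made up for illustration): one row per visit,
# so a client can appear more than once.
visits <- data.frame(
  client_id = c(1, 1, 2, 3, 3, 4),
  age       = c(34, 34, 17, 52, 52, 70)
)

adults   <- subset(visits, age >= 18)  # exclusion step 1: drop minors
under_65 <- subset(adults, age < 65)   # exclusion step 2: drop age 65 and over

# Number of distinct clients remaining after each step
n_clients <- count_distinct_sketch(list(visits, adults, under_65), "client_id")
print(n_clients)
```

Counting distinct IDs rather than rows matters here because the toy data has several rows per client.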


report_n(..., on, force_proceed = getOption("healthdb.force_proceed"))



...
Data frames or remote tables (e.g., from 'dbplyr').


on
The column to report on. It must be present in all data sources.


force_proceed
A logical controlling whether to ask for user input before proceeding when the data are not local data frames and a query needs to be executed before reporting. The default is fetched from options (FALSE). Use options(healthdb.force_proceed = TRUE) to suppress the prompt once and for all.
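As a small sketch of the option mechanism mentioned above, the following sets `healthdb.force_proceed` for the session, reads it back, and then restores the previous state. Only base R's `options()` and `getOption()` are used; the variable names `old` and `force_now` are illustrative.

```r
# Set the option for the current session and keep the previous value
old <- options(healthdb.force_proceed = TRUE)

# This is the value a call would pick up through
# force_proceed = getOption("healthdb.force_proceed")
force_now <- getOption("healthdb.force_proceed")
print(force_now)

# Restore the previous setting when done
options(old)
```

Placing the `options()` call near the top of a script or R Markdown document would apply it to all later calls in that session.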


A sequence of the number of distinct values of on, one for each data frame
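A possible way to use such a sequence for a flowchart or for inline text is sketched below. It assumes the result can be treated as a plain integer vector with one count per data source; the vector `n` here is a hand-written stand-in, not actual `report_n()` output.

```r
# Assumed stand-in for a result: one distinct count per data source, in order
n <- c(3L, 3L, 2L)

# Number excluded at each step (a positive value means units were dropped)
excluded <- -diff(n)

# A sentence suitable for inline reporting, e.g. in an R Markdown document
sentence <- sprintf(
  "Of %d species, %d remained after all exclusions.",
  n[1], n[length(n)]
)
print(excluded)
print(sentence)
```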


library(healthdb)

# apply some exclusions
iris_1 <- subset(iris, Petal.Length > 1)
iris_2 <- subset(iris, Petal.Length > 2)

# get n at each operation
n <- report_n(iris, iris_1, iris_2, on = Species)

# get the difference at each step
diff(n)

# data in a list
iris_list <- list(iris_1, iris_2)
report_n(rlang::splice(iris_list), on = Species)
# if you loaded tidyverse, this will also work
# report_n(!!!iris_list, on = Species)

healthdb documentation built on May 29, 2024, 8:57 a.m.