Prototype (R package).
Model and code are based on examples in Gelman et al. (2014) and McElreath (2020).
This prototype is related to a previous simulation study.
The distributions and parameters in the simulation study were chosen so that the generated data are comparable to certain data of interest. The prototype is intended for roughly the same kind of data. Please refer to this report for details.
The package documentation can be accessed via the package website.
Install the following R packages and their dependencies:
The prototype can then be installed from GitHub with:
devtools::install_github("https://github.com/jmeydam/countcheck.git",
build_vignettes = FALSE)
We simulate count data for 1000 observational units, assuming a certain known data-generating process.
The data-generating process is free of anomalies, and all counts exceeding their respective upper control limits (UCLs) are false positives. Other things being equal it is desirable to have as few false positives as possible.
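To make the idea of an upper control limit concrete, here is a minimal base-R sketch (illustrative only, not the package's actual method, which derives UCLs from the fitted Bayesian models): if a unit's counts were Poisson with known mean lambda, taking the UCL as a high Poisson quantile guarantees that an anomaly-free unit exceeds it only rarely.

```r
# Illustrative only: a per-unit UCL as the 99.9% quantile of a Poisson
# distribution with known expected count lambda. An anomaly-free count
# then exceeds the UCL with probability below 0.1% (a false positive).
# The countcheck package instead derives UCLs from Bayesian models.
poisson_ucl <- function(lambda, p = 0.999) {
  qpois(p, lambda)
}

# Example: UCL for a unit with expected count 10
poisson_ucl(10)
```

Raising p lowers the false-positive rate but makes genuine anomalies harder to detect; the trade-off is the same for the model-based UCLs below.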
Using a Bayesian hierarchical model with partial pooling, 18 new counts exceed their respective upper control limits, vs. 68 with a no-pooling model and 74 with a complete-pooling model. (4 counts exceed the UCLs based on the true value of the parameter theta, which is known in this case.)
> library(countcheck)
> d <- countcheck(random_seed = 200807)
> # Counts exceeding UCLs
> # *********************
> # ucl_true_theta:
> sum(d$y_new - d$ucl_true_theta > 0)
[1] 4
> # ucl_nopool:
> sum(d$y_new - d$ucl_nopool > 0)
[1] 68
> # ucl_complpool:
> sum(d$y_new - d$ucl_complpool > 0)
[1] 74
> # ucl_partpool:
> sum(d$y_new - d$ucl_partpool > 0)
[1] 18
Using the dataset provided in the package, the same result (here without the true_theta values) can be obtained with:
> e <- simdat
> d1 <- countcheck(unit = e$unit,
+ n = e$n,
+ y = e$y,
+ n_new = e$n_new,
+ y_new = e$y_new,
+ random_seed = 200807)
In a realistic scenario it is not possible to assess performance by comparison with "true" values: the true values of theta are generally not known, and the probability distributions assumed for the Bayesian hierarchical model (or for the no-pooling model) may in fact be inappropriate.
On the other hand, in a realistic scenario it is often possible to assess performance by investigating individual cases.
Select data from data frame d (example above) for the report:
> countcheck_df <- select_for_report(d)
We also need a data frame with unit master data. Here, we just generate some data:
> units_tmp <- sort(unique(countcheck_df$unit))
> unit_df <- data.frame(
+ unit = units_tmp,
+ unit_name = paste("Unit",
+ units_tmp),
+ unit_url = paste0("http://domain/units/",
+ units_tmp,
+ ".html"),
+ unit_group_name = paste("Group",
+ rep(1:5,
+ length.out = length(units_tmp)))
+ )
Generate the report as an HTML string:
> report <- html_report(
+ countcheck_list = list(
+ list(df = countcheck_df, caption = "KPI 1")
+ ),
+ unit_df = unit_df,
+ title = "Report",
+ table_width_px = 500,
+ column_headers = c(
+ group = "Group",
+ count = "Count",
+ ucl = "UCL",
+ unit = "Unit",
+ name = "Name"),
+ charset = "utf-8",
+ lang = "en",
+ home_url = "https://github.com/jmeydam/countcheck"
+ )
Save HTML report to disk:
> sink("report.html")
> cat(report)
> sink()
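Alternatively, the sink()/cat()/sink() sequence can be replaced with a single writeLines() call from base R (here, report is a placeholder for the string generated by html_report() above):

```r
# Equivalent to the sink()/cat()/sink() sequence:
# write the HTML string directly to a file.
report <- "<html>...</html>"  # placeholder for the generated report
writeLines(report, "report.html")
```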