estimate_performance: Evaluate and report classification performance of a given...
In bakaburg1/BaySREn: BaySREn. An R package to automatise citation collection and screening in Systematic Reviews. Based on Bayesian active machine learning

estimate_performance

R Documentation

Evaluate and report the classification performance of a given Annotation file

Description

This function estimates the Sensitivity and Efficiency (the latter as "Work saved over random classification", WSoR) of the classification process (i.e., both the automatic and the human classification). A robust estimate of the total number of relevant (positive) records in the whole data set is produced to compute these statistics.

Usage

estimate_performance(
  records,
  model = NULL,
  preds = NULL,
  plot = TRUE,
  quants = getOption("baysren.probs", c(0.05, 0.5, 0.95)),
  nsamples = min(2500, sum(model$fit@sim$n_save)),
  seed = 23797297,
  save_preds = FALSE,
  save_model = FALSE
)

Arguments

`records`	An Annotation data set produced by `enrich_annotation_file()` or a file path to it.
`model`	A `brm` model built using `estimate_positivity_rate_model()`. Will be created from `records` if `NULL`.
`preds`	A matrix of posterior predictions as produced by `brms::posterior_predict()`. If passed, it must be derived from the same model given in `model`.
`plot`	Whether to plot the cumulative number of positive matches plus the posterior predictive distribution as computed by `model`, truncated at the number of observed ones.
`quants`	Point estimate and boundaries of the posterior distributions to use in the results and in the plot.
`nsamples`	Number of samples to use to build the posterior distribution. By default, the smaller of 2500 and the number of draws saved when fitting the `model`.
`seed`	A seed to reproduce the results.
`save_preds`	Whether to save the posterior prediction matrix, which can then be passed to `preds`.
`save_model`	Whether to save the model, which can then be passed to `model`.

Details

To estimate the total number of relevant records, estimate_positivity_rate_model() is employed. It fits a Bayesian logistic model on the records whose label was manually reviewed, estimating the probability that a record is relevant given the lower boundary of the posterior predictive distribution (PPD) produced for it by the classification model. This model does not take into account the records' other characteristics, providing a simple, maximum uncertainty model.
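The idea of the surrogate model can be sketched with base R alone. The example below is not the package's implementation: it swaps the Bayesian `brms` model for a plain maximum-likelihood `glm()`, works on simulated data, and the names `pred_low`, `label` and `reviewed` are hypothetical. It only illustrates how a logistic model fitted on the reviewed records can be used to simulate the number of relevant records missed among the unreviewed ones.

```r
set.seed(42)

# Simulated stand-in data (assumption): `pred_low` plays the role of the
# lower boundary of the classifier's PPD for each record
n <- 500
pred_low <- runif(n)

# Simulated "true" labels: relevance is more likely at higher lower bounds
label <- rbinom(n, 1, plogis(-4 + 6 * pred_low))

# Only part of the records has been manually reviewed
reviewed <- pred_low > 0.4 | runif(n) < 0.2

# Frequentist stand-in for the surrogate logistic model,
# fitted on the reviewed records only
fit <- glm(
  label ~ pred_low,
  family = binomial,
  data = data.frame(label, pred_low)[reviewed, ]
)

# Estimated probability of relevance for each unreviewed record
p_unreviewed <- predict(
  fit,
  newdata = data.frame(pred_low = pred_low[!reviewed]),
  type = "response"
)

# Simulate the number of relevant records missed among the unreviewed ones
missed <- replicate(
  1000,
  sum(rbinom(length(p_unreviewed), 1, p_unreviewed))
)

# Summarise with the same quantiles used as defaults for `quants`
quantile(missed, c(0.05, 0.5, 0.95))
```

Unlike this sketch, a Bayesian model also propagates the uncertainty in the regression coefficients into the predicted counts, which widens the resulting intervals.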

The model is used to predict the distribution of the number of missed relevant matches among the unreviewed records. This number is then used to compute the expected Sensitivity (i.e., the ratio of the observed positive matches to the predicted total number of positive matches) and Efficiency (i.e., one minus the ratio of the number of reviewed records to the number of records one would need to review at random to find the same number of relevant matches, according to the hypergeometric distribution).
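As a worked illustration of these two statistics, the sketch below computes them from hypothetical posterior draws of the number of missed positives. All numbers are invented, and the random-review workload is approximated by the mean of the negative hypergeometric distribution, k(N + 1) / (K + 1), which is an assumption made here for simplicity; the package may derive this quantity differently.

```r
set.seed(1)

# Illustrative numbers only (not taken from any real review)
total_records <- 10000 # records in the Annotation file
n_reviewed    <- 1200  # records manually reviewed
obs_positives <- 100   # relevant records found so far

# Hypothetical posterior draws of the number of relevant records missed
# among the unreviewed ones (in practice produced by the surrogate model)
missed <- rpois(2000, lambda = 5)

# Predicted total number of relevant records, one value per draw
pred_positives <- obs_positives + missed

# Sensitivity: observed positives over predicted total positives
sensitivity <- obs_positives / pred_positives

# Expected number of records to review at random (without replacement)
# to find `obs_positives` of the `pred_positives` relevant records:
# mean of the negative hypergeometric distribution
n_random <- obs_positives * (total_records + 1) / (pred_positives + 1)

# Proportion of work used compared with random review, and its complement
used_prop  <- n_reviewed / n_random
efficiency <- 1 - used_prop # WSoR

quantile(sensitivity, c(0.05, 0.5, 0.95))
quantile(efficiency, c(0.05, 0.5, 0.95))
```

Every missed positive lowers both statistics: Sensitivity drops because the denominator grows, and Efficiency drops because a random review would reach the same number of positives sooner when positives are more common.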

Finally, several summary statistics are reported, describing the observed results of the classification (i.e., number of reviewed records, number of positives found) and the statistics computed using the surrogate logistic model (i.e., Sensitivity, Efficiency and the R^2 of the surrogate model), including their uncertainty intervals.

Optionally, a plot is produced showing the observed cumulative number of positive matches together with its posterior predictive distribution according to the surrogate model.

Value

A data frame with the following columns:

`obs_positives`	the observed number of positive matches.
`pred_positives`	the quantiles of the predicted distribution of the number of positive matches.
`mod_r2`	the surrogate model fit (R^2).
`n_reviewed`	the number of records reviewed.
`total_records`	the total number of records in the Annotation file.
`used_prop`	the posterior distribution of the proportion of reviewed records over the number needed with random classification (1 - WSoR).
`efficiency`	the posterior distribution of one minus the proportion of reviewed records over the number needed with random classification (WSoR).
`Sensitivity`	the posterior distribution of the Sensitivity computed over the predicted number of positives according to the surrogate model.

Examples

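## Not run: 

library(BaySREn)
library(dplyr) # provides %>% and last()

# Take the last Annotation file listed for the session
annotation_file <- get_session_files("Session1")$Annotations %>% last()

analysis <- estimate_performance(annotation_file)

## End(Not run)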

bakaburg1/BaySREn documentation built on March 30, 2022, 12:16 a.m.
