perform_grid_evaluation    R Documentation

Perform a grid evaluation of parameters to tune the classification framework

Description

The performance of the framework, measured as Sensitivity (the rate of relevant records found over all relevant records) and Efficiency (one minus the ratio of manually reviewed records over the total number of records), is strongly impacted by a number of parameters. This function evaluates the framework over a grid of parameter combinations to help tune them.
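
For illustration, the two metrics can be computed from a fully reviewed data set; a minimal sketch, assuming hypothetical logical columns Relevant (ground-truth label) and Reviewed (manually checked), which are not prescribed by the package:

# df: one row per record, with hypothetical logical columns
# Relevant (ground truth) and Reviewed (manually checked)
sensitivity <- sum(df$Relevant & df$Reviewed) / sum(df$Relevant)
efficiency <- 1 - sum(df$Reviewed) / nrow(df)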

Usage

perform_grid_evaluation(
  records,
  sessions_folder = "Grid_Search",
  prev_classification = records,
  resample = c(FALSE, TRUE),
  n_init = c(50, 100, 250, 500),
  n_models = c(1, 5, 10, 20, 40, 60),
  pos_mult = c(1, 10, 20),
  pred_quants = list(c(0.1, 0.5, 0.9), c(0.05, 0.5, 0.95), c(0.01, 0.5, 0.99)),
  limits = list(stop_after = 4, pos_target = NULL, labeling_limit = NULL)
)

Arguments

records

A fully labelled Annotation data set (a data frame or a path to an Excel/CSV file).

sessions_folder

The path to a folder where the grid search results will be stored.

prev_classification

An Annotation data set or file with labelled records. The labels in this data set will be used as ground truth for the records file, but the records themselves will not be used.

n_init

A vector of candidate sizes for the initial training set. The initial training set simulates the initial manual labelling of records used to train the model; it is generated from the records data set by selecting records in descending order.

pos_mult, n_models, resample, pred_quants

A vector of values to test for each parameter; for pred_quants, a list of quantile vectors. See enrich_annotation_file() for more details.

limits

The conditions under which a Classification/Review (CR) cycle is stopped. See enrich_annotation_file().

Details

These parameters pertain to the framework only and are independent of the specific Bayesian classification method used (which itself has other, method-specific parameters). The parameters are the following:

  • n_init: The number of records in the manually labeled initial training set.

  • n_models: The number of models trained and then averaged to stabilize the posterior predictive distribution (PPD).

  • resample: Whether to bootstrap the data between model retrainings if the number of models is more than one.

  • pos_mult: Oversampling rate of the positive labeled records.

  • pred_quants: The quantiles used to summarise the records' PPD and build the Uncertainty zone.

Check enrich_annotation_file() for more insight into their influence on the framework and the classification results. Since all records are pre-labelled, the manual review phase is performed automatically.
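
To illustrate the role of pred_quants, each record's PPD is reduced to a lower bound, a median, and an upper bound; a minimal sketch with simulated posterior samples (the actual summary is produced internally by enrich_annotation_file()):

# Simulated posterior samples of a record's probability of relevance
ppd_samples <- rbeta(4000, shape1 = 2, shape2 = 5)

# Summarise the PPD at the chosen quantiles; wider intervals (e.g.
# c(0.01, 0.5, 0.99)) enlarge the Uncertainty zone, sending more
# records to manual review
quantile(ppd_samples, probs = c(0.01, 0.5, 0.99))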

The algorithm starts from a fully labelled Annotation set and performs a Classification/Review cycle for each combination of parameters.

A great number of files will be created (40 GB with the default grid parameters for an input records file with 1200 labelled records), with one session folder for each parameter combination. Therefore, be sure to have enough disk space before starting. Also, keep in mind that a full search may require many days, even on powerful computers.
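
It may help to count the parameter combinations before launching the search; a minimal sketch over the default grid (the Cartesian product of the parameter values, with the pred_quants sets represented by their index):

grid <- expand.grid(
  resample = c(FALSE, TRUE),
  n_init = c(50, 100, 250, 500),
  n_models = c(1, 5, 10, 20, 40, 60),
  pos_mult = c(1, 10, 20),
  pred_quants = 1:3 # index of the quantile vector
)
nrow(grid) # 432 combinations, i.e. 432 Classification/Review sessions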

Value

A message with the number of parameter combinations evaluated.

Examples

## Not run: 

# First, the user needs to manually label a significant number of records; we
# suggest one thousand or more. The new record file can be stored anywhere,
# but putting it into the grid search folder is good practice.

records <- file.path("Grid_Search", "Classification_data.xlsx")

Grid_search <- perform_grid_evaluation(
  records,
  sessions_folder = "Grid_Search",
  prev_classification = records,
  ## Model parameters (can be changed by users)
  resample = c(FALSE, TRUE),
  n_init = c(50, 100, 250, 500),
  n_models = c(1, 5, 10, 20, 40, 60),
  pos_mult = c(1, 10, 20),
  pred_quants = list(
    c(.1, .5, .9),
    c(.05, .5, .95),
    c(.01, .5, .99)
  )
)

## End(Not run)
