analyse_grid_search: Analyse the results of a parameter grid search

analyse_grid_search    R Documentation

Analyse the results of a parameter grid search

Description

Takes as input a folder containing the session data produced by perform_grid_evaluation(), where each session represents one combination of parameters, and identifies the best combinations in terms of a performance score.

Usage

analyse_grid_search(
  session_folder = "Grid_Search",
  tot_pos = NULL,
  tot_records = NULL,
  plot = TRUE,
  score = c("Sens_adj_eff", "Pos_rate", "Pos_rate_adj_sens")
)

Arguments

session_folder

Where to find the result sessions produced by perform_grid_evaluation().

tot_pos

Total number of positive matches among the records. If NULL, it is inferred from the Annotation files in session_folder, which must then be fully labelled.

tot_records

Total number of records. If NULL, it is inferred from the Annotation files in session_folder, which must then be fully labelled.

plot

Whether to plot the marginal impact of each parameter.

score

Which score to use to measure classification performance. One of Sens_adj_eff: sensitivity times efficiency; Pos_rate: the ratio of positive labels found over the total records (positive rate); Pos_rate_adj_sens: sensitivity times positive rate.

Details

For each session, a performance score is computed. The default is sensitivity x efficiency, where efficiency is one minus the ratio of reviewed records over the total records. The statistics are computed on a subset of fully labelled records.
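
As a rough illustration of the three scores, the sketch below computes them from plain counts. The helper score_sketch() and its arguments are hypothetical and not part of the package, and it assumes that "total records" in the positive rate refers to tot_records; the package's internal computation may differ.

```r
# Hypothetical helper, not part of the package: computes the scores
# described above from simple counts.
score_sketch <- function(pos_found, reviewed, tot_pos, tot_records) {
  sensitivity <- pos_found / tot_pos         # share of all positives that were found
  efficiency  <- 1 - reviewed / tot_records  # one minus the share of records reviewed
  pos_rate    <- pos_found / tot_records     # positives found over total records (assumed)
  c(
    Sens_adj_eff      = sensitivity * efficiency,
    Pos_rate          = pos_rate,
    Pos_rate_adj_sens = sensitivity * pos_rate
  )
}

# Made-up counts: 45 of 50 positives found after reviewing 200 of 1000 records
scores <- score_sketch(pos_found = 45, reviewed = 200, tot_pos = 50, tot_records = 1000)
```

With these made-up counts, sensitivity is 0.9 and efficiency is 0.8, so a session that finds most positives while reviewing few records scores high on Sens_adj_eff.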

The analysis is performed on each session's "Results" files.

A partition tree algorithm is used to group parameter combinations by average scores, identifying "performance clusters". The combination with the best sensitivity followed by the best efficiency is then shown for each cluster. Optionally, a plot can be generated with the marginal impact of each parameter.
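
The selection logic described above can be sketched in base R. This is only an illustration under stated assumptions: the data are invented, the cluster column stands in for the groups that the partition tree would produce, and none of the object names below come from the package.

```r
# Toy data: one row per parameter combination. The 'cluster' column is a
# stand-in for the partition tree output; all values are made up.
res <- data.frame(
  cluster     = c("A", "A", "B", "B", "B"),
  sensitivity = c(0.95, 0.95, 0.80, 0.85, 0.85),
  efficiency  = c(0.60, 0.70, 0.90, 0.75, 0.80)
)
res$score <- res$sensitivity * res$efficiency

# Rank clusters by their average score, best first
cluster_means <- tapply(res$score, res$cluster, mean)
cluster_order <- names(cluster_means)[order(-as.vector(cluster_means))]

# Within each cluster, keep the row with the highest sensitivity,
# breaking ties by the highest efficiency
best_by_cluster <- do.call(rbind, lapply(cluster_order, function(cl) {
  d <- res[res$cluster == cl, ]
  d[order(-d$sensitivity, -d$efficiency), ][1, ]
}))

# The overall best set is the top row of the best-scoring cluster
best <- best_by_cluster[1, ]
```

Here best_by_cluster plays the role of best_by_rule and best the role of best_parms. Note that the winner inside a cluster is chosen by sensitivity first, so it is not necessarily the row with the highest score in that cluster.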

Value

A list with:

iterations

A data frame describing each classification/review iteration for each session, reporting the parameter values used and the performance score.

best_parms

The best-performing parameter set in the best parameter cluster. First, the parameter clusters are ordered by average performance score; then the parameter combinations inside each cluster are ordered by sensitivity, followed by efficiency. The result is a data frame reporting the score, the ratio of positive matches over the total records, the ratio of reviewed records, the sensitivity and the efficiency.

best_by_rule

A data frame with the same performance information as best_parms, reporting the best parameter set of each performance cluster.

plot

A plot with the marginal impact of each parameter if the plot argument is TRUE, otherwise NULL.

Examples

## Not run: 
out <- analyse_grid_search()

# check the best parameter set (use str() for easy to read output)
str(out$best_parms)

# check the best set for each performance cluster
View(out$best_by_rule)

# plot the marginal impact of each parameter (available if the plot argument is TRUE)
out$plot

## End(Not run)


bakaburg1/BaySREn documentation built on March 30, 2022, 12:16 a.m.