analyse_grid_search | R Documentation |
Takes as input a folder with multiple session data produced by perform_grid_evaluation(), each session representing a combination of parameters, and computes the best combinations in terms of a performance score.
analyse_grid_search(
  session_folder = "Grid_Search",
  tot_pos = NULL,
  tot_records = NULL,
  plot = TRUE,
  score = c("Sens_adj_eff", "Pos_rate", "Pos_rate_adj_sens")
)
session_folder |
Where to find the result sessions produced by
|
tot_pos |
Total number of positive matches among records. If |
tot_records |
Total number of records. If |
plot |
Whether to plot the marginal impact of each parameter. |
score |
Which one of the scores to use to measure classification
performance. Can be |
For each session, a performance score is computed. The default is sensitivity x efficiency, where efficiency is one minus the ratio of reviewed records to the total number of records. The statistics are computed on a subset of fully labeled records.
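As an illustration of the default score described above, here is a minimal sketch in base R. The helper name `sens_adj_eff()` and its argument names are hypothetical and not part of the package; the sketch only assumes that sensitivity is the share of positive records found and that efficiency is one minus the reviewed share.

```r
# Hypothetical helper, for illustration only (not the package's internal code).
sens_adj_eff <- function(found_pos, tot_pos, reviewed, tot_records) {
  sensitivity <- found_pos / tot_pos          # share of positive records found
  efficiency  <- 1 - reviewed / tot_records   # share of records not reviewed
  sensitivity * efficiency
}

# Toy numbers: 45 of 50 positives found after reviewing 200 of 1000 records.
score <- sens_adj_eff(found_pos = 45, tot_pos = 50,
                      reviewed = 200, tot_records = 1000)
print(score)
```

With these toy numbers the sensitivity is 0.9 and the efficiency is 0.8, so a session that finds most positives while reviewing few records scores close to 1.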
The analysis is performed on each session's "Results" files.
A partition tree algorithm is used to group parameter combinations by average scores, identifying "performance clusters". The combination with the best sensitivity followed by the best efficiency is then shown for each cluster. Optionally, a plot can be generated with the marginal impact of each parameter.
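The selection step described above can be sketched in base R once each parameter combination has a cluster label. This is only an illustration under stated assumptions: the data frame, its column names, and the cluster labels are made up, and the partition tree step that would produce the labels is not shown.

```r
# Toy results; column names and values are hypothetical.
res <- data.frame(
  cluster     = c(1, 1, 2, 2, 2),
  sensitivity = c(0.90, 0.90, 0.70, 0.80, 0.80),
  efficiency  = c(0.60, 0.75, 0.90, 0.50, 0.65)
)
res$score <- res$sensitivity * res$efficiency

# Average score per cluster, repeated on every row of that cluster.
res$cluster_score <- ave(res$score, res$cluster)

# Order clusters by average score, then rows by sensitivity, then efficiency.
res <- res[order(-res$cluster_score, -res$sensitivity, -res$efficiency), ]

# The first row of each cluster is its best combination.
best_by_cluster <- res[!duplicated(res$cluster), ]

# The first row overall belongs to the best cluster.
best_overall <- best_by_cluster[1, ]
```

Sorting on sensitivity before efficiency means that, within a cluster, a combination that finds more positives always wins, and efficiency only breaks ties.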
A list with:
iterations |
A data frame describing each classification/review iteration for each session, reporting the parameter values used and the performance score. |
best_parms |
The highest-performing parameter set in the best parameter cluster. First, the parameter clusters are ordered by average performance score; then the parameter combinations inside each cluster are ordered by sensitivity followed by efficiency. The result is a data frame reporting the score, the ratio of positive matches over the total records, the ratio of reviewed records, the sensitivity, and the efficiency. |
best_by_rule |
A data frame with performance
info (like for |
plot |
A plot with the marginal impact of each
parameter if the |
## Not run:
out <- analyse_grid_search()

# check the best parameter set (use str() for easy to read output)
str(out$best_parms)

# check the best set for each performance cluster
View(out$best_by_rule)

# plot parameter marginal impact if the plot argument is TRUE
out$plot

## End(Not run)