generate_rule_selection_set: Generate a rule selection set for user review

View source: R/Rule_building.R


Description

The rules extracted by extract_rules() are grouped by similar sensitivity and presented to the user, who then selects a subset of them (ideally one per sensitivity group).

Usage

generate_rule_selection_set(
  rules,
  target_vec,
  target_data,
  add_negative_terms = TRUE,
  save_path = NULL
)

Arguments

rules

A vector of rules as produced by extract_rules().

target_vec

A vector of labels.

target_data

A DTM with one row per record, i.e., with as many rows as there are elements in target_vec.

add_negative_terms

Whether to increase specificity by adding negative terms to the rules. Adding the negative terms is computationally heavy.

save_path

Path of the Excel file in which users will select the final rules. If NULL, no file is written. A best practice is to save the output in the session folder used to generate the rules and call it "Selected_rules.xlsx".

Details

Only rules with at least one positive (non-negated) component are kept; optionally, negative terms taken from the Document Term Matrix (DTM) are added to them to increase specificity. Rules are then ranked by the number of positive records matched minus the number of negative records matched, and grouped by the cumulative number of positive records identified.
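The filtering, ranking and grouping steps can be illustrated with a minimal base R sketch. This is not BaySREn's implementation, and it omits the optional addition of negative terms. It assumes that each rule is a string of terms joined by " & " with negated terms prefixed by "!", that the DTM is a data frame with one 0/1 column per term, and that the labels are logical (TRUE for relevant records). The helpers eval_rule() and rank_rule_groups() are hypothetical names used only for illustration.

```r
# Minimal sketch, NOT BaySREn's implementation. Assumptions:
# - a rule is a string of terms joined by " & "; negated terms start with "!"
# - dtm is a data frame with one 0/1 column per term and one row per record
# - target_vec is logical (TRUE = relevant record)

# Hypothetical helper: which records (rows of dtm) does a rule match?
eval_rule <- function(rule, dtm) {
  terms <- strsplit(rule, " & ", fixed = TRUE)[[1]]
  negated <- startsWith(terms, "!")
  words <- sub("^!", "", terms)
  hits <- rep(TRUE, nrow(dtm))
  for (i in seq_along(words)) {
    present <- dtm[[words[i]]] > 0
    # positive terms must be present, negated terms must be absent
    hits <- hits & (if (negated[i]) !present else present)
  }
  hits
}

# Hypothetical helper: filter, rank and group the rules
rank_rule_groups <- function(rules, target_vec, dtm) {
  # Keep only rules with at least one positive (non-negated) term
  has_positive <- vapply(rules, function(r) {
    any(!startsWith(strsplit(r, " & ", fixed = TRUE)[[1]], "!"))
  }, logical(1), USE.NAMES = FALSE)
  rules <- rules[has_positive]

  hits <- lapply(rules, eval_rule, dtm = dtm)
  pos <- vapply(hits, function(h) sum(h & target_vec), integer(1))
  neg <- vapply(hits, function(h) sum(h & !target_vec), integer(1))

  # Rank by positive minus negative records matched
  ord <- order(pos - neg, decreasing = TRUE)
  rules <- rules[ord]; hits <- hits[ord]; pos <- pos[ord]; neg <- neg[ord]

  # Cumulative number of distinct positive records found down the ranking
  found <- rep(FALSE, length(target_vec))
  cum_pos <- integer(length(rules))
  for (i in seq_along(rules)) {
    found <- found | (hits[[i]] & target_vec)
    cum_pos[i] <- sum(found)
  }

  out <- data.frame(rule = rules, pos = pos, neg = neg, cum_pos = cum_pos,
                    stringsAsFactors = FALSE)
  # Suggest the top-ranked rule of each group sharing the same cum_pos
  out$selected <- !duplicated(out$cum_pos)
  out[c("selected", "rule", "pos", "neg", "cum_pos")]
}
```

In this sketch a rule that adds no new positive records falls into the same group as the rules ranked above it, so each group collects alternatives with equal cumulative sensitivity and only the best-ranked one is flagged.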

The first column of the output marks the suggested rule for each sensitivity group. The user can edit this column, adding or removing rules by setting their value to TRUE or FALSE. Once the review is complete, it is advisable to rename the file, e.g., to "Selected_rules.xlsx".
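After the review, the edited file has to be read back so that only the rules marked TRUE are kept. The sketch below assumes the file is read with readxl::read_excel() and that the TRUE/FALSE column is still the first one; keep_reviewed_rules() is a hypothetical helper written for illustration, not a BaySREn function.

```r
# Hypothetical helper (not part of BaySREn): keep the rules the reviewer
# marked TRUE in the first column of the reviewed selection set.
keep_reviewed_rules <- function(reviewed) {
  flag <- as.logical(reviewed[[1]])  # accepts logical cells or "TRUE"/"FALSE" text
  flag[is.na(flag)] <- FALSE         # empty cells count as not selected
  reviewed[flag, , drop = FALSE]
}

# Typical use once the Excel file has been edited (requires the readxl package):
# reviewed <- readxl::read_excel(file.path("Sessions", "Session1", "Selected_rules.xlsx"))
# final_rules <- keep_reviewed_rules(reviewed)
```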

Value

A data frame of rules grouped by equal cumulative sensitivity, i.e., by the share of positive records identified by a rule together with all the rules ranked above it. The first column marks the suggested rule for each group. If save_path is not NULL, the data frame is also saved to that file.

Examples

## Not run: 
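# Load the candidate rules and related data saved in the session folder
candidate_queries <- readRDS(file.path("Sessions", "Session1", "rule_data.rds"))

# Record labels and the DTM on which the rules are evaluated
Target <- candidate_queries$DTM$Target
SpecificDTM <- candidate_queries$SpecificDTM

# Excel file in which the selection set will be written for review
selection_set_file <- file.path("Sessions", "Session1", "Selected_rules_reviewed.xlsx")

selection_set <- generate_rule_selection_set(
  rules = candidate_queries$rule,
  target_vec = Target,
  target_data = SpecificDTM,
  save_path = selection_set_file
)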

## End(Not run)

bakaburg1/BaySREn documentation built on March 30, 2022, 12:16 a.m.