View source: R/Rule_building.R
extract_rules | R Documentation |
Starting from a Document Term Matrix (DTM) and a posterior predictive distribution (PPD) matrix produced by the Bayesian classification engine, a decision tree algorithm is used to extract rules that partition a subset of draws from the PPD. Beware that the generation of the rules may take a long time.
extract_rules(
  session_name,
  rebuild_dtm = FALSE,
  vimp.threshold = 1.25,
  n.trees = 800,
  sessions_folder = getOption("baysren.sessions_folder", "Sessions"),
  save_path = file.path(sessions_folder, session_name, "rule_data.rds"),
  ...
)
session_name |
A session identifier corresponding to folders into the
|
rebuild_dtm |
Whether to use the last DTM stored in the
|
vimp.threshold |
A threshold on the standardized variable importance score, used to filter out less relevant terms in the DTM. |
n.trees |
How many draws to use from the PPD matrix to build decision trees. A larger value strongly increases computation time but improves the sensitivity of the rules found. |
sessions_folder |
Where to find the |
save_path |
Since generating the rules is a computationally intensive process,
it's advisable to save the output in a .rds file placed inside the
|
... |
Additional arguments passed to |
The algorithm can use only a subset of the terms in the DTM and of the samples in the PPD matrix to cut computation time. For the terms, a threshold (vimp.threshold) keeps only the most relevant features in the DTM; for the samples, n.trees sets how many draws from the PPD are used. Before being used, terms in the DTM that appear in multiple fields of a citation record are aggregated, so that only their overall presence in the record is stored.
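The three reductions described above (aggregating terms across fields, filtering terms by importance, and subsetting PPD draws) can be illustrated with a small base R sketch. This is not the package's internal code: the `FIELD__term` column naming, the toy importance scores, the `>=` comparison, and the records-by-draws layout of the PPD matrix are all assumptions made for illustration.

```r
# Toy DTM: rows are records, columns are terms prefixed by their source field.
# The "FIELD__term" naming scheme is an assumption made for this sketch.
dtm <- data.frame(
  TITLE__cancer = c(1, 0, 0, 1),
  ABSTR__cancer = c(1, 1, 0, 0),
  ABSTR__trial  = c(0, 1, 0, 0),
  KEYS__mouse   = c(0, 0, 1, 0)
)

# Strip the field prefix to recover the bare term behind each column.
terms <- sub("^[A-Z]+__", "", names(dtm))

# Aggregate: a term counts as present in a record if it appears in any field.
aggregate_terms <- function(dtm, terms) {
  out <- sapply(unique(terms), function(term) {
    as.integer(rowSums(dtm[, terms == term, drop = FALSE]) > 0)
  })
  as.data.frame(out)
}
agg_dtm <- aggregate_terms(dtm, terms)

# Hypothetical standardized importance scores, one per aggregated term.
vimp <- c(cancer = 2.1, trial = 1.4, mouse = 0.3)
vimp.threshold <- 1.25

# Keep only the terms whose score reaches the threshold.
specific_dtm <- agg_dtm[, names(vimp)[vimp >= vimp.threshold], drop = FALSE]

# Toy PPD matrix (assumed layout: 4 records x 10 posterior draws).
set.seed(1)
ppd <- matrix(runif(4 * 10), nrow = 4)

# Use only n.trees randomly chosen draws, one decision tree per draw.
n.trees <- 3
draws <- ppd[, sample(ncol(ppd), n.trees), drop = FALSE]
```

Lowering `vimp.threshold` keeps more columns in `specific_dtm`, while raising `n.trees` keeps more columns in `draws`; both choices increase the amount of work done downstream.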
A list with:
SpecificDTM |
The DTM with the less relevant terms filtered out and terms appearing in multiple record fields aggregated. |
DTM |
The full DTM with the predicted classification. |
rules |
A data frame reporting the selected rules with the average PPD. |
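Because the output is meant to be cached in an .rds file, a later session can reload it rather than regenerate the rules. The sketch below mimics that pattern with a mock list in a temporary directory; the list contents and the `rule` and `avg_ppd` column names are placeholders, not the actual structure produced by the package.

```r
# Mock result with the three documented elements; contents are placeholders.
mock_rules <- list(
  SpecificDTM = data.frame(cancer = c(1L, 0L), trial = c(0L, 1L)),
  DTM = data.frame(cancer = c(1L, 0L), trial = c(0L, 1L), mouse = c(0L, 0L)),
  rules = data.frame(rule = "cancer == 1", avg_ppd = 0.8)  # hypothetical columns
)

# Store the result in a temporary location standing in for save_path.
save_path <- file.path(tempdir(), "rule_data.rds")
saveRDS(mock_rules, save_path)

# Reuse the cached file when it exists instead of recomputing.
rule_data <- if (file.exists(save_path)) readRDS(save_path) else mock_rules

# Inspect which elements are available.
names(rule_data)
```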