extract_var_imp: Extract the importance of features in the Document Term...

View source: R/Analysis.R

extract_var_imp    R Documentation

Extract the importance of features in the Document Term Matrix

Description

Inside enrich_annotation_file(), the relevance of each feature to the classification model is estimated from the Document Term Matrix (DTM) and stored in the Annotation file. For the default BART model, feature importance is the proportion of posterior trees in which a term was used, together with its Z score if an ensemble of models is used.
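To make the BART importance measure concrete, the following is a minimal sketch on simulated data, not the package's implementation. The term names, the number of models and trees, and the definition of the Z score as the mean importance across models divided by its standard deviation are all assumptions made for illustration.

```r
# Toy illustration only: simulated data, not output of the package.
set.seed(1)

# Hypothetical probability that each term is used in a posterior tree.
use_prob <- c(term_a = 0.6, term_b = 0.3, term_c = 0.05)

n_models <- 4    # assumed size of the model ensemble
n_trees <- 200   # assumed number of posterior trees per model

# For each model, simulate which trees use each term, then take the
# proportion of trees in which the term appears.
inclusion <- sapply(seq_len(n_models), function(m) {
  used <- sapply(use_prob, function(p) rbinom(n_trees, 1, p) == 1)
  colMeans(used)
})

# Rows are terms, columns are models.
importance <- rowMeans(inclusion)

# Assumed Z score: mean across models divided by the standard deviation.
z_score <- importance / apply(inclusion, 1, sd)

toy_importance <- data.frame(
  term = names(importance),
  importance = importance,
  z_score = z_score,
  row.names = NULL
)
print(toy_importance)
```

With a single model there is no spread across models to measure, which is consistent with the Z score being reported only when an ensemble is used.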

Usage

extract_var_imp(
  session_name,
  num_vars = 15,
  score_filter = 1.5,
  recompute_DTM = FALSE,
  sessions_folder = getOption("baysren.sessions_folder", "Sessions")
)

Arguments

session_name

The name of a session.

num_vars

The number of best features to report, according to model importance.

score_filter

Threshold on the model-related Z score, used to filter out less relevant features.

recompute_DTM

Whether to recompute the DTM.

sessions_folder

The folder in which all sessions are stored.

Details

In addition to the model-derived scores, the variable importance according to a Poisson regression is used to estimate the association (as a log-linear coefficient and Z score) of a term with relevant records. This approach helps to distinguish terms that are relevant by themselves (both the model-related and the linear Z scores are high) from terms that are relevant only in association with other terms (only the model Z score is high).
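The linear part of this comparison can be sketched with base R's glm(). This is an illustrative example on simulated data, not the package's code: the term columns, the simulated labels, and the choice to fit both terms jointly in one model are assumptions.

```r
# Toy illustration only: simulated records and labels.
set.seed(42)
n <- 500

# Hypothetical binary DTM columns: 1 if the term appears in the record.
dtm <- data.frame(
  term_a = rbinom(n, 1, 0.3),
  term_b = rbinom(n, 1, 0.3)
)

# Simulated labels: term_a raises the chance that a record is relevant,
# term_b has no effect.
relevant <- rbinom(n, 1, ifelse(dtm$term_a == 1, 0.5, 0.1))

# Poisson (log-linear) regression of relevance on term presence.
fit <- glm(relevant ~ term_a + term_b, data = dtm, family = poisson())
coefs <- summary(fit)$coefficients

# Drop the intercept; keep the log-linear estimate and its Z score.
linear_scores <- data.frame(
  term = rownames(coefs)[-1],
  estimate = coefs[-1, "Estimate"],
  z = coefs[-1, "z value"],
  row.names = NULL
)
print(linear_scores)
```

In this toy setup term_a should show a clearly positive estimate and a high Z score, while term_b should stay near zero. A term with a high model Z score but a low linear Z score would fall into the "relevant in association with other terms" case described above.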

Value

A data frame containing the features (and the part of the record they are related to), the model importance score and its Z score (if an ensemble of models is used), and the log-linear association and linear Z score according to the Poisson model.

Examples

## Not run: 
extract_var_imp("Session1")

## End(Not run)

bakaburg1/BaySREn documentation built on March 30, 2022, 12:16 a.m.