analyzeImportanceFeatures: Analyze Feature Importance for Machine Learning Models

View source: R/global.visu.R

analyzeImportanceFeatures    R Documentation

Description

This function analyzes the importance of features in a set of machine learning models. It produces plots of feature importance, prevalence, and effect sizes, and handles both classification and regression tasks. It accepts a single experiment or a list of experiments and, when plotting is enabled, writes the corresponding visualizations to a PDF file.

Usage

analyzeImportanceFeatures(
  clf_res,
  X,
  y,
  makeplot = TRUE,
  name = "",
  verbose = TRUE,
  pdf.dims = c(width = 25, height = 20),
  filter.perc = 0.05,
  filter.cv.prev = 0.25,
  nb.top.features = 100,
  scaled.importance = FALSE,
  k_penalty = 0.75/100,
  k_max = 0
)

Arguments

clf_res

An object of class experiment or a list of experiments containing machine learning models to analyze.

X

A data frame or matrix containing the feature data used in the model.

y

A vector containing the target variable (binary or continuous values depending on the task).

makeplot

Logical, if 'TRUE', plots will be generated and saved as a PDF. Default is 'TRUE'.

name

A string to specify the name used in output files (e.g., for saving the PDF).

verbose

Logical, if 'TRUE', the function will print progress messages. Default is 'TRUE'.

pdf.dims

Numeric vector specifying the dimensions of the output PDF (width and height). Default is 'c(width = 25, height = 20)'.

filter.perc

Numeric, threshold expressed as a proportion between 0 and 1. Features that appear in fewer than 'filter.perc' of the models are filtered out. Default is '0.05' (5%).

filter.cv.prev

Numeric, cross-validation threshold used to filter the importance of features based on their performance. Default is '0.25'.

nb.top.features

Numeric, the number of top features to select based on importance. Default is '100'.

scaled.importance

Logical, if 'TRUE', scales the feature importance scores. Default is 'FALSE'.

k_penalty

Numeric, penalty factor for selecting top features in models. Default is '0.75/100'.

k_max

Numeric, the maximum number of features to consider. Default is '0' (no limit).
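
To make the 'filter.perc' and 'nb.top.features' arguments concrete, here is a minimal base R sketch of the kind of filtering they describe. It is not the package's implementation: the toy matrix 'imp', its layout (features in rows, models in columns, 'NA' for an absent feature), and ranking by mean importance are all assumptions made for illustration.

```r
# Hypothetical toy data: rows are features, columns are models.
# NA means the feature is absent from that model (assumed layout,
# not actual predomics output).
imp <- matrix(NA_real_, nrow = 4, ncol = 10,
              dimnames = list(paste0("feat", 1:4), paste0("model", 1:10)))
imp["feat1", ] <- 0.9      # present in all 10 models
imp["feat2", 1:5] <- 0.5   # present in 5 of 10 models
imp["feat3", 1] <- 0.7     # present in 1 of 10 models
# feat4 is never selected by any model

filter.perc <- 0.2
nb.top.features <- 2

# Proportion of models in which each feature appears
model_prev <- rowMeans(!is.na(imp))

# Drop features that appear in fewer than filter.perc of the models
kept <- imp[model_prev >= filter.perc, , drop = FALSE]

# Rank the remaining features by mean importance (an assumed ranking rule)
# and keep the first nb.top.features of them
mean_imp <- sort(rowMeans(kept, na.rm = TRUE), decreasing = TRUE)
top <- head(names(mean_imp), nb.top.features)
top
```

With these toy values, 'feat3' and 'feat4' fall below the 20% threshold and are dropped before the ranking step.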

Details

This function analyzes feature importance and creates visualizations of features that contribute most to the model predictions. It can handle classification and regression tasks. The function computes several types of graphics:

  • Feature Importance: Plots the importance of features across models.

  • Prevalence of Features: Shows the prevalence of features across different groups (e.g., class 1 and class -1 in classification tasks).

  • Abundance of Features: Shows how frequently features appear across the dataset.

  • Feature Model Coefficients: Visualizes the coefficients of features in the models.

The results are saved as a PDF document and also plotted directly within R.
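
As an illustration of what "prevalence of features across groups" means in the list above, the following base R sketch computes per-class prevalence on toy data. It makes two assumptions that are not stated in this page: that 'X' holds features in rows and samples in columns, and that a feature counts as present in a sample when its value is greater than zero. It does not reproduce the package's own plotting functions.

```r
# Hypothetical toy data: 3 features (rows) x 6 samples (columns).
X <- matrix(c(0, 2, 0, 1, 0, 0,
              1, 1, 1, 0, 0, 0,
              0, 0, 0, 3, 4, 5),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("feat", 1:3), paste0("s", 1:6)))
y <- c(1, 1, 1, -1, -1, -1)   # class labels, one per sample

# Prevalence: fraction of samples in a class where the feature is non-zero
prevalence_by_class <- sapply(c("1" = 1, "-1" = -1), function(cl) {
  rowMeans(X[, y == cl, drop = FALSE] > 0)
})
prevalence_by_class
```

The result is a features-by-classes matrix. Here 'feat2' is present in every class 1 sample and in no class -1 sample, while 'feat3' shows the opposite pattern.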

Value

The function returns 'NULL' when no models are found, and also once the plots have been saved. When plots are saved, the PDF contains multiple plots: feature importance, prevalence, abundance, and model coefficients.
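
For readers unfamiliar with how a named vector such as 'pdf.dims' maps onto a multi-plot PDF, here is a minimal sketch using the base 'grDevices::pdf' device, whose 'width' and 'height' are given in inches. The output file name and the placeholder plots are hypothetical; how the package itself builds the file name from 'name' is not documented here.

```r
pdf.dims <- c(width = 25, height = 20)

# Hypothetical output path in a temporary directory
out_file <- tempfile(pattern = "Feature_Analysis_", fileext = ".pdf")

# Open a PDF device with the requested dimensions (inches)
grDevices::pdf(out_file,
               width = pdf.dims[["width"]],
               height = pdf.dims[["height"]])

# Each new top-level plot starts a new page in the PDF
plot(1:10, main = "Page 1: placeholder plot")
barplot(c(a = 3, b = 5), main = "Page 2: placeholder plot")

# Close the device so the file is completely written
invisible(grDevices::dev.off())
```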

Author(s)

Edi Prifti (IRD)

See Also

modelCollectionToPopulation, plotPrevalence, plotAbundanceByClass, plotFeatureModelCoeffs

Examples

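# Assume clf_res is an experiment (or a list of experiments), and X and y are your data.
# With makeplot = TRUE the plots are saved to a PDF and, per the Value section, NULL is returned.
analyzeImportanceFeatures(clf_res, X, y, makeplot = TRUE, name = "Feature_Analysis", verbose = TRUE)

# If you choose not to save the plots as a PDF, you can access them via result
result <- analyzeImportanceFeatures(clf_res, X, y, makeplot = FALSE, name = "Feature_Analysis", verbose = TRUE)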


predomics/predomicspkg documentation built on Dec. 11, 2024, 11:06 a.m.