analyzeImportanceFeatures: Analyze Feature Importance for Machine Learning Models
In predomics/predomicspkg: Interpretable Prediction in Omics Data

analyzeImportanceFeatures

R Documentation

Analyze Feature Importance for Machine Learning Models

Description

This function analyzes the importance of features in a set of machine learning models and produces plots of feature importance, prevalence, and effect sizes. It handles both classification and regression tasks, accepts either a single experiment or a list of experiments, and writes the corresponding visualizations to a PDF file.

Usage

analyzeImportanceFeatures(
  clf_res,
  X,
  y,
  makeplot = TRUE,
  name = "",
  verbose = TRUE,
  pdf.dims = c(width = 25, height = 20),
  filter.perc = 0.05,
  filter.cv.prev = 0.25,
  nb.top.features = 100,
  scaled.importance = FALSE,
  k_penalty = 0.75/100,
  k_max = 0
)

Arguments

`clf_res`	An object of class `experiment` or a list of experiments containing machine learning models to analyze.
`X`	A data frame or matrix containing the feature data used in the model.
`y`	A vector containing the target variable (binary or continuous values depending on the task).
`makeplot`	Logical, if 'TRUE', plots will be generated and saved as a PDF. Default is 'TRUE'.
`name`	A string to specify the name used in output files (e.g., for saving the PDF).
`verbose`	Logical, if 'TRUE', the function will print progress messages. Default is 'TRUE'.
`pdf.dims`	Numeric vector specifying the dimensions of the output PDF (width and height). Default is 'c(width = 25, height = 20)'.
`filter.perc`	Numeric, proportion threshold (between 0 and 1) used to filter out features that appear in less than 'filter.perc' of the models. Default is '0.05' (5%).
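`filter.cv.prev`	Numeric, cross-validation prevalence threshold (a proportion) used to filter feature importance; judging by the parameter name, features found in fewer than this fraction of the cross-validation folds are presumably dropped. Default is '0.25'.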
`nb.top.features`	Numeric, the number of top features to select based on importance. Default is '100'.
`scaled.importance`	Logical, if 'TRUE', scales the feature importance scores. Default is 'FALSE'.
`k_penalty`	Numeric, penalty factor for selecting top features in models. Default is '0.75/100'.
`k_max`	Numeric, the maximum number of features to consider. Default is '0' (no limit).
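
As an illustration of what the 'filter.perc' threshold means, here is a minimal base-R sketch. It is not the package's internal code: the `models` list, the feature names, and the helper variables are hypothetical, and each model is reduced to the names of the features it uses.

```r
# Hypothetical input: each model is represented only by its feature names.
models <- list(
  c("f1", "f2"),
  c("f1", "f3"),
  c("f1", "f2", "f4"),
  c("f1")
)

# Threshold chosen for this toy example (the function's default is 0.05).
filter.perc <- 0.5

# All features seen in at least one model.
all_features <- sort(unique(unlist(models)))

# Proportion of models in which each feature appears.
model_prevalence <- vapply(
  all_features,
  function(f) mean(vapply(models, function(m) f %in% m, logical(1))),
  numeric(1)
)

# Drop features that appear in less than filter.perc of the models.
kept <- names(model_prevalence)[model_prevalence >= filter.perc]
kept
```

With a threshold of 0.5, only features present in at least half of the four toy models are kept.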

Details

This function analyzes feature importance and visualizes the features that contribute most to the model predictions, for both classification and regression tasks. It produces four types of plots:

Feature Importance: Plots the importance of features across models.
Prevalence of Features: Shows the prevalence of features across different groups (e.g., class 1 and class -1 in classification tasks).
Abundance of Features: Shows the abundance (measured values) of features across the dataset.
Feature Model Coefficients: Visualizes the coefficients of features in the models.

The results are saved as a PDF document and also plotted directly within R.
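
To make the notion of prevalence across groups concrete, the following base-R sketch computes it on toy data. It rests on two assumptions that are not stated above: that `X` holds features in rows and samples in columns, and that prevalence is the share of samples in which a feature is non-zero. The data and the name `prevalence_by_class` are hypothetical, and the function's own computation may differ.

```r
# Toy data (hypothetical): 3 features (rows) x 6 samples (columns).
X <- matrix(
  c(0, 1, 2, 0, 0, 3,
    5, 0, 0, 0, 1, 0,
    1, 1, 1, 1, 1, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("f1", "f2", "f3"), paste0("s", 1:6))
)

# Binary target: class 1 for the first three samples, class -1 for the rest.
y <- c(1, 1, 1, -1, -1, -1)

# Prevalence: share of samples within a class where the feature is non-zero.
prevalence_by_class <- sapply(
  c("1" = 1, "-1" = -1),
  function(cl) rowMeans(X[, y == cl, drop = FALSE] != 0)
)
prevalence_by_class
```

The result is a matrix with one row per feature and one column per class, which is the kind of summary a prevalence plot would display.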

Value

The function returns 'NULL' if no models are found, and also once the plots have been saved. When 'makeplot = TRUE', it generates a PDF containing multiple plots: feature importance, prevalence, abundance, and model coefficients.

Author(s)

Edi Prifti (IRD)

Examples

# Assume clf_res is an experiment or a list of experiment results, and X and y are your data
result <- analyzeImportanceFeatures(
  clf_res, X, y,
  makeplot = TRUE,
  name = "Feature_Analysis",
  verbose = TRUE
)

# If you choose not to save the plots as a PDF (makeplot = FALSE), you can access them via result

predomics/predomicspkg documentation built on Dec. 11, 2024, 11:06 a.m.