analyzePopulationFeatures: Analyze Features in a Population of Models

View source: R/global.visu.R

analyzePopulationFeatures    R Documentation

Analyze Features in a Population of Models

Description

This function analyzes features in a population of models, allowing for the visualization and examination of feature importance, prevalence, and model coefficients. It can generate a variety of plots to understand the distribution and importance of features in the given population.

Usage

analyzePopulationFeatures(
  pop,
  X,
  y,
  res_clf,
  makeplot = TRUE,
  name = "",
  ord.feat = "importance",
  make.network = TRUE,
  network.layout = "circular",
  network.alpha = 1e-04,
  verbose = TRUE,
  pdf.dims = c(width = 25, height = 20),
  filter.perc = 0.05,
  k_penalty = 0.75/100,
  k_max = 0
)

Arguments

pop

A population of models, typically obtained from 'modelCollectionToPopulation' or similar functions.

X

The data matrix containing features (rows represent features, columns represent samples).
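As a minimal sketch of the expected orientation, a samples-by-features table can be transposed so that features become rows. The data, the object name `samples_by_features`, and the class labels below are made up for illustration.

```r
# Hypothetical toy data: 4 samples (rows) x 3 features (columns)
samples_by_features <- matrix(
  c(0.1, 0.0, 0.3,
    0.2, 0.5, 0.0,
    0.0, 0.4, 0.1,
    0.3, 0.0, 0.2),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("sample", 1:4), paste0("feature", 1:3))
)

# Transpose so that rows are features and columns are samples
X <- t(samples_by_features)

# One label per sample (i.e. per column of X); the values are made up
y <- c(1, -1, 1, -1)

# Sanity check: the number of samples must match the number of labels
stopifnot(ncol(X) == length(y))
dim(X)
```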

y

The response variable (class labels or continuous values depending on the model).

res_clf

The classifier used for the analysis, typically a result from a classification experiment.

makeplot

Logical. If 'TRUE', the function generates plots and saves them as a PDF. If 'FALSE', it returns the analysis results without plotting.

name

A string representing the name of the analysis or output (used for saving files).

ord.feat

A string indicating the ordering method for features. Options are:
- "prevalence": Order by the prevalence of features across models.
- "importance": Order by feature importance based on cross-validation.
- "hierarchical": Order by hierarchical clustering of the feature-to-model coefficient matrix.
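The three orderings can be illustrated on a toy matrix with base R. This is only a sketch of the ideas, not the package's implementation: the coefficient matrix `coefs` and the `importance` scores are hypothetical, and the real importance values come from cross-validation.

```r
# Hypothetical feature-to-model coefficient matrix (features x models);
# 0 means the feature is absent from the model
coefs <- matrix(
  c( 1,  1,  0,  1,
     0, -1,  0,  0,
    -1, -1, -1, -1,
     0,  0,  1,  0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("feature", 1:4), paste0("model", 1:4))
)

# "prevalence": number of models in which each feature appears
prevalence <- rowSums(coefs != 0)
ord_prevalence <- names(sort(prevalence, decreasing = TRUE))

# "importance": made-up scores standing in for cross-validation results
importance <- c(feature1 = 0.12, feature2 = 0.01,
                feature3 = 0.30, feature4 = 0.05)
ord_importance <- names(sort(importance, decreasing = TRUE))

# "hierarchical": leaf order from clustering the rows of the matrix
ord_hierarchical <- rownames(coefs)[hclust(dist(coefs))$order]
```

Each ordering yields a permutation of the same feature names, so they can be swapped freely when arranging plot rows.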

make.network

Logical. If 'TRUE', generates a network of feature co-occurrence across the population of models.

network.layout

A string indicating the layout of the network. Default is "circular". Other options may include "fr" for Fruchterman-Reingold layout.

network.alpha

A numeric value controlling the alpha transparency of the network plot.

verbose

Logical. If 'TRUE', prints additional information during execution.

pdf.dims

A vector of two numbers specifying the width and height of the PDF output (in inches).
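For orientation, a named width/height vector of this shape can be passed to base R's `pdf()` device as sketched below. The output path and the toy bar plot are assumptions for illustration; how the function itself names and opens its file is not shown here.

```r
# Same shape as the default value of pdf.dims
pdf.dims <- c(width = 25, height = 20)

# Hypothetical output file in the session's temporary directory
out_file <- file.path(tempdir(), "population_analysis_sketch.pdf")

# Open a PDF device with the requested dimensions (in inches)
pdf(file = out_file,
    width = pdf.dims[["width"]],
    height = pdf.dims[["height"]])

# Toy plot with made-up prevalence counts
barplot(c(feature1 = 3, feature2 = 1, feature3 = 4),
        main = "Toy feature prevalence")

# Close the device so the file is written to disk
invisible(dev.off())
```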

filter.perc

A numeric value between 0 and 1 specifying the minimum prevalence of a feature to be included in the analysis.
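A minimal sketch of such a prevalence filter follows. It assumes that prevalence is measured as the fraction of models containing a feature, which is an assumption made for this example; the `presence` matrix is made up.

```r
# Hypothetical presence/absence of 4 features across 20 models
presence <- rbind(
  feature1 = rep(c(TRUE, FALSE), times = c(10, 10)),  # in 10 of 20 models
  feature2 = rep(c(TRUE, FALSE), times = c(2, 18)),   # in 2 of 20 models
  feature3 = rep(TRUE, 20),                           # in every model
  feature4 = rep(FALSE, 20)                           # in no model
)

# Fraction of models in which each feature appears
prevalence <- rowMeans(presence)

# Keep only features that reach the minimum prevalence
filter.perc <- 0.05
kept <- names(prevalence)[prevalence >= filter.perc]
kept
```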

k_penalty

A penalty value for model selection in the population filtering.

k_max

The maximum number of models to include in the final population after filtering.

Details

The function performs a variety of analyses on a population of models:
- It filters models based on feature prevalence.
- It orders features by various metrics such as prevalence, importance, or hierarchical clustering.
- It generates plots of feature prevalence, model coefficients, and other characteristics.
- If requested, it also generates a network of feature co-occurrence across the models.
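The idea of feature co-occurrence across models can be sketched with base R as a matrix product. This is an illustrative computation under assumed toy data, not the method the package uses to build its network; `coefs` is hypothetical.

```r
# Hypothetical feature-to-model coefficient matrix (features x models)
coefs <- matrix(
  c( 1,  1,  0,  1,
     0, -1,  0,  0,
    -1, -1, -1, -1,
     0,  0,  1,  0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("feature", 1:4), paste0("model", 1:4))
)

# 1 if the feature is used by the model, 0 otherwise
presence <- (coefs != 0) * 1

# Entry [i, j] counts the models that contain both feature i and feature j;
# the diagonal holds each feature's own model count
cooccurrence <- presence %*% t(presence)

# Edge list of feature pairs that share at least one model
idx <- which(upper.tri(cooccurrence) & cooccurrence > 0, arr.ind = TRUE)
edges <- data.frame(
  from   = rownames(cooccurrence)[idx[, "row"]],
  to     = colnames(cooccurrence)[idx[, "col"]],
  weight = cooccurrence[idx]
)
edges
```

An edge list of this form is a common input for graph plotting packages, where a layout such as circular or Fruchterman-Reingold is then applied.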

Value

If 'makeplot = TRUE', the function saves a PDF with visualizations of feature importance, prevalence, and model coefficients. If 'makeplot = FALSE', it returns a list of the analysis results, including the normalized scores and feature importance.

Author(s)

Edi Prifti (IRD)

Examples

## Not run: 
# Assuming 'pop' is a valid population of models, 'X' is the feature matrix,
# 'y' is the response variable, and 'res_clf' is the result of a
# classification experiment
analyzePopulationFeatures(
  pop = pop,
  X = X,
  y = y,
  res_clf = res_clf,
  makeplot = TRUE,
  name = "population_analysis"
)

## End(Not run)


predomics/predomicspkg documentation built on Dec. 11, 2024, 11:06 a.m.