analyzePopulationFeatures: Analyze Features in a Population of Models
In predomics/predomicspkg: Interpretable Prediction in Omics Data

analyzePopulationFeatures

R Documentation

Analyze Features in a Population of Models

Description

This function analyzes features in a population of models, allowing for the visualization and examination of feature importance, prevalence, and model coefficients. It can generate a variety of plots to understand the distribution and importance of features in the given population.

Usage

analyzePopulationFeatures(
  pop,
  X,
  y,
  res_clf,
  makeplot = TRUE,
  name = "",
  ord.feat = "importance",
  make.network = TRUE,
  network.layout = "circular",
  network.alpha = 1e-04,
  verbose = TRUE,
  pdf.dims = c(width = 25, height = 20),
  filter.perc = 0.05,
  k_penalty = 0.75/100,
  k_max = 0
)

Arguments

`pop`	A population of models, typically obtained from 'modelCollectionToPopulation' or similar functions.
`X`	The data matrix containing features (rows represent features, columns represent samples).
`y`	The response variable (class labels or continuous values depending on the model).
`res_clf`	The classifier used for the analysis, typically a result from a classification experiment.
`makeplot`	Logical. If 'TRUE', the function generates plots and saves them as a PDF. If 'FALSE', it returns the analysis results without plotting.
`name`	A string representing the name of the analysis or output (used for saving files).
`ord.feat`	A string indicating the ordering method for features. Options are:
- "prevalence": Order by the prevalence of features across models.
- "importance": Order by feature importance based on cross-validation.
- "hierarchical": Order by hierarchical clustering of the feature-to-model coefficient matrix.
`make.network`	Logical. If 'TRUE', generates a network of feature co-occurrence across the population of models.
`network.layout`	A string indicating the layout of the network. Default is "circular". Other options may include "fr" for Fruchterman-Reingold layout.
`network.alpha`	A numeric value controlling the alpha transparency of the network plot.
`verbose`	Logical. If 'TRUE', prints additional information during execution.
`pdf.dims`	A vector of two numbers specifying the width and height of the PDF output (in inches).
`filter.perc`	A numeric value between 0 and 1 specifying the minimum prevalence of a feature to be included in the analysis.
`k_penalty`	A penalty value for model selection in the population filtering.
`k_max`	The maximum number of models to include in the final population after filtering.
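
As a minimal sketch of the input layout described for `X` and `y`, the toy objects below (all names and values are made up for illustration and are not part of the package) show a features-by-samples matrix, a matching label vector, and how to transpose data that is stored the other way around:

```r
# Toy data only: 5 features (rows) x 8 samples (columns), following the
# orientation described for `X`. All names here are illustrative.
set.seed(42)
X <- matrix(
  runif(5 * 8),
  nrow = 5,
  dimnames = list(paste0("feature_", 1:5), paste0("sample_", 1:8))
)
y <- factor(rep(c("case", "control"), each = 4))

# One label per sample: the number of columns of X must match length(y).
stopifnot(ncol(X) == length(y))

# Data stored samples-by-features (a common layout elsewhere) would need
# to be transposed before use; transposing twice restores the original.
X_samples_by_features <- t(X)
X_restored <- t(X_samples_by_features)
```

Checking the orientation up front is cheap and avoids silently treating samples as features.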

Details

The function performs a variety of analyses on a population of models:
- It filters models based on feature prevalence.
- It orders features by various metrics such as prevalence, importance, or hierarchical clustering.
- It generates plots of feature prevalence, model coefficients, and other characteristics.
- If requested, it also generates a network of feature co-occurrence across the models.
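
The ideas of prevalence, ordering, and co-occurrence can be illustrated in base R. The sketch below is not the package's implementation: it assumes a hypothetical feature-to-model coefficient matrix (`coef_mat`) and a made-up threshold (`min_prev`), and only shows one plausible way to compute each quantity.

```r
# Hypothetical feature-to-model coefficient matrix: rows are features,
# columns are models, entries are coefficients (0 = feature not in model).
coef_mat <- matrix(
  c( 1,  1,  1,  0,
    -1,  0, -1, -1,
     0,  0,  0,  1,
     1,  1,  0,  0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("feat_", LETTERS[1:4]), paste0("model_", 1:4))
)

# Prevalence: fraction of models in which each feature has a non-zero coefficient.
present <- coef_mat != 0
prevalence <- rowMeans(present)

# Keep features whose prevalence reaches a minimum threshold (toy value).
min_prev <- 0.5
kept <- coef_mat[prevalence >= min_prev, , drop = FALSE]

# Ordering 1: by decreasing prevalence.
ord_prev <- names(sort(prevalence[rownames(kept)], decreasing = TRUE))

# Ordering 2: by hierarchical clustering of the coefficient rows.
ord_hclust <- rownames(kept)[hclust(dist(kept))$order]

# Co-occurrence: number of models in which each pair of features appears together.
present_kept <- present[rownames(kept), , drop = FALSE] * 1
cooccurrence <- present_kept %*% t(present_kept)
```

In this toy setup the diagonal of `cooccurrence` counts the models containing each feature, and the off-diagonal entries could serve as edge weights in a co-occurrence network.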

Value

If 'makeplot = TRUE', the function saves a PDF file with visualizations of feature importance, prevalence, and model coefficients. If 'makeplot = FALSE', it returns a list of the analysis results, including the normalized scores and feature importance.

Author(s)

Edi Prifti (IRD)

Examples

## Not run: 
# Assuming 'pop' is a valid population of models, 'X' is the feature matrix
# (features in rows, samples in columns), 'y' is the response variable, and
# 'res_clf' is the result of a classification experiment.
analyzePopulationFeatures(
  pop = pop,
  X = X,
  y = y,
  res_clf = res_clf,
  makeplot = TRUE,
  name = "population_analysis"
)

## End(Not run)

predomics/predomicspkg documentation built on Dec. 11, 2024, 11:06 a.m.