In enriquea/feseR: Feature Selection workflow in R

knitr::opts_chunk$set(echo = TRUE)

Introduction

feseR provides funcionalities to combine multiple Feature Selection (FS) methods to analyze high-dimensional omics data in R environment. The different feature selection steps can be classificated in: Univariate (Correlation filter and Gain Information), Multivariate (Principal Component Analysis and Matrix Correlation based) and Recursive Feature Elimination (wrapped up with a Machine Learning algorithm). The goal is to assemble the different steps in an efficient workflow to perform feature selection task in the context of classification and regression problems. The package includes also several example dataset.

Available dataset

We provide some example dataset (Transcriptomics and Proteomics) with the package. Some general description of the data are listed bellow:

TNBC (Label-free deep proteome analysis of 44 (samples and technical replicates) human breast specimens)
GSE5325 (Analysis of breast cancer tumor samples using 2-color cDNA microarrays)
GSE48760 (Transcriptomics analysis of left ventricles of mouse subjected to an isoproterenol challenge)

Note: Datasets are expected to be a matrix with features in columns and samples in rows.

Examples

Preparing your data

  library(feseR)

   # loading example data (TNBC)
   data(TNBC)

   # getting features
   features <- TNBC[,-ncol(TNBC)]

   # getting class variable (expected last column)
   class <- TNBC[,ncol(TNBC)]

   # pre-filtering
   # keep only features (cols) with maximal missing rate 0.25 across samples (rows)
   features <- filterMissingnessRate(features, max_missing_rate = 0.25)

   # impute missing values
   features <- imputeMatrix(features, method = "mean")

   # Scale data features. These transformations coerce the original predictors 
   # to have zero mean and standard deviation equal one.
   features <- scale(features, center=TRUE, scale=TRUE)

Univariate filter examples

  # filtering by correlation
  output <- filter.corr(features = features, class = class, mincorr = 0.3)

  # filtering by gain information
  output <- filter.gain.inf(features = features, class = class, zero.gain.out = TRUE)

Multivariate filter examples

  # filtering by matrix correlation (cutoff 0.75)
  output <- filter.matrix.corr(features = features, maxcorr = 0.75)

  # data dimension reduction using PCA (return only PCs explaining 95% of the variance)
  output <- filter.pca(features = features, cum.var.cutoff = .95)

Combining Feature Selection methods

This function allows to combine multiple feature selection methods in a workflow

  # combining filter univariate corr., multivariate matrix corr. and
  # recursive feature elimination wrapped with random forest
   results <- combineFS(features = features, class = class,
                        univariate = 'corr', mincorr = 0.3,
                        multivariate = 'mcorr', maxcorr = 0.75,
                        wrapper = 'rfe.rf', number.cv = 10, 
                        group.sizes = seq(1,100,10), 
                        verbose = F, extfolds = 10)


   # getting the metrics from the training process
   training_results <- results$training

   # getting the metrics from the testing process
   testing_results <- results$testing

\newpage

Results from the training phase

   pander::pandoc.table(training_results, digits = 4,  split.table = Inf,
                       caption = 'Best model metrics from 10-folds cross-validation resampling.')

\newpage

Results from the testing phase

   pander::pandoc.table(testing_results, digits = 4,  split.table = Inf,
                       caption = 'Classification metrics from ten class-balanced and randomized runs.')

\newpage

Visualizing the Feature Selection process

Groups distribution on the first two Principal Components (PC1 and PC2) from the original data (without apply any Feature Selection method).

# plot PCA (PC1 vs. PC2)
plot_pca(features = features, class = class, list.plot = FALSE)

\newpage

Groups distribution on the first two Principal Components (PC1 and PC2) after to apply the Feature Selection workflow.

# getting the filtered matrix
filtered.features <- features[,results$opt.variables]

# plot PCA (PC1 vs. PC2)
plot_pca(features = filtered.features, class = class, list.plot = FALSE)

\newpage

Plotting the correlation matrix of final features.

# plot correlation matrix
plot_corr(features = filtered.features, corr.method = 'pearson')

enriquea/feseR documentation built on Feb. 25, 2025, 12:20 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

enriquea/feseR
Feature Selection workflow in R

In enriquea/feseR: Feature Selection workflow in R

Introduction

Available dataset

Examples

Preparing your data

Univariate filter examples

Multivariate filter examples

Combining Feature Selection methods

Visualizing the Feature Selection process

R Package Documentation

Browse R Packages

We want your feedback!

enriquea/feseR Feature Selection workflow in R

In enriquea/feseR: Feature Selection workflow in R

Introduction

Available dataset

Examples

Preparing your data

Univariate filter examples

Multivariate filter examples

Combining Feature Selection methods

Visualizing the Feature Selection process

R Package Documentation

Browse R Packages

We want your feedback!

enriquea/feseR
Feature Selection workflow in R