knitr::opts_chunk$set(echo = TRUE)
feseR provides functionality to combine multiple feature selection (FS) methods for analyzing high-dimensional omics data in the R environment. The feature selection steps fall into three classes: univariate (correlation filter and information gain), multivariate (principal component analysis and matrix-correlation based), and recursive feature elimination (wrapped around a machine learning algorithm). The goal is to assemble these steps into an efficient workflow for feature selection in classification and regression problems. The package also includes several example datasets.
We provide some example datasets (transcriptomics and proteomics) with the package. A general description of the data is listed below:
Note: Datasets are expected to be a matrix with features in columns and samples in rows.
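To make the expected layout concrete, here is a minimal sketch in base R. The toy matrix, its column names, and its values are invented for illustration and are not one of the packaged datasets; the only convention taken from this vignette is that samples sit in rows, features in columns, and the class variable in the last column.

```r
# Toy data: 6 samples (rows) x 3 features (columns) plus a class label.
# All names and values here are made up for illustration.
set.seed(1)
toy <- cbind(
  gene_a = rnorm(6),
  gene_b = rnorm(6),
  gene_c = rnorm(6),
  class  = rep(c(0, 1), each = 3)  # class variable in the last column
)
rownames(toy) <- paste0("sample_", 1:6)

toy_features <- toy[, -ncol(toy)]  # everything except the last column
toy_class    <- toy[, ncol(toy)]   # the last column only

dim(toy_features)  # rows = samples, columns = features
```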
library(feseR)

# loading example data (TNBC)
data(TNBC)

# getting features
features <- TNBC[,-ncol(TNBC)]

# getting class variable (expected last column)
class <- TNBC[,ncol(TNBC)]

# pre-filtering
# keep only features (cols) with maximal missing rate 0.25 across samples (rows)
features <- filterMissingnessRate(features, max_missing_rate = 0.25)

# impute missing values
features <- imputeMatrix(features, method = "mean")

# Scale data features. These transformations coerce the original predictors
# to have zero mean and standard deviation equal one.
features <- scale(features, center=TRUE, scale=TRUE)
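For readers who want to see what these three pre-processing steps amount to, the following is a rough base R sketch on a toy matrix. It is not the feseR implementation: the toy values are invented, and details such as whether the missing-rate threshold is inclusive are assumptions.

```r
# Toy matrix with missing values; samples in rows, features in columns.
# matrix() fills column by column, so each group of four values is one feature.
x <- matrix(
  c(1,  2,  NA, 4,    # f1: 25% missing
    NA, NA, NA, 8,    # f2: 75% missing
    3,  6,  9,  12),  # f3: no missing values
  nrow = 4,
  dimnames = list(paste0("s", 1:4), c("f1", "f2", "f3"))
)

# 1. Keep features whose fraction of missing values is at most 0.25
#    (inclusive threshold is an assumption of this sketch).
missing_rate <- colMeans(is.na(x))
x <- x[, missing_rate <= 0.25, drop = FALSE]

# 2. Replace each remaining NA with the mean of its column.
for (j in seq_len(ncol(x))) {
  x[is.na(x[, j]), j] <- mean(x[, j], na.rm = TRUE)
}

# 3. Center each column to mean 0 and scale it to standard deviation 1.
x <- scale(x, center = TRUE, scale = TRUE)
```

In this toy case only `f2` exceeds the missing-rate threshold, so two features remain after step 1.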
# filtering by correlation
output <- filter.corr(features = features, class = class, mincorr = 0.3)

# filtering by gain information
output <- filter.gain.inf(features = features, class = class, zero.gain.out = TRUE)
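The idea behind a univariate correlation filter can be sketched in a few lines of base R. This is an illustration of the general technique, not the code behind `filter.corr`: it assumes Pearson correlation, an absolute-value comparison against the threshold, and a numeric 0/1 class, none of which is confirmed by this vignette. The toy features are constructed deterministically.

```r
n <- 50
y <- rep(c(0, 1), each = n / 2)                # binary class labels (toy)
x <- cbind(
  informative = y + 0.1 * sin(seq_len(n)),     # closely tracks the class
  noise       = rep(c(1, -1), times = n / 2)   # alternating, unrelated to class
)

# Absolute Pearson correlation of each feature with the class.
abs_corr <- abs(cor(x, y))[, 1]

# Keep only features whose correlation reaches the threshold.
mincorr <- 0.3
kept <- x[, abs_corr >= mincorr, drop = FALSE]
colnames(kept)
```

Each feature is scored on its own, which is what makes the filter univariate: redundancy between features is not considered at this stage.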
# filtering by matrix correlation (cutoff 0.75)
output <- filter.matrix.corr(features = features, maxcorr = 0.75)

# data dimension reduction using PCA (return only PCs explaining 95% of the variance)
output <- filter.pca(features = features, cum.var.cutoff = .95)
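The two multivariate steps can be approximated in base R as follows. This is a sketch under assumptions, not the package's implementation: the redundancy filter below is a simple greedy rule chosen for clarity, and the PCA step uses `stats::prcomp` with a cumulative-variance cutoff. The toy data and all variable names are invented.

```r
set.seed(7)
n <- 40
a <- rnorm(n)
x <- cbind(
  a      = a,
  a_copy = a + rnorm(n, sd = 0.05),  # nearly duplicates "a"
  b      = rnorm(n),
  c      = rnorm(n)
)

# Greedy redundancy filter: walk the columns in order and drop any column
# that correlates above the cutoff with a column already kept.
maxcorr <- 0.75
abs_cor <- abs(cor(x))
keep <- character(0)
for (feature in colnames(x)) {
  if (all(abs_cor[feature, keep] <= maxcorr)) keep <- c(keep, feature)
}
x_nonredundant <- x[, keep, drop = FALSE]

# PCA: keep the smallest number of components whose cumulative share of
# the variance reaches the cutoff.
cum_var_cutoff <- 0.95
pca <- prcomp(x, center = TRUE, scale. = TRUE)
cum_var <- cumsum(pca$sdev^2) / sum(pca$sdev^2)
n_pcs <- which(cum_var >= cum_var_cutoff)[1]
pcs <- pca$x[, seq_len(n_pcs), drop = FALSE]
```

Note the difference in output: the correlation filter returns a subset of the original features, while PCA returns new variables (principal components) that are linear combinations of all features.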
The combineFS function combines multiple feature selection methods in a single workflow.
# combining filter univariate corr., multivariate matrix corr. and
# recursive feature elimination wrapped with random forest
results <- combineFS(features = features, class = class,
                     univariate = 'corr', mincorr = 0.3,
                     multivariate = 'mcorr', maxcorr = 0.75,
                     wrapper = 'rfe.rf', number.cv = 10,
                     group.sizes = seq(1,100,10),
                     verbose = FALSE, extfolds = 10)

# getting the metrics from the training process
training_results <- results$training

# getting the metrics from the testing process
testing_results <- results$testing
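The `group.sizes` argument in the call above is built with `seq()`. Assuming it lists the feature-subset sizes evaluated during recursive feature elimination (our reading of the argument name, not something this vignette states), the short base R snippet below shows which sizes that expression produces.

```r
# Subset sizes generated by the expression passed as group.sizes above.
sizes <- seq(1, 100, 10)
print(sizes)
length(sizes)  # number of candidate sizes
```

Because the step is 10 and the sequence starts at 1, the largest value is 91 rather than 100; a different `from` or `by` would be needed to include 100 itself.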
Results from the training phase
pander::pandoc.table(training_results, digits = 4, split.table = Inf, caption = 'Best model metrics from 10-fold cross-validation resampling.')
Results from the testing phase
pander::pandoc.table(testing_results, digits = 4, split.table = Inf, caption = 'Classification metrics from ten class-balanced and randomized runs.')
# plot PCA (PC1 vs. PC2)
plot_pca(features = features, class = class, list.plot = FALSE)
# getting the filtered matrix
filtered.features <- features[,results$opt.variables]

# plot PCA (PC1 vs. PC2)
plot_pca(features = filtered.features, class = class, list.plot = FALSE)
# plot correlation matrix
plot_corr(features = filtered.features, corr.method = 'pearson')