Machine learning (ML) is a powerful tool for creating supervised models that can distinguish between classes and facilitate biomarker selection in high-dimensional datasets, including RNA sequencing (RNA-Seq) data. However, identifying the best-performing ML algorithm(s) for a specific dataset is time-consuming. blkbox is a software package with a Shiny front end that integrates nine ML algorithms so that the best-performing classifier for a specific dataset can be selected. blkbox accepts a simple abundance matrix as input, includes extensive visualization, and provides an easy-to-use feature selection step for convenient and rapid selection of potential biomarkers, all without requiring parameter optimization.
Results: Feature selection makes blkbox computationally inexpensive, while its range of functions, including nested cross-validation (NCV), ensures robust results. blkbox identifies algorithms that outperform previously published ML results. Applying NCV identifies the features that are used to achieve high accuracy.
Availability: The code is available as an R package on CRAN and on GitHub (https://github.com/gboris/blkbox).
The R package blkbox can be installed from CRAN or from the GitHub repository.
After installation, the package can be loaded into R.
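A minimal sketch of the installation and loading steps is given here. It assumes the package is still hosted on CRAN under the name blkbox, and that the GitHub path gboris/blkbox (taken from the repository URL in the abstract) is current. The use of devtools for the GitHub install is an assumption, not something this page specifies.

```r
# Option 1: install the released version from CRAN
# (assumes the package is still available there)
install.packages("blkbox")

# Option 2: install from GitHub; devtools is an assumed helper package and
# "gboris/blkbox" is taken from the repository URL in the abstract
# install.packages("devtools")
devtools::install_github("gboris/blkbox")

# Load the package into the current R session
library(blkbox)

# List the vignettes shipped with the installed package
vignette(package = "blkbox")
```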
For details on how to use this package, please see the vignette.
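For intuition about the nested cross-validation (NCV) mentioned in the abstract, the following base R sketch may help. It does not use the blkbox API and is not the package's implementation: the simulated data, the nearest-centroid classifier, the mean-difference feature ranking, and all function names (`make_folds`, `rank_features`, `fit_predict`, `cv_accuracy`, `nested_cv`) are hypothetical choices made only for illustration. The inner loop picks the number of features, and the outer loop estimates accuracy on samples that played no part in that choice.

```r
set.seed(1)

# Simulated abundance matrix (hypothetical data): 60 samples x 50 features,
# two classes, with signal only in the first 5 features.
n <- 60; p <- 50
y <- rep(c(0, 1), each = n / 2)
x <- matrix(rnorm(n * p), nrow = n)
x[y == 1, 1:5] <- x[y == 1, 1:5] + 1.5

# Assign samples to k folds, stratified by class.
make_folds <- function(y, k) {
  fold_id <- integer(length(y))
  for (cl in unique(y)) {
    idx <- which(y == cl)
    fold_id[idx] <- sample(rep(seq_len(k), length.out = length(idx)))
  }
  fold_id
}

# Rank features by absolute difference in class means (training data only).
rank_features <- function(x, y) {
  d <- abs(colMeans(x[y == 1, , drop = FALSE]) - colMeans(x[y == 0, , drop = FALSE]))
  order(d, decreasing = TRUE)
}

# Nearest-centroid classifier restricted to the selected features.
fit_predict <- function(x_tr, y_tr, x_te, feats) {
  c0 <- colMeans(x_tr[y_tr == 0, feats, drop = FALSE])
  c1 <- colMeans(x_tr[y_tr == 1, feats, drop = FALSE])
  xt <- x_te[, feats, drop = FALSE]
  d0 <- rowSums(sweep(xt, 2, c0)^2)
  d1 <- rowSums(sweep(xt, 2, c1)^2)
  as.numeric(d1 < d0)
}

# Plain k-fold CV accuracy when keeping the top n_feats features.
cv_accuracy <- function(x, y, n_feats, folds = 5) {
  fold_id <- make_folds(y, folds)
  mean(sapply(seq_len(folds), function(f) {
    tr <- fold_id != f
    feats <- rank_features(x[tr, , drop = FALSE], y[tr])[seq_len(n_feats)]
    pred <- fit_predict(x[tr, , drop = FALSE], y[tr], x[!tr, , drop = FALSE], feats)
    mean(pred == y[!tr])
  }))
}

# Nested CV: inner CV chooses the feature count, outer fold scores that choice.
nested_cv <- function(x, y, candidates = c(2, 5, 10, 25), outer = 5, inner = 5) {
  fold_id <- make_folds(y, outer)
  res <- lapply(seq_len(outer), function(f) {
    tr <- fold_id != f
    inner_acc <- sapply(candidates, function(k) {
      cv_accuracy(x[tr, , drop = FALSE], y[tr], k, inner)
    })
    best <- candidates[which.max(inner_acc)]
    feats <- rank_features(x[tr, , drop = FALSE], y[tr])[seq_len(best)]
    pred <- fit_predict(x[tr, , drop = FALSE], y[tr], x[!tr, , drop = FALSE], feats)
    list(n_feats = best, accuracy = mean(pred == y[!tr]))
  })
  data.frame(fold = seq_len(outer),
             n_feats = sapply(res, `[[`, "n_feats"),
             accuracy = sapply(res, `[[`, "accuracy"))
}

result <- nested_cv(x, y)
print(result)
mean(result$accuracy)
```

Feature ranking is repeated inside every training split so that held-out samples never influence which features are kept; ranking once on the full dataset would leak information and inflate the accuracy estimate.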
Title: Data exploration with multiple machine learning algorithms
Author: Zachary Davies, Boris Guennewig
Maintainer: Boris Guennewig firstname.lastname@example.org
Description: Allows data to be processed by multiple machine learning algorithms at the same time, and enables feature selection by a single algorithm or by combinations of multiple algorithms. An easy-to-use tool for k-fold cross-validation and nested cross-validation.
License: GPL (>= 2)
Packaged: 2016-08-05 22:19:09 UTC; Guennewig
Built: R 3.2.5; ; 2016-08-05 00:49:44 UTC; unix