MLPipe: Machine Learning Pipeline

MLPipe package

Kota Mori

MLPipe provides an interface for creating machine learning pipelines in the style of Python's scikit-learn library.

Install from GitHub:

devtools::install_github('kota7/MLPipe')

Let's use the Sonar data from the mlbench package as an example. The Sonar data contain 60 numeric variables and one label (a factor variable). The label represents the type of sonar target: "R" for rock and "M" for metal. Our goal is to predict the label from the other variables. See ?mlbench::Sonar for more details about the data.

set.seed(123)
data(Sonar, package='mlbench')
print(dim(Sonar))

## [1] 208  61

head(Sonar[c(1:4, 57:61)])

##       V1     V2     V3     V4    V57    V58    V59    V60 Class
## 1 0.0200 0.0371 0.0428 0.0207 0.0180 0.0084 0.0090 0.0032     R
## 2 0.0453 0.0523 0.0843 0.0689 0.0140 0.0049 0.0052 0.0044     R
## 3 0.0262 0.0582 0.1099 0.1083 0.0316 0.0164 0.0095 0.0078     R
## 4 0.0100 0.0171 0.0623 0.0205 0.0050 0.0044 0.0040 0.0117     R
## 5 0.0762 0.0666 0.0481 0.0394 0.0072 0.0048 0.0107 0.0094     R
## 6 0.0286 0.0453 0.0277 0.0174 0.0057 0.0027 0.0051 0.0062     R

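We pick roughly 80% of the samples as the training data, and use the rest for the performance test. The indices are drawn in two blocks, rows 1 to 111 and rows 112 to 200, which gives 159 training rows and 49 test rows; rows 201 to 208 are never sampled and so always fall in the test set.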

X <- Sonar[, -ncol(Sonar)]
y <- Sonar[, ncol(Sonar)]
tr <- c(sample(1:111,111*0.8), sample(112:200,89*0.8))
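The split above selects rows by position. As an alternative sketch, the training indices can be drawn separately within each class so that both labels keep the same training fraction. The helper `stratified_train_index` and the toy label vector below are hypothetical and are not part of MLPipe or mlbench; the class counts are chosen only for illustration.

```r
# Hypothetical helper (not part of MLPipe): draw a fixed fraction of
# indices from each class of the label vector y.
stratified_train_index <- function(y, frac = 0.8) {
  idx_by_class <- split(seq_along(y), y)
  picked <- lapply(idx_by_class, function(idx) {
    # Index through sample.int to avoid the single-element pitfall of sample()
    idx[sample.int(length(idx), floor(length(idx) * frac))]
  })
  sort(unlist(picked, use.names = FALSE))
}

set.seed(123)
# Toy labels for illustration only: 97 "R" followed by 111 "M"
y_toy <- factor(rep(c("R", "M"), times = c(97, 111)))
tr_toy <- stratified_train_index(y_toy)

# Class counts in the training part and in the held-out part
print(table(y_toy[tr_toy]))
print(table(y_toy[-tr_toy]))
```

With a real data set, the same helper could be called on the label column, for example `stratified_train_index(y)`, and the result used in place of `tr`.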

Suppose that we would like to first reduce the dimensionality of the features, and then fit a classification model. For example, the first step can extract principal components, and the second step can be a neural network model. We can construct a pipeline for this modelling procedure with the code below:

library(MLPipe)
p <- pipeline(pc=pca_extractor(ncomp=30),
              ml=mlp_classifier(hidden_sizes=c(5, 5), num_epoch=1000))

The pipeline function takes an arbitrary number of pipe component objects. Each pipe component represents a modelling step, and the components are applied in the order in which they are given.

We can now fit the model to the training data with the fit method of the pipeline. The fit method takes exactly two inputs, x and y:

p$fit(X[tr,], y[tr])

With this single call of fit, each component of the pipeline is fitted. In this example, principal component analysis is first conducted on x, and the first 30 principal components (specified by ncomp) are extracted as the features. Then, the neural network model is estimated with these new features and y.

mlp_classifier has a predict method, which returns the predicted labels for new data. Calling predict on the pipeline lets us make a confusion matrix for the test data.
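To make the two fitting steps concrete, here is a rough base R sketch of the same idea written out by hand. It is not how MLPipe is implemented: it swaps in `stats::prcomp` for the extraction step and a logistic regression (`glm`) for the neural network, and it uses a two-class subset of the built-in iris data rather than Sonar. All variable names are assumptions made for illustration.

```r
# Toy two-class data from the built-in iris dataset (not the Sonar data)
d <- iris[iris$Species != "setosa", ]
x_toy <- as.matrix(d[, 1:4])
y_toy <- droplevels(d$Species)

set.seed(1)
tr_toy <- sample(nrow(x_toy), 80)

# Step 1: fit PCA on the training features only and keep 2 components
pc <- prcomp(x_toy[tr_toy, ], center = TRUE, scale. = TRUE)
z_tr <- predict(pc, x_toy[tr_toy, ])[, 1:2]

# Step 2: fit a classifier on the extracted components
# (logistic regression stands in for the neural network here)
clf <- glm(y ~ ., data = data.frame(z_tr, y = y_toy[tr_toy]),
           family = binomial)

# Prediction reuses the fitted PCA rotation, then calls the classifier
z_te <- predict(pc, x_toy[-tr_toy, ])[, 1:2]
prob <- predict(clf, newdata = data.frame(z_te), type = "response")
pred <- factor(ifelse(prob > 0.5, levels(y_toy)[2], levels(y_toy)[1]),
               levels = levels(y_toy))

print(table(y_toy[-tr_toy], pred))
```

The point of a pipeline object is that this bookkeeping, fitting each step on the training data and replaying the fitted transformations at prediction time, is handled by one fit call and one predict call.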

table(y[-tr], p$predict(X[-tr,]))

##    
##      M  R
##   M 23  4
##   R  2 20

We can also use mlp_classifier's accuracy method, called through the pipeline's evaluate method, to see the fraction of correct classifications.

p$evaluate('accuracy', X[-tr,], y[-tr])

## [1] 0.877551
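The accuracy value can be checked by hand against the confusion matrix shown earlier: it is the sum of the diagonal (correct predictions) divided by the total number of test samples. The short sketch below re-enters the four counts from that table manually; the object names are chosen for illustration.

```r
# Confusion matrix counts copied from the table above
# (rows: actual label, columns: predicted label)
cm <- matrix(c(23, 2, 4, 20), nrow = 2,
             dimnames = list(actual = c("M", "R"),
                             predicted = c("M", "R")))

# Accuracy = correct predictions / all predictions = (23 + 20) / 49
accuracy <- sum(diag(cm)) / sum(cm)
print(accuracy)
```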

kota7/MLPipe documentation built on May 5, 2019, 5:53 p.m.
