In thllwg/tpotr: An R-Wrapper for the Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming (TPOT)

A gentle introduction

The tpotr package (Tree-based Pipeline Optimization Tool in R) is conceived as a "Data Science Assistant" that automates the most tedious part of machine learning: finding the best pipeline for your data.

The tpotr package lets you use the well-known Python module TPOT from R. TPOT explores thousands of candidate machine learning pipelines using genetic programming. Once the search has finished, the fitted pipeline can be accessed in R and used for prediction.
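
To give a feel for what an evolutionary search does, here is a toy sketch in base R. It is not TPOT's algorithm: TPOT evolves tree-structured scikit-learn pipelines with genetic programming, whereas this sketch uses a much simpler evolutionary search (mutation plus truncation selection, no crossover) over feature subsets for a nearest-centroid classifier. All names here (`fitness`, `mutate`, `pop_size`) are illustrative and are not part of tpotr.

```r
data(iris)
set.seed(42)

X <- as.matrix(iris[, 1:4])
y <- iris$Species
idx <- sample(nrow(X), 100)  # rows used for fitting; the rest are held out

# Fitness of a candidate: hold-out accuracy of a nearest-centroid classifier
# restricted to the features selected by the logical mask.
fitness <- function(mask) {
  if (!any(mask)) return(0)  # an empty feature set cannot classify
  Xtr <- X[idx, mask, drop = FALSE]
  Xte <- X[-idx, mask, drop = FALSE]
  centroids <- sapply(levels(y), function(cl) {
    colMeans(Xtr[y[idx] == cl, , drop = FALSE])
  })
  centroids <- matrix(centroids, nrow = sum(mask))  # features x classes
  # Squared Euclidean distance of every held-out row to every class centroid
  dists <- apply(centroids, 2, function(ctr) rowSums(sweep(Xte, 2, ctr)^2))
  pred <- levels(y)[max.col(-dists, ties.method = "first")]
  mean(pred == y[-idx])
}

# Mutation: flip one randomly chosen feature in or out of the subset.
mutate <- function(mask) {
  flip <- sample(length(mask), 1)
  mask[flip] <- !mask[flip]
  mask
}

pop_size <- 6
population <- replicate(pop_size,
                        sample(c(TRUE, FALSE), 4, replace = TRUE),
                        simplify = FALSE)

for (generation in 1:5) {
  scores <- vapply(population, fitness, numeric(1))
  # Keep the better half unchanged and refill with mutated copies of it.
  survivors <- population[order(scores, decreasing = TRUE)][seq_len(pop_size / 2)]
  children <- lapply(survivors, mutate)
  population <- c(survivors, children)
}

scores <- vapply(population, fitness, numeric(1))
best <- population[[which.max(scores)]]
colnames(X)[best]  # features chosen by the best candidate
max(scores)        # its hold-out accuracy
```

Because the survivors are carried over unchanged, the best score in the population never decreases from one generation to the next. TPOT's `generations` and `population_size` arguments play a role analogous to the loop bound and `pop_size` in this sketch.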

The best way to illustrate how a machine learning pipeline is fitted in tpotr is by example. Assume the goal is to find a pipeline for the well-known iris dataset. We first split the dataset into one part for training and one for validation:

library(tpotr)

data(iris)
# 75% of the sample size
smp_size <- floor(0.75 * nrow(iris))

# set the seed to make your partition reproducible
set.seed(123)
train_ind <- sample(seq_len(nrow(iris)), size = smp_size)

train <- iris[train_ind, ]
test <- iris[-train_ind, ]
# encode Species (a factor) as its numeric level codes 1, 2, 3
train[5] <- as.numeric(train[[5]])
test[5] <- as.numeric(test[[5]])
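
Before fitting anything, it can help to sanity-check the split. The following sketch uses only base R and repeats the split above, with Species left as a factor so the class counts stay readable; `species_codes` is an illustrative helper name, not part of tpotr.

```r
data(iris)
set.seed(123)
smp_size <- floor(0.75 * nrow(iris))
train_ind <- sample(seq_len(nrow(iris)), size = smp_size)
train <- iris[train_ind, ]
test <- iris[-train_ind, ]

# The two parts should be disjoint and together cover every row
stopifnot(nrow(train) + nrow(test) == nrow(iris))
stopifnot(length(intersect(rownames(train), rownames(test))) == 0)

# Class counts per part; a simple random split is not stratified,
# so the counts need not be equal across species
table(train$Species)
table(test$Species)

# as.numeric() on a factor returns the integer level codes, in level order
species_codes <- setNames(seq_along(levels(iris$Species)), levels(iris$Species))
species_codes
```

With 150 rows and a 75% share, the training part has 112 rows and the validation part 38. The code mapping matters later, because predictions come back on the same numeric scale.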

The iris dataset holds 150 observations of 5 variables. We will use the first four as features to predict the fifth, the species. As this is a classification task, we use a TPOTRClassifier to fit a pipeline:

library(tpotr)
classifier <- TPOTRClassifier(verbosity = 2, generations = 5, population_size = 15, n_jobs = 3)
classifier <- fit(classifier, train[1:4], train[5])

You can now use the fitted pipeline for predictions. To assess the accuracy of the predictions, use the score() method.

predict(classifier, test[1:4])
score(classifier, test[1:4], test[5])
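
To make the accuracy figure concrete, the sketch below computes it by hand and maps numeric class codes back to species names. It assumes that predictions arrive as a numeric vector of the codes 1, 2, 3 and that score() reports plain classification accuracy; neither is confirmed here for tpotr. The `truth` and `pred` vectors are made-up illustrative values, not real output.

```r
data(iris)

# Hypothetical class codes (illustrative values only, not real tpotr output)
truth <- c(1, 1, 2, 2, 3, 3)
pred  <- c(1, 1, 2, 3, 3, 3)

# Accuracy is the share of predictions that match the true class
accuracy <- mean(pred == truth)

# Map the numeric codes back to species names via the factor levels
species_levels <- levels(iris$Species)
pred_labels  <- factor(species_levels[pred], levels = species_levels)
truth_labels <- factor(species_levels[truth], levels = species_levels)

# Confusion matrix: rows are true classes, columns are predicted classes
table(truth = truth_labels, predicted = pred_labels)
```

In this made-up example five of six predictions match, so `accuracy` is 5/6. The confusion matrix shows which classes are being mixed up, which a single accuracy number hides.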

Using MLR

The tpotr package also provides a learner for the mlr package. You can use automated machine learning pipelines with mlr as follows:

library("mlr")
# makeClassifTask() needs a factor target, so convert the numeric codes
train[5] <- as.factor(as.numeric(train[[5]]))
test[5] <- as.factor(as.numeric(test[[5]]))
task = makeClassifTask(data = train, target = "Species", id = "iris")
# "classif.tpot" is the learner provided by tpotr
learner = makeLearner(cl = "classif.tpot", population_size = 10, generations = 3, n_jobs = 3, verbosity = 2)
model = train(learner, task)
pred = predict(obj = model, newdata = test)
# accuracy on the held-out data
performance(pred, measures = list(acc))

thllwg/tpotr documentation built on July 5, 2019, 12:49 a.m.
