In clarencew0083/recSystem: Meta Learning Recommendation System

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)

Meta Learning Recommendation System

Overview

This package extracts meta features from a dataset to recommend what machine learning algorithm will perform the best without running all the implemented machine learning algorithms. The current selection of algorithms is limited to support vector machines, naiive bayes classifier and (k) nearest neighbors. The metric recall isused to give the recommended classifier. The meta learner utilizes support vector regression with a radial basis function kernel to predict the recommended algorithm.

Additionally, this package cleans the data in the following manner:
drop columns that have the exact same input for each row
drop rows that have NA's
drop object columns that have all unique values
one hot encoding for categorical variables
normalize continuous columns
label encode response

Installation

In order to install this package, Python 3.7 must be install. Additionallay, numpy 1.17.4, pandas 0.25.1, and sci-kit learn 0.21.3 are required python packages.

You can install the the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("clarencew0083/recSystem", INSTALL_opts=c("--no-multiarch"))

Once the package is installed, load and attach it using:

library(recSystem)
## basic example code

Example

You can launch the shiny app using

recSystem::run_my_app("recSystemApp")

When lauching the app the following screen will appear:

Screenshot Example

Click the browse button to upload a csv file.
Once the file is uploaded it will be displayed on the screen.
Select the target column using the drop-down
Click Go! to get the recommended classification algorithm for your chosen dataset

Screenshot Example

The recommended classfication algorithm is displayed in bold
Click the download button to download the cleaned dataset if desired

Console Example

You can launch the recommend function from the console using

recSystem::recommend()

In this case, choose a csv file using the file explorer and type in the name of the target column.

Alternatively, use the function recommend2 to use the presupplied datasets

out<- recommend2(math_placement, "CourseSuccess")
View(out[[1]])
print(out[[2]])

View documentaion of the recommend and recommend2 functions for examples.

Background Information

The recSystem package is an implementation of @Williams Master's thesis.

If you are curious as to what exactly is going on under the hood, the next picture from @Woods may shed some light.

Screenshot Example

The Meta Learning Recommendation framework was first proposed by @Cui et al and is a modification of the model from @Rice. The framewoek is further refined in the AFIT Masters’ Thesis of Megan Woods @Woods. It extracts the meta features (f) of members of the problem space (P).

The meta learning recommendation system for classification problems consists of two phases. The steps of phase one are as follows:

Candidate Problem Space

The Candidate Problem space (C) is all problems suitable for classification. Since this set is large, it is subsetted to form the problem space (P) which contains the problems under study.
Algorithm Prediction Space

The machine learning algorithms K Nearest Neighbors, Support Vector Machines and Naive Bayes Classifier form the algorithm space (A). The three algorithms are subsequently applied to each member of the problem space with recall being the performance metric captured for each dataset.

The recall dataset included with this package contains the algorithm performance described within this step.
Actual Performance

Each algorithm has its performance ranked when applied to each datasets in the problem space. This ranking is repeated for each performance metric to give a separate ranking for each metric.

When recall is the performance metric, the best algorithm is the one with the largest value. In this case, the best algorithm (r) is given by

[r = argmax_{a∈A}(z(a(x))) ].

In phase 2 of the meta learning recommendation system, the following steps occur:

New Candidate Problem

A new problem (c') is selected. This is the dataset chosen by the user when calling the various recommend functions of this package.
Meta Feature Extraction

Each of the members of the problem space have information extracted to provide information about it’s structure. The following meta features are extracted:
- Number of Rows
- Number of Columns
- Rows to Columns Ratio
- Number of Discrete Columns
- Maximum number of factors among discrete columns
- Minimum number of factors among discrete columns
- Average number of factors among discrete columns
- Number of continuous columns
- Gradient average
- Gradient minimum
- Gradient maximum
- Gradient standard deviation
Meta Learning

A new dataset is formed where each row is the collection of the 12 meta features extracted each dataset in the prblem space. These twelve features together form the feature space (F).
Recommendation System Construction

Support Vector Regression is the meta learner that trains the recommendation system using the meta features as inputs with a metric of the performance space for each algorithm as output.
Performance Prediction and Recommendation

The recommendation system predicts the performance of the machine learning algorithms K Nearest Neighbors, Support Vector Machines and Naive Bayes Classifier for each member of the problem space. Each algorithm, (a \in A) , has its performance ranked when applied to each of thedatasets in the problem space. This ranking is repeated for each performance metric to give separate rankings. Similar to phase one, when using recall as the performance metric, the best algorithm is the one with the largest value, that is the recommendation (r’) is

[r' = argmax(v(\widehat{f (x')}))].

The algorithm returned by the recommend function in this package is (r')

References

Cui, Can, Mengqi Hu, Jeffery D. Weir, and Teresa Wu. 2016. “A Recommendation System for Meta-Modeling: A Meta-Learning based Approach.” Expert Systems with Applications 46: 33–44.

Rice, John. 1975. “The Algorithm Selection Problem.” Purdue University, Department of Computer Science Technical Reports, Paper 99.

Williams, Clarence. 2020. “Meta Learning Recommendation System for Classification.” Master’s thesis, Wright-Patterson AFB, OH: School of Engineering; Management, Air Force Institute of Technology (AU).

Woods, Megan. 2020. “A Metamodel Recommendation System using Meta-Learning.” Master’s thesis, Wright-Patterson AFB, OH: School of Engineering; Management, Air Force Institute of Technology (AU).

clarencew0083/recSystem documentation built on March 19, 2020, 11:52 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com