knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "man/figures/README-", out.width = "100%" )
This package extracts meta features from a dataset to recommend what machine learning algorithm will perform the best without running all the implemented machine learning algorithms. The current selection of algorithms is limited to support vector machines, naiive bayes classifier and (k) nearest neighbors. The metric recall isused to give the recommended classifier. The meta learner utilizes support vector regression with a radial basis function kernel to predict the recommended algorithm.
In order to install this package, Python 3.7 must be install. Additionallay, numpy 1.17.4, pandas 0.25.1, and sci-kit learn 0.21.3 are required python packages.
You can install the the development version from GitHub with:
# install.packages("devtools") devtools::install_github("clarencew0083/recSystem", INSTALL_opts=c("--no-multiarch"))
Once the package is installed, load and attach it using:
library(recSystem) ## basic example code
You can launch the shiny app using
recSystem::run_my_app("recSystemApp")
When lauching the app the following screen will appear:
You can launch the recommend function from the console using
recSystem::recommend()
In this case, choose a csv file using the file explorer and type in the name of the target column.
Alternatively, use the function recommend2 to use the presupplied datasets
out<- recommend2(math_placement, "CourseSuccess") View(out[[1]]) print(out[[2]])
View documentaion of the recommend and recommend2 functions for examples.
The recSystem package is an implementation of @Williams Master's thesis.
If you are curious as to what exactly is going on under the hood, the next picture from @Woods may shed some light.
The Meta Learning Recommendation framework was first proposed by @Cui et al and is a modification of the model from @Rice. The framewoek is further refined in the AFIT Masters’ Thesis of Megan Woods @Woods. It extracts the meta features (f) of members of the problem space (P).
The meta learning recommendation system for classification problems consists of two phases. The steps of phase one are as follows:
Candidate Problem Space
The Candidate Problem space (C) is all problems suitable for classification. Since this set is large, it is subsetted to form the problem space (P) which contains the problems under study.
Algorithm Prediction Space
The machine learning algorithms K Nearest Neighbors, Support Vector Machines and Naive Bayes Classifier form the algorithm space (A). The three algorithms are subsequently applied to each member of the problem space with recall being the performance metric captured for each dataset.
The recall dataset included with this package contains the algorithm performance described within this step.
Actual Performance
Each algorithm has its performance ranked when applied to each datasets in the problem space. This ranking is repeated for each performance metric to give a separate ranking for each metric.
When recall is the performance metric, the best algorithm is the one with the largest value. In this case, the best algorithm (r) is given by
[r = argmax_{a∈A}(z(a(x))) ].
In phase 2 of the meta learning recommendation system, the following steps occur:
New Candidate Problem
A new problem (c') is selected. This is the dataset chosen by the user when calling the various recommend functions of this package.
Meta Feature Extraction
Each of the members of the problem space have information extracted to provide information about it’s structure. The following meta features are extracted:
Number of Rows
Number of Columns
Rows to Columns Ratio
Number of Discrete Columns
Maximum number of factors among discrete columns
Minimum number of factors among discrete columns
Average number of factors among discrete columns
Number of continuous columns
Gradient average
Gradient minimum
Gradient maximum
Gradient standard deviation
Meta Learning
A new dataset is formed where each row is the collection of the 12 meta features extracted each dataset in the prblem space. These twelve features together form the feature space (F).
Recommendation System Construction
Support Vector Regression is the meta learner that trains the recommendation system using the meta features as inputs with a metric of the performance space for each algorithm as output.
Performance Prediction and Recommendation
The recommendation system predicts the performance of the machine learning algorithms K Nearest Neighbors, Support Vector Machines and Naive Bayes Classifier for each member of the problem space. Each algorithm, (a \in A) , has its performance ranked when applied to each of thedatasets in the problem space. This ranking is repeated for each performance metric to give separate rankings. Similar to phase one, when using recall as the performance metric, the best algorithm is the one with the largest value, that is the recommendation (r’) is
[r' = argmax(v(\widehat{f (x')}))].
The algorithm returned by the recommend function in this package is (r')
Cui, Can, Mengqi Hu, Jeffery D. Weir, and Teresa Wu. 2016. “A Recommendation System for Meta-Modeling: A Meta-Learning based Approach.” Expert Systems with Applications 46: 33–44.
Rice, John. 1975. “The Algorithm Selection Problem.” Purdue University, Department of Computer Science Technical Reports, Paper 99.
Williams, Clarence. 2020. “Meta Learning Recommendation System for Classification.” Master’s thesis, Wright-Patterson AFB, OH: School of Engineering; Management, Air Force Institute of Technology (AU).
Woods, Megan. 2020. “A Metamodel Recommendation System using Meta-Learning.” Master’s thesis, Wright-Patterson AFB, OH: School of Engineering; Management, Air Force Institute of Technology (AU).
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.