This project aims to build an R package that elegantly performs data pre-processing in a fast and easy manner. With four separate functions that will come along with the rb4model package, users will have greater flexibility in handling many different types of datasets in the wild or those collected by them. With the rb4model package, users will be able to smoothly pre-process their data and have it ready for the machine learning model of their choice.
It is recommended that rb4model
be installed via devtools
.
devtools::install_github("UBC-MDS/rb4model")
missing_val
feature_splitter
fit_and_report
ForwardSelection
This is a basic example which shows you how to solve a common machine learning problem.
library(rb4model)
Let’s work with the iris dataset.
iris_copy <- iris
iris_copy[1,1]<-NA
head(iris_copy)
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 NA 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
Use the missing_val
function to fill in any missing values in your
dataframe.
head(missing_val(df=iris_copy, method='mean'))
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.848322 3.5 1.4 0.2 setosa
## 2 4.900000 3.0 1.4 0.2 setosa
## 3 4.700000 3.2 1.3 0.2 setosa
## 4 4.600000 3.1 1.5 0.2 setosa
## 5 5.000000 3.6 1.4 0.2 setosa
## 6 5.400000 3.9 1.7 0.4 setosa
Use the feature_splitter
function to split your data into categorical
and numerical features.
feature_splitter(iris)
## [[1]]
## [1] "Species"
##
## [[2]]
## [1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
Use the fit_and_report
function to fit a machine learning model of
choice and report its training and validation score.
x1<- iris[1:2][1:100,]
x2<-iris[1:2][100:150,]
y1<- iris$Petal.Length[1:100]
y2<-iris$Petal.Length[100:150]
fit_and_report(x1,y1,x2,y2,'glm','regression')
## Loading required package: lattice
## Loading required package: ggplot2
## RMSE
## 0.44201045 0.08478573
Use the ForwardSelection
function to perform forward feature selection
on your data. Then subset your original dataframe with the selected
features.
y <- iris$Species
x <- iris[c(1,2,3,4)]
ffs <- ForwardSelection(feature=x, label=y, my_mod="rf")
## note: only 1 unique complexity parameters in default grid. Truncating the grid to 1 .
##
## note: only 1 unique complexity parameters in default grid. Truncating the grid to 1 .
##
## note: only 1 unique complexity parameters in default grid. Truncating the grid to 1 .
head(x[ffs])
## Sepal.Width
## 1 3.5
## 2 3.0
## 3 3.2
## 4 3.1
## 5 3.6
## 6 3.9
| Package | Version | | ------------------------------------------------------------------------------------ | ------- | | mice | 3.7.0 | | stats | 4.0.0 | | testthat | 2.1.0 | | caret | 6.0-85 | | datasets | 4.0.0 | | mlbench | 2.1-1 | | randomForest | 4.6-14 | | e1071 | 1.7-3 | | plyr | 1.8.6 |
caret
is probably the most widely used package for supervised learning
problems in R. Although the library provides various model fitting and
preprocessing features, programmers end up with writing the same line of
code over and over again. Our rb4model
library provides a simple
solution to this pain point: wrapper functions of caret
and other
primary libraries used for supervised learning to reduce lines of code
and promote efficiency.
The vignette is hosted here.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.