This function implements a genetic algorithm for variable selection in linear regression and GLMs. A genetic algorithm is essentially an optimization procedure. For feature selection, it uses the given fitness function (e.g. AIC) as the objective function and runs multiple rounds of an update process to approach the optimal solution. It first generates a population of candidate solutions, each representing one possible subset of features. The best candidates in this population are then selected according to the objective function, and from these parents a new population of the same size is randomly generated. After many iterations, the best solutions in the population approach the optimal solution, which is a binary string indicating which independent variables are selected.
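The loop described above can be sketched in a few lines of base R. This is only an illustrative sketch under stated assumptions, not the package's implementation: the names `fitness_aic` and `ga_select_sketch`, the population size, the uniform crossover, the mutation rate, and the use of the built-in `mtcars` data are all hypothetical choices made for the example.

```r
# Illustrative GA for variable selection; NOT the package's own code.
# All function names and tuning values below are assumptions for this sketch.
set.seed(1)

# Fitness: AIC of an lm() fit on the predictors flagged by a 0/1 chromosome.
fitness_aic <- function(chrom, data, target, predictors) {
  chosen <- predictors[chrom == 1]
  if (length(chosen) == 0) chosen <- "1"  # fall back to intercept-only model
  AIC(lm(reformulate(chosen, response = target), data = data))
}

ga_select_sketch <- function(data, target, pop_size = 20, n_iter = 30,
                             mutation_rate = 0.05) {
  predictors <- setdiff(names(data), target)
  n_pred <- length(predictors)
  # Initial population: each row is one random binary string.
  pop <- matrix(rbinom(pop_size * n_pred, 1, 0.5), nrow = pop_size)
  for (iter in seq_len(n_iter)) {
    scores <- apply(pop, 1, fitness_aic, data = data, target = target,
                    predictors = predictors)
    # Keep the better half (lower AIC) as parents.
    parents <- pop[order(scores)[seq_len(pop_size %/% 2)], , drop = FALSE]
    # Breed a new population of the same size from random parent pairs.
    children <- t(replicate(pop_size, {
      pair <- parents[sample(nrow(parents), 2), , drop = FALSE]
      child <- ifelse(runif(n_pred) < 0.5, pair[1, ], pair[2, ])  # crossover
      flip <- runif(n_pred) < mutation_rate                       # mutation
      child[flip] <- 1 - child[flip]
      child
    }))
    children[1, ] <- parents[1, ]  # elitism: carry over the current best
    pop <- children
  }
  scores <- apply(pop, 1, fitness_aic, data = data, target = target,
                  predictors = predictors)
  best <- pop[which.min(scores), ]
  predictors[best == 1]  # decode the binary string into column names
}

selected <- ga_select_sketch(mtcars, "mpg")
print(selected)
```

Lower AIC is treated as better, so parents are taken from the low end of the sorted scores. Copying the best parent into the next generation unchanged is one common way to keep the best score from getting worse between iterations.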
data |
A data frame with one response variable and an arbitrary number of independent (predictor) variables. Column order does not matter. |
target |
Column name of the response variable in |
fit_method |
Regression method, either |
metric |
Objective function, default is |
A character vector of the selected column names from the input data frame data.
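Because the return value is a plain character vector of column names, it can be turned into a model formula for a final fit. The snippet below is a sketch only: the `selected` vector is a hypothetical stand-in for a real result, and the built-in `mtcars` data is used purely for illustration.

```r
# Hypothetical stand-in for the character vector returned by the function.
selected <- c("wt", "cyl", "hp")
target <- "mpg"

# Build "mpg ~ wt + cyl + hp" from the selected names and refit the model.
final_formula <- reformulate(selected, response = target)
final_fit <- lm(final_formula, data = mtcars)

print(final_formula)
print(AIC(final_fit))
```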
# Example 1
# Setup:
# https://www.kaggle.com/c/house-prices-advanced-regression-techniques
# House Prices: Advanced Regression Techniques
# Predict sales prices
dt_house <- read.csv("../data/data_house.csv")
dt_house <- dt_house[, c("MSSubClass", "MSZoning", "LotArea", "LotShape", "Alley", "LandContour", "LotConfig", "LandSlope", "Neighborhood", "BldgType", "WoodDeckSF", "OpenPorchSF", "HouseStyle", "OverallQual", "OverallCond","SaleType", "SaleCondition", "LotFrontage", "MoSold", "SalePrice")]
dt_house[, "MSSubClass"] <- as.factor(dt_house[, "MSSubClass"])
dt_house[, "MoSold"] <- as.factor(dt_house[, "MoSold"])
dt_house[, "LotArea"] <- as.numeric(dt_house[, "LotArea"])
dt_house[, "LotShape"] <- as.factor(dt_house[, "LotShape"])
dt_house[, "Alley"] <- as.factor(dt_house[, "Alley"])
dt_house[, "LandContour"] <- as.factor(dt_house[, "LandContour"])
dt_house[, "LotConfig"] <- as.factor(dt_house[, "LotConfig"])
dt_house[, "LandSlope"] <- as.factor(dt_house[, "LandSlope"])
dt_house[, "Neighborhood"] <- as.factor(dt_house[, "Neighborhood"])
dt_house[, "BldgType"] <- as.factor(dt_house[, "BldgType"])
dt_house[, "WoodDeckSF"] <- as.numeric(dt_house[, "WoodDeckSF"])
dt_house[, "OpenPorchSF"] <- as.numeric(dt_house[, "OpenPorchSF"])
dt_house[, "HouseStyle"] <- as.factor(dt_house[, "HouseStyle"])
dt_house[, "OverallQual"] <- as.numeric(dt_house[, "OverallQual"])
dt_house[, "OverallCond"] <- as.numeric(dt_house[, "OverallCond"])
dt_house[, "SaleType"] <- as.factor(dt_house[, "SaleType"])
dt_house[, "SaleCondition"] <- as.factor(dt_house[, "SaleCondition"])
dt_house[, "LotFrontage"] <- as.numeric(dt_house[, "LotFrontage"])
dt_house[, "SalePrice"] <- as.numeric(dt_house[, "SalePrice"])
# Execution:
select(dt_house, 'SalePrice', fit_method = 'lm', metric = 'aic')
# Example 2
# Setup:
# https://www.kaggle.com/uciml/red-wine-quality-cortez-et-al-2009
# Red Wine Quality
dt_wine <- read.csv("../data/data_wine.csv")
dt_wine[, "quality"] <- as.numeric(dt_wine[, "quality"])
# Execution:
select(dt_wine, 'quality', fit_method = 'lm', metric = 'aic')
# Example 3
# Setup:
# https://www.kaggle.com/kumarajarshi/life-expectancy-who
# Life Expectancy (WHO)
# Statistical Analysis on factors influencing Life Expectancy
dt_life <- read.csv("./data/data_life.csv")
dt_life[, "Country"] <- as.factor(dt_life[, "Country"])
dt_life[, "Year"] <- as.numeric(dt_life[, "Year"])
dt_life[, "Status"] <- as.factor(dt_life[, "Status"])
dt_life[, "Life.expectancy"] <- as.numeric(dt_life[, "Life.expectancy"])
for(i in 5:dim(dt_life)[2]){ dt_life[, i] <- as.numeric(dt_life[, i]) }
# Execution:
select(dt_life, 'Life.expectancy', fit_method = 'lm', metric = 'aic')
# Example 4
# Setup:
# Bike sharing dataset
dt_bike <- read.csv("./data/data_bike.csv")
dt_bike[, 'dteday'] <- as.numeric(as.Date(dt_bike[, 'dteday']))
dt_bike[, 'yr'] <- as.factor(dt_bike[, 'yr'])
dt_bike[, 'mnth'] <- as.factor(dt_bike[, 'mnth'])
dt_bike[, 'holiday'] <- as.factor(dt_bike[, 'holiday'])
dt_bike[, 'workingday'] <- as.factor(dt_bike[, 'workingday'])
dt_bike[, 'weathersit'] <- as.factor(dt_bike[, 'weathersit'])
dt_bike$instant <- NULL
dt_bike$registered <- NULL
dt_bike$casual <- NULL
# Execution:
select(dt_bike, 'cnt', fit_method = 'lm', metric = 'aic')
# Example 5
# Setup:
# Basic data set loading and test with lm()
# Load the built-in data set BostonHousing from the mlbench package
library(mlbench)
data(BostonHousing)
# Execution:
select(BostonHousing, 'medv', fit_method = 'lm', metric = 'aic')
score_value <- rep(0,repetition)
score_value[k] <- temp[[2]]
plot(score_value)