knitr::opts_chunk$set( collapse = FALSE, comment = "#>" )
Vowpal Wabbit is an online machine learning system that is known for its speed and scalability and is widely used in research and industry.
This package aims to bring its functionality to R.
First you have to get Vowpal Wabbit from here.
And then install the rvw package using devtools
:
library(devtools) install_github("rvw-org/rvw")
rvw package gives you an access to various learning algorithms from Vowpal Wabbit. In this tutorial, you will see how you can use rvw for a multiclass classification problem.
Here we will try to predict the age group of abalone (based on number of abalone shell rings) from physical measurements. We will use Abalone Data Set from UCI Machine Learning Repository.
library("rvw") library("mltools") # We need "mltools" for data preparation library("magrittr") # We use "magrittr" for its "pipe" operator set.seed(1) data_url = 'http://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data' data_names = c('sex','length','diameter','height','weight.w','weight.s','weight.v','weight.sh','rings') data_full = read.table(data_url, header = F , sep = ',', col.names = data_names) # Split number of rings into groups with equal (as possible) number of observations classes = 3 # Split into 3 groups data_full$group <- bin_data(data_full$rings, bins=classes, binType = "quantile") group_lvls <- levels(data_full$group) levels(data_full$group) <- seq_len(classes) # Prepare indices to split data ind_train <- sample(1:nrow(data_full), 0.8*nrow(data_full)) # Split data into train and test subsets df_train <- data_full[ind_train,] df_test <- data_full[-ind_train,]
In order to use Vowpal Wabbit we have to convert our data from data.frame
format to .vw
plain text format.
Each example should be formatted as follows:
[Label] [Importance] [Base] [Tag]|Namespace Features |Namespace Features ... |Namespace Features
One row from df_train
:
df_train[1,]
#> sex length diameter height weight.w weight.s weight.v weight.sh rings group #> 1110 M 0.52 0.4 0.145 0.7765 0.3525 0.1845 0.185 9 2
will be converted to:
#> 2 |a sex^M |b diameter:0.4 length:0.52 height:0.145 |c weight.w:0.7765 weight.s:0.3525 weight.v:0.1845 weight.sh:0.185
Such conversion can be done using df2vw()
function:
train_file_path <- file.path(tempdir(), "df_train.vw") test_file_path <- file.path(tempdir(), "df_test.vw") # For df_train df2vw(data = df_train, file_path = train_file_path, namespaces = list(a = list("sex") , b = list("diameter", "length", "height"), c = list("weight.w","weight.s","weight.v","weight.sh")), targets = "group") # And for df_test df2vw(data = df_test, file_path = test_file_path, namespaces = list(a = list("sex") , b = list("diameter", "length", "height"), c = list("weight.w","weight.s","weight.v","weight.sh")), targets = "group")
Arguments we use here:
data.frame
format.vw
file formatFirst we set up our Vowpal Wabbit model:
vwmodel <- vwsetup(general_params = list(random_seed=1), feature_params = list(quadratic="bc"), optimization_params = list(learning_rate=0.05, l1=1E-7), option = "ect", num_classes = 3)
Arguments we use here:
vwmodel
. Here we specify random_seed=1 - seed for random number generatorIt is also possible to specify parameters directly, like in CL version of Vowpal Wabbit:
vwsetup( params_str = "--random_seed 1 --quadratic bc --learning_rate 0.05 --l1 1E-7 --ect 3" )
This mode is generally not recomended, as it doesn't have parameters checking. Also, parameters from other vw functions can't be specified via "params_str".
Now we are ready to start training our model:
vwtrain(vwmodel = vwmodel, data = train_file_path, passes = 100)
#> Starting VW training session #> VW v8.6.1 #> Using data file: /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/df_train.vw #> Using model file: /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/vw_1534797886_mdl.vw #> Command line parameters: #> --random_seed 1 --quadratic bc --learning_rate 0.05 --l1 1e-07 --ect 3 -d /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/df_train.vw -f /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/vw_1534797886_mdl.vw --passes 100 --kill_cache --cache_file /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/df_train.vw.cache #> creating quadratic features for pairs: bc #> using l1 regularization = 1e-07 #> final_regressor = /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/vw_1534797886_mdl.vw #> Num weight bits = 18 #> learning rate = 0.05 #> initial_t = 0 #> power_t = 0.5 #> decay_learning_rate = 1 #> creating cache_file = /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/df_train.vw.cache #> Reading datafile = /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/df_train.vw #> num sources = 1 #> average since example example current current current #> loss last counter weight label predict features #> 1.000000 1.000000 1 1.0 2 1 21 #> 1.000000 1.000000 2 2.0 1 2 21 #> 0.500000 0.000000 4 4.0 2 2 21 #> 0.500000 0.500000 8 8.0 2 2 21 #> 0.562500 0.625000 16 16.0 3 2 21 #> 0.531250 0.500000 32 32.0 3 2 21 #> 0.546875 0.562500 64 64.0 2 2 21 #> 0.500000 0.453125 128 128.0 1 2 21 #> 0.484375 0.468750 256 256.0 2 2 21 #> 0.476562 0.468750 512 512.0 2 2 21 #> 0.451172 0.425781 1024 1024.0 2 2 21 #> 0.417969 0.384766 2048 2048.0 1 2 21 #> 0.411894 0.411894 4096 4096.0 1 1 21 h #> 0.405941 0.400000 8192 8192.0 2 2 21 h #> #> finished run #> number of examples per pass = 3007 #> passes used = 5 #> weighted example sum = 15035.000000 #> weighted label sum = 0.000000 #> average loss = 0.386228 h #> total feature number = 315685
And finally compute predictions using trained model:
vw_pred <- predict(object = vwmodel, data = test_file_path)
#> Starting VW testing session #> VW v8.6.1 #> Using data file: /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/df_test.vw #> Using model file: /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/vw_1534797886_mdl.vw #> Command line parameters: #> -t -d /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/df_test.vw -i /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/vw_1534797886_mdl.vw --passes 1 -p /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/temp_probs_out.vw #> creating quadratic features for pairs: bc #> only testing #> predictions = /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/temp_probs_out.vw #> Num weight bits = 18 #> learning rate = 0.5 #> initial_t = 0 #> power_t = 0.5 #> using no cache #> Reading datafile = /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/df_test.vw #> num sources = 1 #> average since example example current current current #> loss last counter weight label predict features #> 1.000000 1.000000 1 1.0 2 1 21 #> 1.000000 1.000000 2 2.0 3 2 21 #> 1.000000 1.000000 4 4.0 3 2 21 #> 0.500000 0.000000 8 8.0 2 2 21 #> 0.562500 0.625000 16 16.0 2 2 21 #> 0.343750 0.125000 32 32.0 2 2 21 #> 0.437500 0.531250 64 64.0 1 1 21 #> 0.453125 0.468750 128 128.0 3 2 21 #> 0.378906 0.304688 256 256.0 1 2 21 #> 0.337891 0.296875 512 512.0 3 3 21 #> #> finished run #> number of examples = 836 #> weighted example sum = 836.000000 #> weighted label sum = 0.000000 #> average loss = 0.363636 #> total feature number = 17556
Arguments we use here:
character
(will be interpreted as a file path) or of type data.frame
(then df2vw()
will be used to convert data first)If we want to review parameters of our model and evaluation results we can simply print vwmodel
:
vwmodel
#> Vowpal Wabbit model #> Learning algorithm: sgd #> Working directory: /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//Rtmp6bFZCA #> Model file: /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//Rtmp6bFZCA/vw_1534797504_mdl.vw #> General parameters: #> random_seed : 1 #> ring_size : Not defined #> holdout_off : FALSE #> holdout_period : 10 #> holdout_after : 0 #> early_terminate : 3 #> loss_function : squared #> link : identity #> quantile_tau : 0.5 #> Feature parameters: #> bit_precision : 18 #> quadratic : bc #> cubic : Not defined #> interactions : Not defined #> permutations : FALSE #> leave_duplicate_interactions : FALSE #> noconstant : FALSE #> feature_limit : Not defined #> ngram : Not defined #> skips : Not defined #> hash : Not defined #> affix : Not defined #> spelling : Not defined #> interact : Not defined #> Learning algorithms / Reductions: #> ect : #> num_classes : 3 #> Optimization parameters: #> adaptive : TRUE #> normalized : TRUE #> invariant : TRUE #> adax : FALSE #> sparse_l2 : 0 #> l1_state : 0 #> l2_state : 1 #> learning_rate : 0.05 #> initial_pass_length : Not defined #> l1 : 1e-07 #> l2 : 0 #> no_bias_regularization : Not defined #> feature_mask : Not defined #> decay_learning_rate : 1 #> initial_t : 0 #> power_t : 0.5 #> initial_weight : 0 #> random_weights : off #> normal_weights : off #> truncated_normal_weights : off #> sparse_weights : FALSE #> input_feature_regularizer : Not defined #> Model evaluation. Training: #> num_examples : 15035 #> weighted_example_sum : 15035 #> weighted_label_sum : 0 #> avg_loss : 0.3862275 #> total_feature : 315685 #> Model evaluation. Testing: #> num_examples : 836 #> weighted_example_sum : 836 #> weighted_label_sum : 0 #> avg_loss : 0.3636364 #> total_feature : 17556
For more in-depth analysis of our model we can inspect weights of the final regressor:
vwaudit(vwmodel = vwmodel)
#> Names Hashes Model.values #> 1 a^sex^M 58028 0.0363983 #> 2 b^diameter 57998 -0.0403499 #> 3 b^length 145012 -0.0496383 #> 4 b^height 107988 -0.0787311 #> 5 c^weight.w 90732 0.0901030 #> 6 c^weight.s 212300 0.1253380 #> 7 c^weight.v 65196 0.3800930 #> 8 c^weight.sh 76534 0.4210850 #> 9 Constant 232120 -0.1986420 #> 10 b^diameter*c^weight.w 116710 0.2348970 #> 11 b^diameter*c^weight.s 235718 0.4218860 #> 12 b^diameter*c^weight.v 23334 1.0003500 #> 13 b^diameter*c^weight.sh 102268 1.0187700 #> 14 b^length*c^weight.w 187120 0.1662080 #> 15 b^length*c^weight.s 34256 0.2956240 #> 16 b^length*c^weight.v 214576 0.7064920 #> 17 b^length*c^weight.sh 168554 0.7264760 #> 18 b^height*c^weight.w 93904 0.6733480 #> 19 b^height*c^weight.s 209392 1.1952400 #> 20 b^height*c^weight.v 61968 2.8984900 #> 21 b^height*c^weight.sh 75338 2.9625199 #> 22 a^sex^I 64526 -0.3009660 #> 23 a^sex^F 244246 0.0278317
Also it is possible to access and modify parameters of our model
# Show the learning rate of our model vwparams(vwmodel = vwmodel, name = "learning_rate")
#> [1] 0.05
# Change it to 0.1 vwparams(vwmodel = vwmodel, name = "learning_rate") <- 0.1 # And show again vwparams(vwmodel = vwmodel, name = "learning_rate")
#> [1] 0.1
We can add more learning algorithms to our model. For example we want to use boosting algorithm with 100 "weak" learners. Then we will just add this option to our model and train again:
vwmodel <- add_option(vwmodel, option = "boosting", num_learners=100) # We add *quiet = T* to hide output diagnostics. vwtrain(vwmodel = vwmodel, data = train_file_path, passes = 100, quiet = T) vw_pred <- predict(object = vwmodel, data = test_file_path)
#> Starting VW testing session #> VW v8.6.1 #> Using data file: /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/df_test.vw #> Using model file: /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/vw_1534797886_mdl.vw #> Command line parameters: #> -t -d /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/df_test.vw -i /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/vw_1534797886_mdl.vw --passes 1 -p /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/temp_probs_out.vw #> creating quadratic features for pairs: bc #> only testing #> predictions = /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/temp_probs_out.vw #> Number of weak learners = 100 #> Gamma = 0.100000 #> Num weight bits = 18 #> learning rate = 0.5 #> initial_t = 0 #> power_t = 0.5 #> using no cache #> Reading datafile = /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/df_test.vw #> num sources = 1 #> average since example example current current current #> loss last counter weight label predict features #> 1.000000 1.000000 1 1.0 2 1 21 #> 1.000000 1.000000 2 2.0 3 2 21 #> 1.000000 1.000000 4 4.0 3 2 21 #> 0.500000 0.000000 8 8.0 2 2 21 #> 0.500000 0.500000 16 16.0 2 2 21 #> 0.343750 0.187500 32 32.0 2 2 21 #> 0.406250 0.468750 64 64.0 1 1 21 #> 0.429688 0.453125 128 128.0 3 2 21 #> 0.375000 0.320312 256 256.0 1 2 21 #> 0.341797 0.308594 512 512.0 3 3 21 #> #> finished run #> number of examples = 836 #> weighted example sum = 836.000000 #> weighted label sum = 0.000000 #> average loss = 0.348086 #> total feature number = 17556
Vowpal Wabbit is famous for its the highly optimized Contextual Bandit algorithms and we can use them in rvw as well:
cb_model <- vwsetup(general_params = list(random_seed=1), feature_params = list(quadratic="bc"), option = "cbify", num_classes = 3) %>% add_option(option = "cb_explore", num_actions=3, explore_type="bag", explore_arg=7) vwtrain(vwmodel = cb_model, data = train_file_path, passes = 100, quiet = T) vw_pred <- predict(object = cb_model, data = test_file_path)
#> Starting VW testing session #> VW v8.6.1 #> Using data file: /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/df_test.vw #> Using model file: /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/vw_1534798468_mdl.vw #> Command line parameters: #> -t -d /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/df_test.vw -i /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/vw_1534798468_mdl.vw --passes 1 -p /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/temp_probs_out.vw #> creating quadratic features for pairs: bc #> only testing #> predictions = /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/temp_probs_out.vw #> Num weight bits = 18 #> learning rate = 0.5 #> initial_t = 0 #> power_t = 0.5 #> using no cache #> Reading datafile = /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/df_test.vw #> num sources = 1 #> average since example example current current current #> loss last counter weight label predict features #> 1.000000 1.000000 1 1.0 2 1 21 #> 0.500000 0.000000 2 2.0 3 3 21 #> 0.250000 0.000000 4 4.0 3 3 21 #> 0.500000 0.750000 8 8.0 2 3 21 #> 0.312500 0.125000 16 16.0 2 3 21 #> 0.500000 0.687500 32 32.0 2 3 21 #> 0.406250 0.312500 64 64.0 1 1 21 #> 0.335938 0.265625 128 128.0 3 3 21 #> 0.378906 0.421875 256 256.0 1 1 21 #> 0.453125 0.527344 512 512.0 3 3 21 #> #> finished run #> number of examples = 836 #> weighted example sum = 836.000000 #> weighted label sum = 0.000000 #> average loss = 0.470096 #> total feature number = 17556
New arguments we use here:
We can see that, unsurprisingly, Contextual Bandit model performs worser than multiclass classification model under full information in terms of the average testing error:
r vwmodel$eval$test$avg_loss
for multiclass classification model
r cb_model$eval$test$avg_loss
for Contextual Bandit model
Development of rvw package started as R Vowpal Wabbit (Google Summer of Code 2018) project. with Dirk Eddelbuettel and James J Balamuta mentoring this project and the R Project for Statistical Computing as the mentor organization.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.