knitr::opts_chunk$set(
  collapse = FALSE,
  comment = "#>"
)

Introduction

Vowpal Wabbit is an online machine learning system that is known for its speed and scalability and is widely used in research and industry.

This package aims to bring its functionality to R.

Installation

First you have to install Vowpal Wabbit itself (see the Vowpal Wabbit project for build and installation instructions).

And then install the rvw package using devtools:

library(devtools)
install_github("rvw-org/rvw")

Using rvw

The rvw package gives you access to various learning algorithms from Vowpal Wabbit. In this tutorial, you will see how to use rvw for a multiclass classification problem.

Data preparation

Here we will try to predict the age group of abalone (based on the number of shell rings) from physical measurements. We will use the Abalone Data Set from the UCI Machine Learning Repository.

library("rvw")
library("mltools")  # We need "mltools" for data preparation
library("magrittr") # We use "magrittr" for its "pipe" operator

set.seed(1)
data_url <- "http://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data"
data_names <- c("sex", "length", "diameter", "height", "weight.w", "weight.s", "weight.v", "weight.sh", "rings")
data_full <- read.table(data_url, header = FALSE, sep = ",", col.names = data_names)

# Split number of rings into groups with equal (as possible) number of observations
classes = 3 # Split into 3 groups
data_full$group <- bin_data(data_full$rings, bins=classes, binType = "quantile")
group_lvls <- levels(data_full$group)
levels(data_full$group) <- seq_len(classes)

# Prepare indices to split data
ind_train <- sample(seq_len(nrow(data_full)), floor(0.8 * nrow(data_full)))
# Split data into train and test subsets
df_train <- data_full[ind_train,]
df_test <- data_full[-ind_train,]
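
To check that the quantile split produced roughly balanced classes, and to keep the mapping from the new class labels back to the original ring ranges, we can inspect the new column (output omitted here):

# Class sizes after the quantile split
table(data_full$group)
# Original ring ranges corresponding to classes 1, 2, 3
data.frame(class = seq_len(classes), rings = group_lvls)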

Vowpal Wabbit input format

In order to use Vowpal Wabbit, we have to convert our data from a data.frame into Vowpal Wabbit's plain text input format.

Each example should be formatted as follows:

[Label] [Importance] [Base] [Tag]|Namespace Features |Namespace Features ... |Namespace Features

One row from df_train:

df_train[1,]
#>      sex length diameter height weight.w weight.s weight.v weight.sh rings group
#> 1110   M   0.52      0.4  0.145   0.7765   0.3525   0.1845     0.185     9     2     

will be converted to:

#> 2 |a sex^M |b diameter:0.4 length:0.52 height:0.145 |c weight.w:0.7765 weight.s:0.3525 weight.v:0.1845 weight.sh:0.185

Such a conversion can be done with the df2vw() function:

train_file_path <- file.path(tempdir(), "df_train.vw")
test_file_path <- file.path(tempdir(), "df_test.vw")
# For df_train
df2vw(data = df_train, file_path = train_file_path,
      namespaces = list(a = list("sex") ,
                        b = list("diameter", "length", "height"),
                        c = list("weight.w","weight.s","weight.v","weight.sh")),
      targets = "group")
# And for df_test
df2vw(data = df_test, file_path = test_file_path,
      namespaces = list(a = list("sex") ,
                        b = list("diameter", "length", "height"),
                        c = list("weight.w","weight.s","weight.v","weight.sh")),
      targets = "group")

Arguments we use here:

data: the data.frame to convert.
file_path: the path of the output file in Vowpal Wabbit format.
namespaces: a named list that groups features into Vowpal Wabbit namespaces.
targets: the name of the column containing the label.
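
To see what the conversion produced, we can peek at the first few examples in the generated file (output not shown here):

# First three examples in Vowpal Wabbit format
readLines(train_file_path, n = 3)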

Basic usage

First we set up our Vowpal Wabbit model:

vwmodel <- vwsetup(general_params = list(random_seed=1),
                   feature_params = list(quadratic="bc"),
                   optimization_params = list(learning_rate=0.05, l1=1E-7),
                   option = "ect", num_classes = 3)

Arguments we use here:

general_params, feature_params, optimization_params: lists of general, feature and optimization parameters passed to Vowpal Wabbit (here the random seed, quadratic interactions between namespaces b and c, the learning rate and L1 regularization).
option: the learning algorithm / reduction to add ("ect" is the error correcting tournament reduction used for multiclass classification).
num_classes: the number of classes for this reduction.


It is also possible to specify parameters directly, as in the command-line version of Vowpal Wabbit:

vwsetup(
    params_str = "--random_seed 1 --quadratic bc --learning_rate 0.05 --l1 1E-7 --ect 3"
)

This mode is generally not recommended, as it performs no parameter checking. Also, parameters from other vw functions can't be specified via "params_str".


Now we are ready to start training our model:

vwtrain(vwmodel = vwmodel, data = train_file_path, passes = 100)
#> Starting VW training session
#> VW v8.6.1
#> Using data file: /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/df_train.vw
#> Using model file: /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/vw_1534797886_mdl.vw
#> Command line parameters:
#> --random_seed 1 --quadratic bc --learning_rate 0.05 --l1 1e-07 --ect 3 -d /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/df_train.vw -f /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/vw_1534797886_mdl.vw --passes 100 --kill_cache --cache_file /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/df_train.vw.cache
#> creating quadratic features for pairs: bc
#> using l1 regularization = 1e-07
#> final_regressor = /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/vw_1534797886_mdl.vw
#> Num weight bits = 18
#> learning rate = 0.05
#> initial_t = 0
#> power_t = 0.5
#> decay_learning_rate = 1
#> creating cache_file = /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/df_train.vw.cache
#> Reading datafile = /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/df_train.vw
#> num sources = 1
#> average  since         example        example  current  current  current
#> loss     last          counter         weight    label  predict features
#> 1.000000 1.000000            1            1.0        2        1       21
#> 1.000000 1.000000            2            2.0        1        2       21
#> 0.500000 0.000000            4            4.0        2        2       21
#> 0.500000 0.500000            8            8.0        2        2       21
#> 0.562500 0.625000           16           16.0        3        2       21
#> 0.531250 0.500000           32           32.0        3        2       21
#> 0.546875 0.562500           64           64.0        2        2       21
#> 0.500000 0.453125          128          128.0        1        2       21
#> 0.484375 0.468750          256          256.0        2        2       21
#> 0.476562 0.468750          512          512.0        2        2       21
#> 0.451172 0.425781         1024         1024.0        2        2       21
#> 0.417969 0.384766         2048         2048.0        1        2       21
#> 0.411894 0.411894         4096         4096.0        1        1       21 h
#> 0.405941 0.400000         8192         8192.0        2        2       21 h
#> 
#> finished run
#> number of examples per pass = 3007
#> passes used = 5
#> weighted example sum = 15035.000000
#> weighted label sum = 0.000000
#> average loss = 0.386228 h
#> total feature number = 315685

And finally, compute predictions using the trained model:

vw_pred <- predict(object = vwmodel, data = test_file_path)
#> Starting VW testing session
#> VW v8.6.1
#> Using data file: /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/df_test.vw
#> Using model file: /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/vw_1534797886_mdl.vw
#> Command line parameters: 
#>  -t -d /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/df_test.vw -i /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/vw_1534797886_mdl.vw --passes 1 -p /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/temp_probs_out.vw
#> creating quadratic features for pairs: bc 
#> only testing
#> predictions = /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/temp_probs_out.vw
#> Num weight bits = 18
#> learning rate = 0.5
#> initial_t = 0
#> power_t = 0.5
#> using no cache
#> Reading datafile = /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/df_test.vw
#> num sources = 1
#> average  since         example        example  current  current  current
#> loss     last          counter         weight    label  predict features
#> 1.000000 1.000000            1            1.0        2        1       21
#> 1.000000 1.000000            2            2.0        3        2       21
#> 1.000000 1.000000            4            4.0        3        2       21
#> 0.500000 0.000000            8            8.0        2        2       21
#> 0.562500 0.625000           16           16.0        2        2       21
#> 0.343750 0.125000           32           32.0        2        2       21
#> 0.437500 0.531250           64           64.0        1        1       21
#> 0.453125 0.468750          128          128.0        3        2       21
#> 0.378906 0.304688          256          256.0        1        2       21
#> 0.337891 0.296875          512          512.0        3        3       21
#> 
#> finished run
#> number of examples = 836
#> weighted example sum = 836.000000
#> weighted label sum = 0.000000
#> average loss = 0.363636
#> total feature number = 17556

Arguments we use here:

object: a trained Vowpal Wabbit model created with vwsetup().
data: the path to the test data in Vowpal Wabbit format.
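
As a quick sanity check, we can compare the predictions with the true labels. This is only a minimal sketch: it assumes that vw_pred contains one predicted class label per row of df_test, which may differ from the actual return format of predict():

# Rough misclassification rate on the test set
# (assumes vw_pred is a vector of predicted class labels aligned with df_test)
mean(as.integer(vw_pred) != as.integer(as.character(df_test$group)))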

Accessing the results

If we want to review the parameters of our model and its evaluation results, we can simply print vwmodel:

vwmodel
#>  Vowpal Wabbit model
#> Learning algorithm:   sgd 
#> Working directory:   /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//Rtmp6bFZCA 
#> Model file:   /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//Rtmp6bFZCA/vw_1534797504_mdl.vw 
#> General parameters: 
#>   random_seed :   1 
#>   ring_size :  Not defined
#>   holdout_off :   FALSE 
#>   holdout_period :   10 
#>   holdout_after :   0 
#>   early_terminate :   3 
#>   loss_function :   squared 
#>   link :   identity 
#>   quantile_tau :   0.5 
#> Feature parameters: 
#>   bit_precision :   18 
#>   quadratic :   bc 
#>   cubic :  Not defined
#>   interactions :  Not defined
#>   permutations :   FALSE 
#>   leave_duplicate_interactions :   FALSE 
#>   noconstant :   FALSE 
#>   feature_limit :  Not defined
#>   ngram :  Not defined
#>   skips :  Not defined
#>   hash :  Not defined
#>   affix :  Not defined
#>   spelling :  Not defined
#>   interact :  Not defined
#> Learning algorithms / Reductions: 
#>   ect :
#>       num_classes :   3 
#> Optimization parameters: 
#>   adaptive :   TRUE 
#>   normalized :   TRUE 
#>   invariant :   TRUE 
#>   adax :   FALSE 
#>   sparse_l2 :   0 
#>   l1_state :   0 
#>   l2_state :   1 
#>   learning_rate :   0.05 
#>   initial_pass_length :  Not defined
#>   l1 :   1e-07 
#>   l2 :   0 
#>   no_bias_regularization :  Not defined
#>   feature_mask :  Not defined
#>   decay_learning_rate :   1 
#>   initial_t :   0 
#>   power_t :   0.5 
#>   initial_weight :   0 
#>   random_weights :  off
#>   normal_weights :  off
#>   truncated_normal_weights :  off
#>   sparse_weights :   FALSE 
#>   input_feature_regularizer :  Not defined
#> Model evaluation. Training: 
#>   num_examples :   15035 
#>   weighted_example_sum :   15035 
#>   weighted_label_sum :   0 
#>   avg_loss :   0.3862275 
#>   total_feature :   315685 
#> Model evaluation. Testing: 
#>   num_examples :   836 
#>   weighted_example_sum :   836 
#>   weighted_label_sum :   0 
#>   avg_loss :   0.3636364 
#>   total_feature :   17556

For a more in-depth analysis of our model, we can inspect the weights of the final regressor:

vwaudit(vwmodel = vwmodel)
#>                     Names Hashes Model.values
#> 1                 a^sex^M  58028    0.0363983
#> 2              b^diameter  57998   -0.0403499
#> 3                b^length 145012   -0.0496383
#> 4                b^height 107988   -0.0787311
#> 5              c^weight.w  90732    0.0901030
#> 6              c^weight.s 212300    0.1253380
#> 7              c^weight.v  65196    0.3800930
#> 8             c^weight.sh  76534    0.4210850
#> 9                Constant 232120   -0.1986420
#> 10  b^diameter*c^weight.w 116710    0.2348970
#> 11  b^diameter*c^weight.s 235718    0.4218860
#> 12  b^diameter*c^weight.v  23334    1.0003500
#> 13 b^diameter*c^weight.sh 102268    1.0187700
#> 14    b^length*c^weight.w 187120    0.1662080
#> 15    b^length*c^weight.s  34256    0.2956240
#> 16    b^length*c^weight.v 214576    0.7064920
#> 17   b^length*c^weight.sh 168554    0.7264760
#> 18    b^height*c^weight.w  93904    0.6733480
#> 19    b^height*c^weight.s 209392    1.1952400
#> 20    b^height*c^weight.v  61968    2.8984900
#> 21   b^height*c^weight.sh  75338    2.9625199
#> 22                a^sex^I  64526   -0.3009660
#> 23                a^sex^F 244246    0.0278317
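
Since the audit above has Names, Hashes and Model.values columns, we can, for example, order the weights by absolute magnitude to see which features the model relies on most (a small sketch, assuming vwaudit() returns this table as a data frame):

# Order audited weights by absolute magnitude, largest first
aud <- vwaudit(vwmodel = vwmodel)
aud[order(abs(aud$Model.values), decreasing = TRUE), ]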

It is also possible to access and modify the parameters of our model with vwparams():

# Show the learning rate of our model
vwparams(vwmodel = vwmodel, name = "learning_rate")
#> [1] 0.05
# Change it to 0.1
vwparams(vwmodel = vwmodel, name = "learning_rate") <- 0.1
# And show again
vwparams(vwmodel = vwmodel, name = "learning_rate")
#> [1] 0.1

Extending our model

We can add more learning algorithms to our model. For example, suppose we want to use a boosting algorithm with 100 "weak" learners. We just add this option to our model and train again:

vwmodel <- add_option(vwmodel, option = "boosting", num_learners=100)
# We add quiet = TRUE to hide the diagnostic output
vwtrain(vwmodel = vwmodel, data = train_file_path, passes = 100, quiet = TRUE)
vw_pred <- predict(object = vwmodel, data = test_file_path)
#> Starting VW testing session
#> VW v8.6.1
#> Using data file: /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/df_test.vw
#> Using model file: /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/vw_1534797886_mdl.vw
#> Command line parameters: 
#>  -t -d /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/df_test.vw -i /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/vw_1534797886_mdl.vw --passes 1 -p /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/temp_probs_out.vw
#> creating quadratic features for pairs: bc 
#> only testing
#> predictions = /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/temp_probs_out.vw
#> Number of weak learners = 100
#> Gamma = 0.100000
#> Num weight bits = 18
#> learning rate = 0.5
#> initial_t = 0
#> power_t = 0.5
#> using no cache
#> Reading datafile = /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/df_test.vw
#> num sources = 1
#> average  since         example        example  current  current  current
#> loss     last          counter         weight    label  predict features
#> 1.000000 1.000000            1            1.0        2        1       21
#> 1.000000 1.000000            2            2.0        3        2       21
#> 1.000000 1.000000            4            4.0        3        2       21
#> 0.500000 0.000000            8            8.0        2        2       21
#> 0.500000 0.500000           16           16.0        2        2       21
#> 0.343750 0.187500           32           32.0        2        2       21
#> 0.406250 0.468750           64           64.0        1        1       21
#> 0.429688 0.453125          128          128.0        3        2       21
#> 0.375000 0.320312          256          256.0        1        2       21
#> 0.341797 0.308594          512          512.0        3        3       21
#> 
#> finished run
#> number of examples = 836
#> weighted example sum = 836.000000
#> weighted label sum = 0.000000
#> average loss = 0.348086
#> total feature number = 17556

Contextual Bandit algorithms in Vowpal Wabbit

Vowpal Wabbit is famous for its highly optimized Contextual Bandit algorithms, and we can use them in rvw as well:

cb_model <- vwsetup(general_params = list(random_seed=1),
                    feature_params = list(quadratic="bc"),
                    option = "cbify", num_classes = 3) %>%
    add_option(option = "cb_explore",
               num_actions=3, explore_type="bag", explore_arg=7)

vwtrain(vwmodel = cb_model, data = train_file_path, passes = 100, quiet = TRUE)
vw_pred <- predict(object = cb_model, data = test_file_path)
#> Starting VW testing session
#> VW v8.6.1
#> Using data file: /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/df_test.vw
#> Using model file: /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/vw_1534798468_mdl.vw
#> Command line parameters: 
#>  -t -d /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/df_test.vw -i /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/vw_1534798468_mdl.vw --passes 1 -p /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/temp_probs_out.vw
#> creating quadratic features for pairs: bc 
#> only testing
#> predictions = /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/temp_probs_out.vw
#> Num weight bits = 18
#> learning rate = 0.5
#> initial_t = 0
#> power_t = 0.5
#> using no cache
#> Reading datafile = /var/folders/yx/6949djdd3yb4qsw7x_95wfjr0000gn/T//RtmpVQMLUo/df_test.vw
#> num sources = 1
#> average  since         example        example  current  current  current
#> loss     last          counter         weight    label  predict features
#> 1.000000 1.000000            1            1.0        2        1       21
#> 0.500000 0.000000            2            2.0        3        3       21
#> 0.250000 0.000000            4            4.0        3        3       21
#> 0.500000 0.750000            8            8.0        2        3       21
#> 0.312500 0.125000           16           16.0        2        3       21
#> 0.500000 0.687500           32           32.0        2        3       21
#> 0.406250 0.312500           64           64.0        1        1       21
#> 0.335938 0.265625          128          128.0        3        3       21
#> 0.378906 0.421875          256          256.0        1        1       21
#> 0.453125 0.527344          512          512.0        3        3       21
#> 
#> finished run
#> number of examples = 836
#> weighted example sum = 836.000000
#> weighted label sum = 0.000000
#> average loss = 0.470096
#> total feature number = 17556

New arguments we use here:

option = "cbify": turns the multiclass dataset into a contextual bandit problem, where the learner only observes feedback for the chosen action.
option = "cb_explore" (added via add_option()): adds an explicit exploration policy on top of the contextual bandit learner.
num_actions: the number of available actions (here equal to the number of classes).
explore_type and explore_arg: the exploration strategy and its parameter; "bag" with explore_arg = 7 uses bagging-based exploration with 7 policies.

We can see that, unsurprisingly, the Contextual Bandit model performs worse than the multiclass classification model trained under full information, in terms of average test error:

vwmodel$eval$test$avg_loss is 0.348086 for the multiclass classification model, and
cb_model$eval$test$avg_loss is 0.470096 for the Contextual Bandit model.
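
The same comparison can be done programmatically, assuming the eval slots are populated as in the printed model summaries above:

# Average test losses of the two models
c(multiclass = vwmodel$eval$test$avg_loss,
  contextual_bandit = cb_model$eval$test$avg_loss)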

Acknowledgements

Development of the rvw package started as the R Vowpal Wabbit (Google Summer of Code 2018) project, with Dirk Eddelbuettel and James J Balamuta mentoring the project and the R Project for Statistical Computing as the mentor organization.


