vw: Trains Vowpal Wabbit models from R.


View source: R/vw.R

Description

This function is fairly simple and could be extended to other problems, but so far it only supports binary classification. It is intended to be used together with perf to compute validation metrics on held-out datasets. See osmot.cs.cornell.edu/kddcup/software.html for more information about perf.
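The AUC on held-out data is normally computed with perf or the R package pROC. For a quick sanity check without either tool, the AUC can also be computed in base R from the Mann-Whitney rank statistic. The sketch below assumes 0/1 labels and numeric scores (for example, the probabilities written to out_probs); `auc_rank()` is a hypothetical helper, not part of rvw.

```r
# Hypothetical helper (not part of rvw): AUC via the Mann-Whitney rank statistic.
# labels: 0/1 vector (the coding perf expects); scores: predicted probabilities.
auc_rank <- function(labels, scores) {
  stopifnot(length(labels) == length(scores), all(labels %in% c(0, 1)))
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  if (n_pos == 0 || n_neg == 0) stop("need at least one positive and one negative label")
  # rank() averages tied scores, so a tied positive/negative pair counts as 1/2
  r <- rank(scores)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

labels <- c(0, 0, 1, 1, 0, 1)
scores <- c(0.10, 0.40, 0.35, 0.80, 0.20, 0.90)
auc_rank(labels, scores)
# [1] 0.8888889
```

Eight of the nine positive/negative pairs are ordered correctly here, hence 8/9. Results should agree with perf or pROC up to how each tool handles ties.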

Usage

vw(training_data, validation_data, model = "mdl.vw",
  path_vw_data_train = NULL, path_vw_data_val = NULL, target = NULL,
  namespaces = NULL, weight = NULL, tag = NULL, out_probs = NULL,
  validation_labels = NULL, loss = "logistic", b = 25,
  learning_rate = 0.5, passes = 1, l1 = NULL, l2 = NULL,
  early_terminate = NULL, link_function = "--link=logistic", extra = NULL,
  keep_preds = TRUE, do_evaluation = TRUE, use_perf = TRUE,
  plot_roc = TRUE, verbose = TRUE, keep_tempfiles = FALSE,
  use_cache = TRUE)
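When training_data or validation_data is a [data.frame], it is converted to vw's text input format (via dt2vw) before training. As a rough illustration of that format, `label [weight] ['tag]|namespace feature:value ...`, the sketch below formats a single row. `vw_line()` is a hypothetical helper and the single namespace `f` is an assumption; the exact output of dt2vw may differ.

```r
# Hypothetical helper (not part of rvw): format one observation as a vw text line.
# label: -1/1 for logistic loss; weight and tag are optional;
# features: named numeric vector, all placed in one namespace (assumed "f").
vw_line <- function(label, features, namespace = "f", weight = NULL, tag = NULL) {
  header <- as.character(label)
  if (!is.null(weight)) header <- paste(header, weight)  # importance weight follows the label
  # a leading single quote marks the token as a tag; it is attached directly to the bar
  bar <- if (is.null(tag)) " |" else paste0(" '", tag, "|")
  feats <- paste(paste0(names(features), ":", unname(features)), collapse = " ")
  paste0(header, bar, namespace, " ", feats)
}

vw_line(1, c(age = 34, income = 2.5))
# [1] "1 |f age:34 income:2.5"
vw_line(-1, c(age = 51), weight = 2, tag = "row7")
# [1] "-1 2 'row7|f age:51"
```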

Arguments

training_data

a [data.frame] or path to a vw data file

validation_data

a [data.frame] or path to a vw data file

model

name of the model file

path_vw_data_train

if training_data is a [data.frame], the path where the converted vw data file is saved. If NULL, the file is written to a temporary folder and deleted before the function exits.

path_vw_data_val

if validation_data is a [data.frame], the path where the converted vw data file is saved. If NULL, the file is written to a temporary folder and deleted before the function exits.

target

if training_data or validation_data is a [data.frame], the name of the variable in the [data.frame] corresponding to the target variable

namespaces

used only if training_data or validation_data is a [data.frame]. See the arguments of dt2vw.

weight

used only if training_data or validation_data is a [data.frame]. See the arguments of dt2vw.

tag

used only if training_data or validation_data is a [data.frame]. See the arguments of dt2vw.

out_probs

path to the file where the predictions are saved. If NULL, the predictions are written to a temporary file that is deleted afterwards.

validation_labels

path to the file containing the true labels of the validation data, used to compute the AUC with perf or with the R package pROC. If validation_data is a [data.frame] and validation_labels is NULL, the labels file is deleted before the function exits. If validation_labels is not NULL, it gives the path where the validation labels are stored.

loss

loss function. By default logistic.

b

number of bits for the feature hash table; vw allocates a weight vector of 2^b entries

learning_rate

sets the learning rate, default is 0.5

passes

sets the number of passes over the data, default 1

l1

l1 regularization

l2

l2 regularization

early_terminate

number of passes tolerated without a decrease in holdout loss before training terminates early

link_function

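link function option passed to vw when generating predictions; the default "--link=logistic" maps raw scores to probabilities in [0, 1]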

extra

additional vw command-line options, passed as a single string (e.g. '--stage_poly')

keep_preds

TRUE (default) to return a vector of the predictions

do_evaluation

TRUE (default) to compute the AUC on validation_data; use FALSE to only score the data

use_perf

TRUE (default) to compute the AUC with perf; otherwise, the AUC is computed with the R package pROC.

plot_roc

[bool] should ROC be plotted

verbose

mostly useful for debugging; prints the AUC and the vw command used to train the model

keep_tempfiles

[bool] should temporary files be kept, default FALSE

use_cache

[bool] should cache files be used, default TRUE
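Most of the arguments above correspond to standard vw command-line options. The sketch below shows one plausible mapping into a training command and a scoring command; `build_vw_train_cmd()` and `build_vw_score_cmd()` are hypothetical helpers, and the command vw() actually builds may differ in flag order or in which flags it includes.

```r
# Hypothetical sketch (not rvw's code): map vw() arguments to vw command lines.
build_vw_train_cmd <- function(data, model, loss = "logistic", b = 25,
                               learning_rate = 0.5, passes = 1,
                               l1 = NULL, l2 = NULL, early_terminate = NULL,
                               extra = NULL, use_cache = TRUE) {
  # c() coerces the numeric values to character
  args <- c("-d", data, "-f", model,
            "--loss_function", loss,
            "-b", b,
            "-l", learning_rate,
            "--passes", passes)
  if (!is.null(l1)) args <- c(args, "--l1", l1)
  if (!is.null(l2)) args <- c(args, "--l2", l2)
  if (!is.null(early_terminate)) args <- c(args, "--early_terminate", early_terminate)
  # vw needs a cache file for multiple passes, so this sketch adds -c in that case too
  if (use_cache || passes > 1) args <- c(args, "-c")
  if (!is.null(extra)) args <- c(args, extra)
  paste("vw", paste(args, collapse = " "))
}

# -t: test only (no learning), -i: load the trained model, -p: write predictions
build_vw_score_cmd <- function(data, model, out_probs, link_function = "--link=logistic") {
  paste("vw -t -i", model, "-d", data, "-p", out_probs, link_function)
}

build_vw_train_cmd("X_train.vw", "mdl.vw", passes = 20, l1 = 1e-08, l2 = 1e-08,
                   early_terminate = 2, extra = "--stage_poly")
# [1] "vw -d X_train.vw -f mdl.vw --loss_function logistic -b 25 -l 0.5 --passes 20 --l1 1e-08 --l2 1e-08 --early_terminate 2 -c --stage_poly"
```

With verbose = TRUE, vw() prints the command it actually runs, which is the authoritative reference.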

Examples

# 1. Create a training set (training_data) and a validation set (validation_data) in vw format.
# 2. Install perf.
# 3. Create the true labels for the validation set, coded as 0/1, which is the format perf expects.
# 4. Train and evaluate a model with vw().

## Not run: 
auc <- vw(training_data = 'X_train.vw', validation_data = 'X_valid.vw',
          loss = 'logistic', model = 'mdl.vw', b = 25, learning_rate = 0.5,
          passes = 20, l1 = 1e-08, l2 = 1e-08, early_terminate = 2,
          extra = '--stage_poly')

## End(Not run)
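Step 3 above asks for validation labels coded as 0/1, while vw's logistic loss expects labels coded as -1/1. A minimal sketch of that conversion, assuming the label is the first token before the `|` on each line of the vw file; `write_labels_01()` is a hypothetical helper, not part of rvw.

```r
# Hypothetical helper (not part of rvw): read a vw data file and write its labels
# as 0/1, one per line. Assumes labels are coded -1/1 and are the first token of each line.
write_labels_01 <- function(vw_file, labels_file) {
  lines <- readLines(vw_file)
  lines <- lines[nzchar(trimws(lines))]               # skip blank lines
  label_part <- trimws(sub("\\|.*$", "", lines))      # text before the first bar
  y <- suppressWarnings(as.numeric(sub("\\s.*$", "", label_part)))  # first token
  if (anyNA(y) || !all(y %in% c(-1, 1))) stop("expected labels coded as -1/1")
  labels01 <- (y + 1) / 2                             # -1 -> 0, 1 -> 1
  writeLines(as.character(labels01), labels_file)
  invisible(labels01)
}
```

The resulting file is presumably what validation_labels should point to when validation_data is a vw file path.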

rvw-org/rvw-legacy documentation built on May 5, 2019, 6:56 p.m.