This function is fairly simple and can be extended to other problems; so far it supports only binary classification. It is intended to be used in conjunction with perf to compute validation metrics on held-out datasets. See osmot.cs.cornell.edu/kddcup/software.html for more information about perf.
vw(training_data, validation_data, model = "mdl.vw",
path_vw_data_train = NULL, path_vw_data_val = NULL, target = NULL,
namespaces = NULL, weight = NULL, tag = NULL, out_probs = NULL,
validation_labels = NULL, loss = "logistic", b = 25,
learning_rate = 0.5, passes = 1, l1 = NULL, l2 = NULL,
early_terminate = NULL, link_function = "--link=logistic", extra = NULL,
keep_preds = TRUE, do_evaluation = TRUE, use_perf = TRUE,
plot_roc = TRUE, verbose = TRUE, keep_tempfiles = FALSE,
use_cache = TRUE)
training_data
    a [data.frame] or path to a vw data file
validation_data
    a [data.frame] or path to a vw data file
model
    name of the model file
path_vw_data_train
    if training_data is a [data.frame], the path where the vw data file is saved. If NULL, the data is stored in a temporary folder and deleted before the function exits
path_vw_data_val
    if validation_data is a [data.frame], the path where the vw data file is saved. If NULL, the data is stored in a temporary folder and deleted before the function exits
target
    if training_data or validation_data is a [data.frame], the name of the variable in the [data.frame] corresponding to the target variable
namespaces
    used only if training_data or validation_data is a [data.frame]. See arguments of dt2vw
weight
    used only if training_data or validation_data is a [data.frame]. See arguments of dt2vw
tag
    used only if training_data or validation_data is a [data.frame]. See arguments of dt2vw
out_probs
    path to the file where the predictions are saved. If NULL, the predictions are written to a temporary file that is then deleted
validation_labels
    file containing the true labels of the validation data, used to compute AUC with perf or with roc_auc() from the R package pROC. If the validation data is a [data.frame] and validation_labels is NULL, the validation labels file is deleted before the function exits. If validation_labels is not NULL, it gives the path where the validation labels should be stored
loss
    loss function; the default is logistic
b
    number of bits allocated for the weight vector
learning_rate
    sets the learning rate, default is 0.5
passes
    sets the number of passes over the data, default 1
l1
    l1 regularization
l2
    l2 regularization
early_terminate
    number of passes tolerated without a decrease in holdout loss before early termination
link_function
    link function used to generate predictions
extra
    additional vw command-line options, passed as text
keep_preds
    TRUE (default) to return a vector of the predictions
do_evaluation
    TRUE to compute AUC on validation_data; use FALSE to only score the data
use_perf
    TRUE to use perf to compute AUC; otherwise auc_roc() from the R package pROC is used
plot_roc
    [bool] should the ROC curve be plotted
verbose
    mostly used for debugging; prints the AUC and the vw command used to train the model
keep_tempfiles
    [bool] should temporary files be kept, default FALSE
use_cache
    [bool] should cache files be used, default TRUE
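Since verbose prints the vw command used for training, it can help to see how arguments like these might translate into a vw command line. The sketch below is only an illustration: build_vw_train_cmd is a hypothetical helper, not part of this package, and the function may assemble its command differently. The flags themselves (-d, -f, -b, --loss_function, --learning_rate, --passes, --l1, --l2, --early_terminate, --cache) are standard vw options.

```r
# Hypothetical helper (not part of the package): assemble a vw training
# command string from a subset of the arguments documented above.
build_vw_train_cmd <- function(data_path, model = "mdl.vw", loss = "logistic",
                               b = 25, learning_rate = 0.5, passes = 1,
                               l1 = NULL, l2 = NULL, early_terminate = NULL,
                               extra = NULL, use_cache = TRUE) {
  args <- c("vw", "-d", data_path,
            "--loss_function", loss,
            "-b", b,
            "--learning_rate", learning_rate,
            "--passes", passes,
            "-f", model)
  # Optional flags are appended only when a value is supplied.
  if (!is.null(l1)) args <- c(args, "--l1", l1)
  if (!is.null(l2)) args <- c(args, "--l2", l2)
  if (!is.null(early_terminate)) {
    args <- c(args, "--early_terminate", early_terminate)
  }
  # vw needs a cache file to make more than one pass over the data.
  if (use_cache) args <- c(args, "--cache")
  # Free-form options, mirroring the role of the `extra` argument.
  if (!is.null(extra)) args <- c(args, extra)
  paste(args, collapse = " ")
}

cmd <- build_vw_train_cmd("X_train.vw", passes = 20, l1 = 1e-08,
                          extra = "--stage_poly")
print(cmd)
```

The returned string could then be handed to system() to run vw, assuming the vw binary is on the PATH.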
# 1. Create a training set (training_data) and validation set (validation_data) in vw format.
# 2. Install perf
# 3. Create a vector of true labels for the validation dataset, in the [0, 1] range. This is what perf likes.
# 4. Run one model with the present code
## Not run:
auc = vw(training_data='X_train.vw', validation_data='X_valid.vw',
         loss='logistic', model='mdl.vw', b=25, learning_rate=0.5,
         passes=20, l1=1e-08, l2=1e-08, early_terminate=2,
         extra='--stage_poly')
## End(Not run)
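To sanity-check the AUC reported for the validation set, the value can be recomputed in base R from the true labels and the returned predictions. The sketch below uses the rank-based (Mann-Whitney) formulation. auc_rank is a hypothetical helper, not part of this package, and it is not how perf or pROC compute the metric internally. It assumes labels coded as 0/1, as in step 3 above.

```r
# Hypothetical helper (not part of the package): rank-based AUC.
# labels: vector of 0/1 true labels; scores: predicted probabilities.
auc_rank <- function(labels, scores) {
  stopifnot(length(labels) == length(scores), all(labels %in% c(0, 1)))
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  stopifnot(n_pos > 0, n_neg > 0)
  r <- rank(scores)  # tied scores receive their average rank
  # Mann-Whitney U statistic of the positives, scaled to [0, 1].
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy data: three of the four positive/negative pairs are ordered correctly.
labels <- c(0, 0, 1, 1)
scores <- c(0.1, 0.4, 0.35, 0.8)
auc_rank(labels, scores)  # 0.75
```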