vwsetup: Create Vowpal Wabbit model, set up model parameters and data

Description

Sets up a VW model together with its parameters and data.

Usage

vwsetup(algorithm = c("sgd", "bfgs", "ftrl", "pistol", "ksvm",
  "OjaNewton", "svrg"), general_params = list(),
  feature_params = list(), optimization_params = list(),
  dir = tempdir(), model = NULL, params_str = NULL, option = c("",
  "binary", "oaa", "ect", "csoaa", "wap", "log_multi", "recall_tree",
  "lda", "multilabel_oaa", "classweight", "new_mf", "lrq", "stage_poly",
  "bootstrap", "autolink", "replay", "explore_eval", "cb", "cb_explore",
  "cbify", "multiworld_test_check", "nn", "topk", "search", "boosting",
  "marginal"), ...)

Arguments

algorithm

[string] Optimization algorithm

  • sgd - adaptive, normalized, invariant stochastic gradient descent

  • bfgs - Limited-memory Broyden-Fletcher-Goldfarb-Shanno optimization algorithm

  • ftrl - FTRL: Follow the Regularized Leader optimization algorithm

  • pistol - FTRL: Parameter-free Stochastic Learning

  • ksvm - Kernel SVM

  • OjaNewton - Online Newton with Oja's Sketch

  • svrg - Stochastic Variance Reduced Gradient

general_params

List of parameters:

  • random_seed [int] - Seed random number generator (default: 0)

  • ring_size [int] - Size of example ring

  • holdout_off [bool] - No holdout data in multiple passes (default: FALSE)

  • holdout_period [int] - Holdout period for test only (default: 10)

  • holdout_after [int] - Holdout after n training examples, default off (disables holdout_period) (default: 0)

  • early_terminate [int] - Specify the number of passes tolerated when holdout loss doesn't decrease before early termination (default: 3)

  • loss_function [string] - Specify the loss function to be used, uses squared by default. Currently available ones are: squared, classic, hinge, logistic, quantile and poisson. (default: squared)

  • link [string] - Specify the link function: identity, logistic, glf1 or poisson. (default: identity)

  • quantile_tau [real] - Parameter "tau" associated with quantile loss. (default: 0.5)

feature_params

List of parameters. More information about the "interactions" option (and also "quadratic" and "cubic") is available at https://github.com/VowpalWabbit/vowpal_wabbit/wiki/Command-line-arguments#example-manipulation-options

  • bit_precision [int] - Number of bits in the feature table (default: 18)

  • quadratic [string] - Create and use quadratic features (Specify 2 namespaces)

  • cubic [string] - Create and use cubic features (Specify 3 namespaces)

  • interactions [string] - Create feature interactions of any level between namespaces (Specify several namespaces)

  • permutations [bool] - Use permutations instead of combinations for feature interactions of same namespace (default: FALSE)

  • leave_duplicate_interactions [bool] - Don't remove interactions with duplicate combinations of namespaces. For example, 'quadratic="ab", quadratic="ba"' is a duplicate, and 'quadratic="::"' produces many more. (default: FALSE)

  • noconstant [bool] - Don't add a constant feature (default: FALSE)

  • feature_limit [string] - limit to N features. To apply to a single namespace 'foo', arg should be "fN"

  • ngram [string] - Generate N grams. To generate N grams for a single namespace 'foo', arg should be "fN".

  • skips [string] - Generate skips in N grams. In conjunction with the ngram option, this can be used to generate generalized n-skip-k-grams. To generate n-skips for a single namespace 'foo', arg should be "fN".

  • hash [string] - How to hash the features. Available options: "strings", "all" (default: "strings")

  • affix [string] - Generate prefixes/suffixes of features; argument "+2a,-3b,+1" means generate 2-char prefixes for namespace a, 3-char suffixes for b and 1 char prefixes for default namespace

  • spelling [string] - Compute spelling features for a given namespace (use '_' for default namespace)

  • interact [string] - Put weights on feature products from namespaces <n1> and <n2>
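
As a minimal sketch of how these entries combine, the list below asks for 20 hash bits, quadratic interactions between two namespaces, and bigrams. The namespace letters a and b and the chosen values are illustrative assumptions, and the call to vwsetup only runs when the package providing it is already attached.

```r
# Illustrative feature settings; the namespaces "a" and "b" are hypothetical.
feature_params <- list(
  bit_precision = 20L,   # 20 bits in the feature table instead of the default 18
  quadratic     = "ab",  # quadratic features from namespaces a and b
  ngram         = "2"    # generate 2-grams (a string, as documented above)
)

# Only call vwsetup when the package that provides it is attached.
if (exists("vwsetup", mode = "function")) {
  vwmodel <- vwsetup(feature_params = feature_params)
}
```

Entries left out of the list keep the defaults documented above.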

optimization_params

List of parameters:

  • learning_rate [real] - Set initial learning Rate (default: 0.5)

  • initial_pass_length [int] - Initial number of examples per pass

  • l1 [real] - L1 regularization (default: 0)

  • l2 [real] - L2 regularization (default: 0)

  • no_bias_regularization [string] - no bias in regularization (Available options: "on", "off")

  • feature_mask [string] - Use existing regressor to determine which parameters may be updated. If no initial_regressor given, also used for initial weights.

  • decay_learning_rate [real] - Set Decay factor for learning_rate between passes (default: 1)

  • initial_t [real] - initial t value (default: 0)

  • power_t [real] - t power value (default: 0.5)

  • initial_weight [int] - Set all weights to an initial value of arg (default: 0)

  • random_weights [string] - Make initial weights random (Available options: "on", "off") (default: "off")

  • normal_weights [string] - Make initial weights normal (Available options: "on", "off") (default: "off")

  • truncated_normal_weights [string] - Make initial weights truncated normal (Available options: "on", "off") (default: "off")

  • sparse_weights [bool] - Use a sparse data structure for weights.

  • input_feature_regularizer [string] - Per feature regularization input file.

Additional parameters depending on algorithm choice:

  • sgd:

    • adaptive [bool] - Use adaptive, individual learning rates (default: TRUE)

    • normalized [bool] - Use per feature normalized updates (default: TRUE)

    • invariant [bool] - Use safe/importance aware updates (default: TRUE)

    • adax [bool] - Use adaptive learning rates with x^2 instead of g^2x^2 (default: FALSE)

    • sparse_l2 [real] - Degree of L2 regularization applied to activated sparse parameters (default: 0)

    • l1_state [real] - Amount of accumulated implicit L1 regularization (default: 0)

    • l2_state [real] - Amount of accumulated implicit L2 regularization (default: 1)

  • bfgs:

    • conjugate_gradient [bool] - Use conjugate gradient based optimization (default: FALSE)

    • hessian_on [bool] - Use second derivative in line search (default: FALSE)

    • mem [int] - Memory in bfgs. (default: 15)

    • termination [real] - Termination threshold. (default: 0.001)

  • ftrl:

    • ftrl_alpha [real] - Learning rate for FTRL optimization (default: 0.005)

    • ftrl_beta [real] - FTRL beta parameter (default: 0.1)

  • pistol:

    • ftrl_alpha [real] - Learning rate for FTRL optimization (default: 0.005)

    • ftrl_beta [real] - FTRL beta parameter (default: 0.1)

  • ksvm:

    • reprocess [int] - number of reprocess steps for LASVM (default: 1)

    • kernel [string] - type of kernel (rbf or linear) (default: "linear")

    • bandwidth [real] - bandwidth of rbf kernel (default: 1.0)

    • degree [int] - degree of poly kernel (default: 2)

    • lambda [real] - saving regularization for test time (default: -1)

  • OjaNewton:

    • sketch_size [int] - size of sketch (default: 10)

    • epoch_size [int] - size of epoch (default: 1)

    • alpha [real] - multiplicative constant for identity (default: 1)

    • alpha_inverse [real] - one over alpha, similar to learning rate

    • learning_rate_cnt - constant for the learning rate 1/t (default: 2)

    • normalize [string] - normalize the features or not (Available options: "on", "off") (default: "on")

    • random_init [string] - randomize initialization of Oja or not (Available options: "on", "off") (default: "on")

  • svrg:

    • stage_size [int] - Number of passes per SVRG stage (default: 1)
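
The Examples section passes the sgd-specific adaptive flag inside optimization_params, which suggests that algorithm-specific entries sit in the same list as the shared ones. The sketch below applies that pattern to bfgs. The values are illustrative, and the call is skipped unless vwsetup is available in the session.

```r
# Shared and algorithm-specific entries in one list (illustrative values).
optimization_params <- list(
  l2          = 1e-4,   # shared: L2 regularization
  mem         = 15L,    # bfgs only: memory in bfgs (the documented default)
  termination = 0.001   # bfgs only: termination threshold
)

# Only call vwsetup when the package that provides it is attached.
if (exists("vwsetup", mode = "function")) {
  vwmodel <- vwsetup(
    algorithm = "bfgs",
    optimization_params = optimization_params
  )
}
```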

dir

[string] Working directory path, default is tempdir()

model

[string] File name for model weights or path to existing model file.

params_str

[string] Pass command-line parameters directly, bypassing the default approach. For compatibility, parameters from vwtrain, vwtest and predict.vw can't be used here, and the functions add_option and vwparams aren't supported.
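
A rough sketch of the raw-string route follows. It assumes the string is written the way flags appear on the vw command line, which this page does not confirm. The three flags mirror the settings used in the Examples section.

```r
# Raw flags, assumed to be spelled as on the vw command line.
params_str <- "--loss_function logistic --link logistic --binary"

# Only call vwsetup when the package that provides it is attached.
if (exists("vwsetup", mode = "function")) {
  vwmodel <- vwsetup(dir = tempdir(), params_str = params_str)
}
```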

option

[string] Add Learning algorithm / reduction option:

  • binary - Reports loss as binary classification with -1,1 labels

  • oaa - One-against-all multiclass learning with labels

  • ect - Error correcting tournament with labels

  • csoaa - One-against-all multiclass learning with costs

  • wap - Weighted all-pairs multiclass learning with costs

  • multilabel_oaa - One-against-all multilabel with multiple labels

  • log_multi - Online (decision) trees for classes

  • classweight - Importance weight classes

  • lda - Latent Dirichlet Allocation

  • recall_tree - Use online tree for multiclass

  • new_mf - Matrix factorization mode

  • lrq - Low rank quadratic features

  • stage_poly - Stagewise polynomial features

  • bootstrap - bootstrap with K rounds by online importance resampling

  • autolink - Create link function with polynomial N

  • replay - Experience Replay

  • explore_eval - Explore evaluation

  • cb - Contextual bandit learning

  • cb_explore - Contextual Bandit Exploration

  • cbify - Convert multiclass on K classes into a contextual bandit problem

  • multiworld_test - Multiworld Testing

  • nn - Sigmoidal feedforward network

  • topk - Top K recommendation

  • struct_search - Search-based structured prediction (SEARN or DAgger)

  • boosting - Online boosting with weak learners

  • marginal - Substitute marginal label estimates for ids

...

Additional options for a learning algorithm / reduction

  • oaa or ect:

    • num_classes [int] - Number of classes

    • oaa_subsample [int] - Subsample this number of negative examples when learning

  • multilabel_oaa:

    • num_labels [int] - Number of labels

  • csoaa or wap:

    • num_classes [int] - Number of classes

    • csoaa_ldf or wap_ldf - singleline (Default) or multiline label dependent features

  • log_multi:

    • num_classes [int] - Number of classes

    • no_progress [bool] - Disable progressive validation (default: FALSE)

    • swap_resistance [int] - Higher = more resistance to swap (default: 4)

  • classweight:

    • class_multiplier [real] - importance weight multiplier for class

  • recall_tree:

    • num_classes [int] - Number of classes

    • max_candidates [int] - Maximum number of labels per leaf in the tree

    • bern_hyper [real] - Recall tree depth penalty (default: 1)

    • max_depth [int] - Maximum depth of the tree (default: log_2(number of classes))

    • node_only [string] - Only use node features, not full path (Available options: "on", "off") (default: "off")

    • randomized_routing [string] - Randomized routing (Available options: "on", "off") (default: "off")

  • lda:

    • num_topics [int] - Number of topics

    • lda_alpha [real] - Prior on sparsity of per-document topic weights (default: 0.1)

    • lda_rho [real] - Prior on sparsity of topic distributions (default: 0.1)

    • lda_D [int] - Number of documents (default: 10000)

    • lda_epsilon [real] - Loop convergence threshold (default: 0.001)

    • math-mode [string] - Math mode: simd, accuracy, fast-approx

    • minibatch [int] - Minibatch size (default: 1)

    • metrics [string] - Compute metrics (Available options: "on", "off") (default: "off")

  • new_mf:

    • rank [int] - rank for matrix factorization

  • lrq:

    • features [string] - low rank quadratic features

    • lrqdropout [bool] - use dropout training for low rank quadratic features (default: FALSE)

  • stage_poly:

    • sched_exponent [real] - exponent controlling quantity of included features (default: 1.0)

    • batch_sz [int] - multiplier on batch size before including more features (default: 1000)

    • batch_sz_no_doubling [bool] - batch_sz does not double (default: TRUE)

  • bootstrap:

    • num_rounds [int] - number of rounds

    • bs_type [string] - the bootstrap mode: 'mean' or 'vote' (default: "mean")

  • autolink:

    • degree [int] - polynomial degree (default: 2)

  • replay:

    • level [string] - Use experience replay at a specified level (b=classification/regression, m=multiclass, c=cost sensitive)

    • buffer [int] - Buffer size (default: 100)

    • count [int] - how many times (in expectation) should each example be played (default: 1 = permuting)

  • explore_eval:

    • multiplier [real] - Multiplier used to make all rejection sample probabilities <= 1

  • cb:

    • num_costs [int] - Number of costs. If num_costs=0, contextual bandit learning with multiline action dependent features (ADF) is triggered ("--cb_adf").

    • cb_type [string] - Contextual bandit method to use: ips, dm, dr, mtr (for ADF) (default: "dr")

    • eval [bool] - Evaluate a policy rather than optimizing (default: FALSE)

    • rank_all [bool] - Return actions sorted by score order. (for ADF) (default: FALSE)

    • no_predict [bool] - Do not do a prediction when training. (for ADF) (default: FALSE)

  • cb_explore:

    • num_actions [int] - Number of actions in online explore-exploit for a <k> action contextual bandit problem. If num_actions=0, online explore-exploit for a contextual bandit problem with multiline action dependent features (ADF) is triggered ("--cb_explore_adf").

    • explore_type [string] - Type of exploration to use: "epsilon" (epsilon-greedy exploration), "first" (tau-first exploration), "bag" (bagging-based exploration), "cover" (online cover based exploration), "softmax" (softmax exploration), "regcb" (RegCB-elim exploration), "regcbopt" (RegCB optimistic exploration). The "softmax", "regcb" and "regcbopt" types are only available for exploration with ADF. (default: "epsilon")

    • explore_arg [real] - Parameter for exploration algorithm. Applicable for "epsilon", "first", "bag" and "cover" types of exploration. (default: 0.05)

    • psi [real] - Disagreement parameter for "cover" algorithm. (default: 1)

    • nounif [bool] - Do not explore uniformly on zero-probability actions in "cover" algorithm. (default: FALSE)

    • mellowness [real] - "RegCB" mellowness parameter c_0. (default: 0.1)

    • greedify [bool] - Always update first policy once in "bag" (default: FALSE)

    • lambda [real] - Parameter for "softmax". (default: -1)

    • cb_min_cost [real] - Lower bound on cost. For ADF only (default: 0)

    • cb_max_cost [real] - Upper bound on cost. For ADF only (default: 1)

    • first_only [bool] - Only explore the first action in a tie-breaking event. For ADF only (default: FALSE)

  • cbify:

    • num_classes [int] - number of classes

    • cbify_cs [bool] - consume cost-sensitive classification examples instead of multiclass (default: FALSE)

    • loss0 [real] - loss for correct label (default: 0)

    • loss1 [real] - loss for incorrect label (default: 1)

  • multiworld_test:

    • features [string] - Evaluate features as policies

    • learn [int] - Do Contextual Bandit learning on <n> classes.

    • num_classes [bool] - Discard mwt policy features before learning (default: FALSE)

  • nn:

    • num_hidden [int] - number of hidden units

    • inpass [bool] - Train or test sigmoidal feedforward network with input passthrough (default: FALSE)

    • multitask [bool] - Share hidden layer across all reduced tasks (default: FALSE)

    • dropout [bool] - Train or test sigmoidal feedforward network using dropout (default: FALSE)

    • meanfield [bool] - Train or test sigmoidal feedforward network using mean field (default: FALSE)

  • topk:

    • num_k [int] - number of top k recommendations

  • struct_search:

    • id [int] - maximum action id or 0 for LDF

    • search_task [string] - search task: sequence, sequencespan, sequence_ctg, argmax, sequence_demoldf, multiclasstask, dep_parser, entity_relation, hook, graph

    • search_interpolation [string] - at what level should interpolation happen? (data or policy)

    • search_rollout [string] - how should rollouts be executed? (policy, oracle, mix_per_state, mix_per_roll, none)

    • search_rollin [string] - how should past trajectories be generated? (policy, oracle, mix_per_state, mix_per_roll)

    • search_passes_per_policy [int] - number of passes per policy (only valid for search_interpolation=policy). (default: 1)

    • search_beta [real] - interpolation rate for policies (only valid for search_interpolation=policy). (default: 0.5)

    • search_alpha [real] - annealed beta = 1-(1-alpha)^t (only valid for search_interpolation=data). (default: 1e-10)

    • search_total_nb_policies [int] - if we are going to train the policies through multiple separate calls to vw, we need to specify this parameter and tell vw how many policies are eventually going to be trained

    • search_trained_nb_policies [int] - the number of trained policies in a file

    • search_allowed_transitions [string] - read file of allowed transitions. default: all transitions are allowed

    • search_subsample_time [real] - instead of training at all timesteps, use a subset. if value in (0,1), train on a random v

    • search_neighbor_features [string] - copy features from neighboring lines. argument looks like: '-1:a,+2' meaning copy previous line from namespace "a" and next line from namespace "unnamed", where ',' separates them

    • search_rollout_num_steps [int] - how many calls of "loss" before we stop really predicting on rollouts and switch to oracle (default means "infinite")

    • search_history_length [int] - some tasks allow you to specify how much history they depend on; specify that here. (default: 1)

    • search_no_caching [bool] - turn off the built-in caching ability (makes things slower, but technically more safe) (default: FALSE)

    • search_xv [bool] - train two separate policies, alternating prediction/learning. (default: FALSE)

    • search_perturb_oracle [real] - perturb the oracle on rollin with this probability. (default: 0)

    • search_linear_ordering [bool] - insist on generating examples in linear order. (default: FALSE and using hoopla permutation)

    • search_active_verify [real] - verify that active learning is doing the right thing (arg = multiplier, should be = cost_range * range_c)

    • search_save_every_k_runs [int] - save model every k runs

  • boosting:

    • num_learners [int] - number of weak learners

    • gamma [real] - weak learner's edge, used only by online BBM (default: 0.1)

    • alg - specify the boosting algorithm: BBM (default), logistic (AdaBoost.OL.W), adaptive (AdaBoost.OL) (default: "BBM")

  • marginal:

    • ids [string] - Substitute marginal label estimates for ids

    • initial_denominator [real] - Initial denominator (default: 1)

    • initial_numerator [real] - Initial numerator (default: 0.5)

    • compete [bool] - Enable competition with marginal features (default: FALSE)

    • update_before_learn [string] - Update marginal values before learning (Available options: "on", "off") (default: "off")

    • unweighted_marginals [string] - Ignore importance weights when computing marginals (Available options: "on", "off") (default: "off")

    • decay [real] - Decay multiplier per event (1e-3 for example) (default: 0)
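
To illustrate how these reduction arguments reach vwsetup, the sketch below pairs option = "oaa" with its num_classes argument, which is passed through `...`. The class count of 3 is an arbitrary illustration, and the call is skipped unless vwsetup is available in the session.

```r
# Reduction arguments travel through `...` next to the `option` they belong to.
setup_args <- list(
  option      = "oaa",  # one-against-all multiclass learning
  num_classes = 3L      # illustrative: three class labels
)

# Only call vwsetup when the package that provides it is attached.
if (exists("vwsetup", mode = "function")) {
  vwmodel <- do.call(vwsetup, setup_args)
}
```

Building the arguments as a list and using do.call keeps the option and its extra arguments together. Writing vwsetup(option = "oaa", num_classes = 3L) directly is equivalent.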

Value

A list of class vwmodel.

Examples

vwsetup(
 dir = tempdir(),
 model = "pk_mdl.vw",
 general_params = list(loss_function="logistic", link="logistic"),
 optimization_params = list(adaptive=FALSE),
 option = "binary"
)

ivan-pavlov/rvwgsoc documentation built on July 1, 2019, 9:40 p.m.