df2vw: Create a VW data file from an R data.frame object


Description

Create a Vowpal Wabbit (VW) input data file from an R data.frame object.

Usage

df2vw(data, file_path, namespaces = NULL, keep_space = NULL,
  fixed = NULL, targets = NULL, probabilities = NULL,
  weight = NULL, base = NULL, tag = NULL, multiline = NULL,
  append = FALSE)

Arguments

data

[data.frame] data.frame object to be converted.

file_path

[string] path of the output file, written in VW input format.

namespaces

[list or YAML file] names of the namespaces and of the variables that belong to each namespace, given either as an R list or as a YAML file. Example with the iris dataset: namespaces = list(sepal = list('Sepal.Length', 'Sepal.Width'), petal = list('Petal.Length', 'Petal.Width')). This creates two namespaces (sepal and petal), each containing the features listed in its element of the list.
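To make the namespace structure concrete, here is a minimal R sketch that formats the first row of iris as one VW text line, using the namespace list from the example above. The helper vw_features() is hypothetical and is not the package's implementation; it assumes numeric features are written as name:value, and it uses a placeholder label of 1 because iris has no numeric target.

```r
# Namespace specification for iris, as in the example above
namespaces <- list(
  sepal = list("Sepal.Length", "Sepal.Width"),
  petal = list("Petal.Length", "Petal.Width")
)

# Hypothetical helper (not part of the package): writes each namespace
# of one data.frame row as "|name feature:value feature:value ..."
vw_features <- function(row, namespaces) {
  parts <- vapply(names(namespaces), function(ns) {
    feats <- unlist(namespaces[[ns]])
    values <- unlist(row[feats])
    paste0("|", ns, " ", paste0(feats, ":", values, collapse = " "))
  }, character(1))
  paste(parts, collapse = " ")
}

# Placeholder label 1 followed by the namespaced features of row 1
line <- paste(1, vw_features(iris[1, ], namespaces))
print(line)
# [1] "1 |sepal Sepal.Length:5.1 Sepal.Width:3.5 |petal Petal.Length:1.4 Petal.Width:0.2"
```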

keep_space

[string vector] features whose values keep their spaces. Example: the value "FERRARI 4Si" is written as "FERRARI 4Si" when its feature is listed in keep_space, and is treated as two features; otherwise it is written as "FERRARI_4Si" and treated as one feature.
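The effect of keep_space can be illustrated with a small sketch. The helpers format_value() and count_features() are hypothetical; the sketch assumes that, without keep_space, spaces are replaced by underscores as in the example above, and it counts tokens the way VW splits features on whitespace.

```r
# Hypothetical helper: keep the value as-is, or replace spaces with "_"
format_value <- function(x, keep_space = FALSE) {
  if (keep_space) x else gsub(" ", "_", x, fixed = TRUE)
}

# VW treats each whitespace-separated token as a separate feature
count_features <- function(x) length(strsplit(x, " ", fixed = TRUE)[[1]])

kept   <- format_value("FERRARI 4Si", keep_space = TRUE)   # "FERRARI 4Si"
joined <- format_value("FERRARI 4Si", keep_space = FALSE)  # "FERRARI_4Si"
c(kept = count_features(kept), joined = count_features(joined))
#   kept joined
#      2      1
```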

fixed

[string vector] features that are written exactly as they are. Similar to keep_space, but special characters ("(", ")", "|", ":", "'") are not replaced. This is useful for LDA ("word_1:2 word_2:3" stays the same), but use it with care, because unescaped special characters can corrupt the resulting VW file.

targets

[string or string vector] If a single string, the named column is used as the real-valued label in the regular VW input format. If a string vector, the named columns are used as class costs for the wap and csoaa multi-class classification algorithms, or as actions for the Contextual Bandit algorithm.
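For the multi-class case, VW's cost-sensitive (csoaa/wap) labels have the form "class:cost class:cost ...". The sketch below builds such labels from one cost column per class. The data frame, its column names, and the rule that class numbers follow the order of the target columns are assumptions made for illustration, not documented behaviour of df2vw.

```r
# Hypothetical data: one cost column per class (lower cost = better class)
df <- data.frame(cost_1 = c(0, 1), cost_2 = c(1, 0), cost_3 = c(1, 1))
targets <- c("cost_1", "cost_2", "cost_3")

# Assumption: classes are numbered 1..k in the order the targets are given
csoaa_labels <- unname(apply(df[targets], 1, function(costs) {
  paste0(seq_along(costs), ":", costs, collapse = " ")
}))
print(csoaa_labels)
# [1] "1:0 2:1 3:1" "1:1 2:0 3:1"
```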

probabilities

[string vector] columns with action probabilities for the Contextual Bandit algorithm.

weight

[string] weight (importance) of each line of the dataset.

base

[string] base of each line of the dataset, i.e. a baseline value added to the prediction. Used for residual regression.

tag

[string] tag of each line of the dataset.
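The weight, base and tag values all go in the part of a VW line that precedes the first "|". A minimal sketch, assuming VW's positional text format "label [importance [base]] ['tag]|...". The helper vw_header() is hypothetical and only shows where each value would appear; it is not the package's code.

```r
# Hypothetical helper: build the text before the first namespace
vw_header <- function(label, weight = NULL, base = NULL, tag = NULL) {
  # Fields are positional in VW, so a base is only valid after an importance
  if (!is.null(base) && is.null(weight)) {
    stop("a base requires an importance weight")
  }
  h <- as.character(label)
  if (!is.null(weight)) h <- paste(h, weight)
  if (!is.null(base)) h <- paste(h, base)
  # A tag is prefixed with an apostrophe and placed directly before "|"
  if (is.null(tag)) paste0(h, " |") else paste0(h, " '", tag, "|")
}

vw_header(1, weight = 2, base = 0.5, tag = "row1")  # "1 2 0.5 'row1|"
vw_header(1)                                        # "1 |"
```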

multiline

[integer] number of labels (separate lines) in each multiline example.

append

[bool] whether to append the data to an existing result file (default FALSE).


ivan-pavlov/rvwgsoc documentation built on July 1, 2019, 9:40 p.m.