data_source: Specifying Data Source


Description

The functions on this page specify the source of data for the recommender system. Their return values serve as the input argument of methods such as $tune(), $train(), and $predict(). Two data formats are currently supported: a data file on disk (via data_file()), and data held in memory as R objects (via data_memory()).

Usage

data_file(path, index1 = FALSE, ...)

data_memory(user_index, item_index, rating = NULL, index1 = FALSE, ...)

Arguments

path

Path to the data file.

index1

Whether the user indices and item indices start with 1 (index1 = TRUE) or 0 (index1 = FALSE).

...

Currently unused.

user_index

An integer vector giving the user indices of rating scores.

item_index

An integer vector giving the item indices of rating scores.

rating

A numeric vector of the observed entries in the rating matrix. Can be specified as NULL for testing data, in which case it is ignored.

Details

In $tune() and $train(), the functions on this page specify the source of training data. data_file() expects a text file that describes a sparse matrix in triplet form, i.e., each line in the file contains three numbers

row col value

representing one entry of the rating matrix together with its location. In real applications, a line typically looks like

user_index item_index rating

The ‘smalltrain.txt’ file in the ‘dat’ directory of this package is an example of a training data file.
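As a minimal sketch of the triplet format described above, the following writes a small, made-up rating table to a temporary file and passes its path to data_file(). The toy ratings and the names ratings, train_path, and train_src are illustrative assumptions, not part of the package; the call to data_file() is guarded so the snippet also runs where recosystem is not installed.

```r
# Hypothetical toy ratings: three users, two items, zero-based indices.
ratings <- data.frame(
  user_index = c(0L, 0L, 1L, 2L),
  item_index = c(0L, 1L, 1L, 0L),
  rating     = c(5, 3, 4, 1)
)

# Write one "user_index item_index rating" triplet per line,
# space-separated, with no header and no row names.
train_path <- tempfile(fileext = ".txt")
write.table(ratings, file = train_path, sep = " ",
            row.names = FALSE, col.names = FALSE)

# Inspect the raw lines that data_file() will point to.
readLines(train_path)

# Create the data source only if recosystem is available.
if (requireNamespace("recosystem", quietly = TRUE)) {
  # index1 = FALSE because the toy indices above start at 0.
  train_src <- recosystem::data_file(train_path, index1 = FALSE)
}
```

The resulting train_src could then be passed to $tune() or $train().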

Since version 0.4, recosystem supports two special types of matrix factorization: binary matrix factorization (BMF) and one-class matrix factorization (OCMF). BMF requires every rating to be either -1 or 1, and OCMF requires all ratings to be positive.
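The two constraints on ratings can be checked before building a data source. The helper below is a hypothetical sketch in base R (ratings_ok() is not a recosystem function); it only tests the conditions stated above and treats missing values as invalid, which is an assumption of this example.

```r
# Hypothetical helper, not part of recosystem: returns TRUE when every
# rating satisfies the constraint of the chosen factorization type.
ratings_ok <- function(rating, type = c("bmf", "ocmf")) {
  type <- match.arg(type)
  # Assumption of this sketch: missing ratings are never valid.
  if (anyNA(rating)) return(FALSE)
  switch(type,
    bmf  = all(rating %in% c(-1, 1)),  # BMF: ratings in {-1, 1}
    ocmf = all(rating > 0)             # OCMF: strictly positive ratings
  )
}

ratings_ok(c(1, -1, 1), type = "bmf")
ratings_ok(c(1, 0, 2),  type = "ocmf")
```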

If the user indices, item indices, and ratings are stored as R vectors in memory, they can be passed to data_memory() to form the training data source.

By default, the user and item indices are assumed to start at zero; set index1 = TRUE if they start at one.
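A minimal sketch of the in-memory route, assuming made-up vectors whose indices start at one (as is natural when indices come from R factors or row numbers). The vector names user, item, and rate are illustrative; the sanity checks are an addition of this example rather than something data_memory() is documented to require.

```r
# Hypothetical one-based indices and ratings held in memory.
user <- c(1L, 1L, 2L, 3L)
item <- c(1L, 2L, 2L, 1L)
rate <- c(5, 3, 4, 1)

# Sanity checks for this sketch: equal lengths, and no index below 1,
# since index1 = TRUE is used below.
stopifnot(length(user) == length(item), length(item) == length(rate))
stopifnot(min(user) >= 1L, min(item) >= 1L)

# Create the data source only if recosystem is available.
if (requireNamespace("recosystem", quietly = TRUE)) {
  train_src <- recosystem::data_memory(
    user_index = user,
    item_index = item,
    rating     = rate,
    index1     = TRUE  # the indices above start at 1, not 0
  )
}
```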

In $predict(), the functions on this page provide the source of testing data. Testing data have the same format as training data, except that the value (rating) column is not required and is ignored if provided. The ‘smalltest.txt’ file in the ‘dat’ directory of this package is an example of a testing data file.
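The following sketch ties training and testing sources together using the two example files that ship with the package. It assumes recosystem 0.4 or later is installed (so that Reco() and out_memory() are available); the option values passed to $train() are arbitrary choices for a quick run, not recommended settings.

```r
if (requireNamespace("recosystem", quietly = TRUE)) {
  library(recosystem)

  # Locate the example files in the package's 'dat' directory.
  train_src <- data_file(system.file("dat", "smalltrain.txt",
                                     package = "recosystem"))
  test_src  <- data_file(system.file("dat", "smalltest.txt",
                                     package = "recosystem"))

  set.seed(123)  # training is randomized; fix the seed for this sketch
  r <- Reco()

  # Arbitrary small settings so the example finishes quickly.
  r$train(train_src, opts = list(dim = 10, niter = 10,
                                 nthread = 1, verbose = FALSE))

  # Return predictions as an R vector instead of writing them to a file.
  pred <- r$predict(test_src, out_memory())
  head(pred)
}
```

Both sources keep the default index1 = FALSE, which assumes the bundled files use zero-based indices; the same flag should be used consistently for training and testing data.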

Value

An object of class "DataSource" as required by $tune(), $train(), and $predict().

Author(s)

Yixuan Qiu <http://statr.me>

See Also

$tune(), $train(), $predict()


recosystem documentation built on Sept. 2, 2017, 9:03 a.m.