train: Training a Recommender Model


Description

This method is a member function of class "RecoSys" that trains a recommender model. It reads from a training data source and creates a model file at the specified location. The model file contains the information needed for prediction.

The common usage of this method is:

r = Reco()
r$train(train_data, out_model = file.path(tempdir(), "model.txt"),
        opts = list())

Arguments

r

Object returned by Reco().

train_data

An object of class "DataSource" that describes the source of training data, typically returned by function data_file() or data_memory().

out_model

Path to the model file that will be created.

opts

A list of parameters and options for model training. See the section Parameters and Options for details.
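Because opts is an ordinary R list, any option left out keeps the default listed in the section Parameters and Options. The helper below, resolve_train_opts(), is a hypothetical sketch of that merging using base R's modifyList(); it is not the package's internal code, and warning about unrecognised option names is an assumption of this sketch. It also checks the documented constraint that nbin must be greater than nthread.

```r
# Hypothetical helper (not part of recosystem): the documented defaults
default_train_opts <- function() {
  list(loss = "l2", dim = 10L,
       costp_l1 = 0, costp_l2 = 0.1,
       costq_l1 = 0, costq_l2 = 0.1,
       lrate = 0.1, niter = 20L,
       nthread = 1L, nbin = 20L,
       nmf = FALSE, verbose = TRUE)
}

# Fill in defaults for any option the user did not supply
resolve_train_opts <- function(opts = list()) {
  defaults <- default_train_opts()
  unknown <- setdiff(names(opts), names(defaults))
  if (length(unknown) > 0) {
    # Assumption of this sketch: unrecognised names are dropped with a warning
    warning("ignoring unknown option(s): ", paste(unknown, collapse = ", "))
    opts <- opts[setdiff(names(opts), unknown)]
  }
  resolved <- modifyList(defaults, opts)
  # Documented constraint: nbin must be greater than nthread
  if (resolved$nbin <= resolved$nthread) {
    stop("nbin must be greater than nthread")
  }
  resolved
}

# The opts used in the Examples section, merged with the defaults
str(resolve_train_opts(list(dim = 20, costp_l2 = 0.01, costq_l2 = 0.01)))
```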

Parameters and Options

The opts argument is a list that can supply any of the following parameters:

loss

Character string, the loss function. Default is "l2"; see below for details.

dim

Integer, the number of latent factors. Default is 10.

costp_l1

Numeric, L1 regularization parameter for user factors. Default is 0.

costp_l2

Numeric, L2 regularization parameter for user factors. Default is 0.1.

costq_l1

Numeric, L1 regularization parameter for item factors. Default is 0.

costq_l2

Numeric, L2 regularization parameter for item factors. Default is 0.1.

lrate

Numeric, the learning rate, which can be thought of as the step size in stochastic gradient descent. Default is 0.1.

niter

Integer, the number of iterations. Default is 20.

nthread

Integer, the number of threads for parallel computing. Default is 1.

nbin

Integer, the number of bins; the rating matrix is divided into nbin x nbin blocks for parallel updates. Must be greater than nthread. Default is 20.

nmf

Logical, whether to perform non-negative matrix factorization. Default is FALSE.

verbose

Logical, whether to show detailed information. Default is TRUE.
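To make dim, lrate, niter, costp_l2, costq_l2 and nmf concrete, here is a minimal sketch of plain stochastic gradient descent for the default "l2" loss. The functions sgd_mf() and rmse() and the toy data are hypothetical illustrations, not the recosystem API. LIBMF (see References) uses an adaptive learning-rate schedule and a block-parallel scheduler driven by nthread and nbin; both are omitted here, the L1 costs are ignored, and enforcing nmf by projecting onto non-negative values is an assumption about the general idea rather than LIBMF's exact update.

```r
# Hypothetical sketch (not recosystem/LIBMF code): plain SGD for real-valued
# matrix factorization with squared-error loss and L2 regularization.
sgd_mf <- function(u, i, r, n_users, n_items, dim = 10, lrate = 0.1,
                   niter = 20, costp_l2 = 0.1, costq_l2 = 0.1, nmf = FALSE) {
  # Small random initial factors: row u of P is user u, row i of Q is item i
  P <- matrix(runif(n_users * dim, 0, 0.1), n_users, dim)
  Q <- matrix(runif(n_items * dim, 0, 0.1), n_items, dim)
  for (iter in seq_len(niter)) {
    for (k in sample(length(r))) {        # one pass over ratings in random order
      pu <- P[u[k], ]
      qi <- Q[i[k], ]
      err <- r[k] - sum(pu * qi)          # residual of the dot-product prediction
      # Descent step on (r - p'q)^2 + costp_l2 |p|^2 + costq_l2 |q|^2,
      # with the constant factor 2 folded into lrate
      P[u[k], ] <- pu + lrate * (err * qi - costp_l2 * pu)
      Q[i[k], ] <- qi + lrate * (err * pu - costq_l2 * qi)
      if (nmf) {                          # keep the updated factors non-negative
        P[u[k], ] <- pmax(P[u[k], ], 0)
        Q[i[k], ] <- pmax(Q[i[k], ], 0)
      }
    }
  }
  list(P = P, Q = Q)
}

# Root mean squared error of the model's predictions on (u, i, r) triplets
rmse <- function(model, u, i, r) {
  pred <- rowSums(model$P[u, , drop = FALSE] * model$Q[i, , drop = FALSE])
  sqrt(mean((r - pred)^2))
}

# Toy data: 300 of the 30 x 20 entries of a rank-2 non-negative matrix
set.seed(123)
n_users <- 30; n_items <- 20
P0 <- matrix(runif(n_users * 2), n_users, 2)
Q0 <- matrix(runif(n_items * 2), n_items, 2)
obs <- sample(n_users * n_items, 300)
u <- (obs - 1) %% n_users + 1             # 1-based row (user) index
i <- (obs - 1) %/% n_users + 1            # 1-based column (item) index
r <- rowSums(P0[u, ] * Q0[i, ])

init <- sgd_mf(u, i, r, n_users, n_items, dim = 5, niter = 0)
fit <- sgd_mf(u, i, r, n_users, n_items, dim = 5, lrate = 0.05, niter = 50,
              costp_l2 = 0.01, costq_l2 = 0.01)
fit_nmf <- sgd_mf(u, i, r, n_users, n_items, dim = 5, lrate = 0.05,
                  niter = 50, costp_l2 = 0.01, costq_l2 = 0.01, nmf = TRUE)
c(untrained = rmse(init, u, i, r), trained = rmse(fit, u, i, r))
```

Each update touches only one row of P and one row of Q, so ratings that share no user and no item can be updated at the same time; this is the property that block-parallel SGD exploits.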

The loss option may take the following values:

For real-valued matrix factorization,

"l2"

Squared error (L2-norm)

"l1"

Absolute error (L1-norm)

"kl"

Generalized KL-divergence

For binary matrix factorization,

"log"

Logarithmic error

"squared_hinge"

Squared hinge loss

"hinge"

Hinge loss

For one-class matrix factorization,

"row_log"

Row-oriented pair-wise logarithmic loss

"col_log"

Column-oriented pair-wise logarithmic loss
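As a reference point for the loss option, the standard textbook forms of these losses are sketched below for a single entry. The function names are hypothetical, and it is an assumption that LIBMF uses exactly these forms rather than, for example, versions scaled by a constant. For binary factorization the labels are assumed to be coded as -1/+1. For the one-class losses the sketch assumes the usual pairwise setup, in which an observed entry should score higher than an unobserved one, with "row_log" forming pairs within a user's row and "col_log" within an item's column.

```r
# Textbook forms of the losses listed above (hypothetical helper names).
# z is the model's prediction p_u' q_i for one entry.

# Real-valued MF: r is the observed rating
loss_l2 <- function(r, z) (r - z)^2
loss_l1 <- function(r, z) abs(r - z)
loss_kl <- function(r, z) r * log(r / z) - r + z   # assumes r > 0 and z > 0

# Binary MF: y is the label, assumed to be coded as -1 / +1
loss_log           <- function(y, z) log1p(exp(-y * z))
loss_squared_hinge <- function(y, z) pmax(0, 1 - y * z)^2
loss_hinge         <- function(y, z) pmax(0, 1 - y * z)

# One-class MF (pairwise): z_pos scores an observed entry, z_neg an
# unobserved one; the loss is small when z_pos is well above z_neg
loss_pairwise_log <- function(z_pos, z_neg) log1p(exp(-(z_pos - z_neg)))

loss_l2(4, 3.5)                 # 0.25
loss_hinge(1, c(-1, 0.5, 2))    # 2.0 0.5 0.0
```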

Author(s)

Yixuan Qiu <http://statr.me>

References

W.-S. Chin, Y. Zhuang, Y.-C. Juan, and C.-J. Lin. A Fast Parallel Stochastic Gradient Method for Matrix Factorization in Shared Memory Systems. ACM TIST, 2015.

W.-S. Chin, Y. Zhuang, Y.-C. Juan, and C.-J. Lin. A Learning-rate Schedule for Stochastic Gradient Methods to Matrix Factorization. PAKDD, 2015.

W.-S. Chin, B.-W. Yuan, M.-Y. Yang, Y. Zhuang, Y.-C. Juan, and C.-J. Lin. LIBMF: A Library for Parallel Matrix Factorization in Shared-memory Systems. Technical report, 2015.

See Also

$tune(), $output(), $predict()

Examples

library(recosystem)
## Training model from a data file
train_set = system.file("dat", "smalltrain.txt", package = "recosystem")
r = Reco()
set.seed(123) # This is a randomized algorithm
r$train(data_file(train_set),
        opts = list(dim = 20, costp_l2 = 0.01, costq_l2 = 0.01, nthread = 1)
)

## Training model from data in memory
train_df = read.table(train_set, sep = " ", header = FALSE)
set.seed(123)
r$train(data_memory(train_df[, 1], train_df[, 2], train_df[, 3]),
        opts = list(dim = 20, costp_l2 = 0.01, costq_l2 = 0.01, nthread = 1)
)
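The file-based example reads smalltrain.txt as space-separated triplets with no header. The sketch below writes your own ratings in the same layout with base R's write.table(), so that the file could then be passed to data_file(). The toy data frame and the file name my_train.txt are hypothetical, and the 0-based user and item indices mirror what smalltrain.txt is assumed to use.

```r
# Hypothetical toy ratings: one row per (user, item, rating) triplet,
# with 0-based integer indices (assumed to match smalltrain.txt)
ratings <- data.frame(
  user   = c(0L, 0L, 1L, 2L),
  item   = c(0L, 2L, 1L, 2L),
  rating = c(4, 3.5, 5, 2)
)

# One "user item rating" line per triplet: space separated, no header,
# no row names, no quotes
train_path <- file.path(tempdir(), "my_train.txt")
write.table(ratings, train_path, sep = " ",
            row.names = FALSE, col.names = FALSE, quote = FALSE)

readLines(train_path)   # "0 0 4" "0 2 3.5" "1 1 5" "2 2 2"

# Read it back the same way the in-memory example does
back <- read.table(train_path, sep = " ", header = FALSE)
```

Training on this file would then follow the first example, e.g. r$train(data_file(train_path)); that call is not run here because it requires the recosystem package.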

recosystem documentation built on Sept. 2, 2017, 9:03 a.m.