train: Training a Recommender Model
In yixuan/recosystem: Recommender System using Matrix Factorization

Description

This method is a member function of class "RecoSys" that trains a recommender model. It reads from a training data source and creates a model file at the specified location, or keeps the model in memory if `out_model` is `NULL`. The model contains the information needed for prediction.

The common usage of this method is

r = Reco()
r$train(train_data, out_model = file.path(tempdir(), "model.txt"),
        opts = list())

Arguments

`r`	Object returned by `Reco()`.
`train_data`	An object of class "DataSource" that describes the source of training data, typically returned by function `data_file()`, `data_memory()`, or `data_matrix()`.
`out_model`	Path to the model file that will be created. If `NULL`, the model is stored in memory instead, and the model matrices can then be accessed under `r$model$matrices`.
`opts`	A list of parameters and options for model training. See section Parameters and Options for details.
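
The two storage modes described for `out_model` can be compared with a small sketch. The rating grid below is made-up data used only for illustration, and the file name `model_demo.txt` is a hypothetical choice. The sketch does not assume any particular layout of `r$model$matrices`; it only inspects the object with `str()`.

```r
library(recosystem)

# Hypothetical toy data: every (user, item) pair on a 20 x 15 grid,
# with 0-based indices and random ratings from 1 to 5
grid = expand.grid(user = 0:19, item = 0:14)
set.seed(123)
grid$rating = sample(1:5, nrow(grid), replace = TRUE)
train_data = data_memory(grid$user, grid$item, rating = grid$rating,
                         index1 = FALSE)

r = Reco()

# Mode 1: write the model to a file (file name is an arbitrary choice)
model_path = file.path(tempdir(), "model_demo.txt")
r$train(train_data, out_model = model_path,
        opts = list(dim = 5, niter = 10, nthread = 1, verbose = FALSE))
file.exists(model_path)

# Mode 2: keep the model in memory
r$train(train_data, out_model = NULL,
        opts = list(dim = 5, niter = 10, nthread = 1, verbose = FALSE))

# Look at what is stored, without assuming its internal structure
str(r$model$matrices, max.level = 1)
```

A file-based model is convenient when the model should be inspected outside the R session that trained it, while the in-memory mode avoids touching the disk.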

Parameters and Options

The opts argument is a list that can supply any of the following parameters:

loss: Character string, the loss function. Default is "l2", see below for details.
dim: Integer, the number of latent factors. Default is 10.
costp_l1: Numeric, L1 regularization parameter for user factors. Default is 0.
costp_l2: Numeric, L2 regularization parameter for user factors. Default is 0.1.
costq_l1: Numeric, L1 regularization parameter for item factors. Default is 0.
costq_l2: Numeric, L2 regularization parameter for item factors. Default is 0.1.
lrate: Numeric, the learning rate, which can be thought of as the step size in gradient descent. Default is 0.1.
niter: Integer, the number of iterations. Default is 20.
nthread: Integer, the number of threads for parallel computing. Default is 1.
nbin: Integer, the number of bins. Must be greater than nthread. Default is 20.
nmf: Logical, whether to perform non-negative matrix factorization. Default is FALSE.
verbose: Logical, whether to show detailed information. Default is TRUE.
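
As an illustration of how these options fit together, the sketch below spells out the full `opts` list and turns on `nmf`. The toy ratings are made up, the tuning values are arbitrary rather than recommended, and the file name `nmf_model.txt` is hypothetical. It assumes that the factor matrices can be retrieved with `r$output()` and `out_memory()` from the same package.

```r
library(recosystem)

# Hypothetical toy data with non-negative ratings, 0-based indices
grid = expand.grid(user = 0:19, item = 0:14)
set.seed(123)
grid$rating = sample(1:5, nrow(grid), replace = TRUE)
train_data = data_memory(grid$user, grid$item, rating = grid$rating,
                         index1 = FALSE)

# Every option listed above, set explicitly (values are illustrative only)
opts = list(
    loss     = "l2",
    dim      = 8,
    costp_l1 = 0, costp_l2 = 0.05,
    costq_l1 = 0, costq_l2 = 0.05,
    lrate    = 0.1,
    niter    = 30,
    nthread  = 1,
    nbin     = 20,    # must be greater than nthread
    nmf      = TRUE,  # non-negative matrix factorization
    verbose  = FALSE
)
stopifnot(opts$nbin > opts$nthread)

r = Reco()
set.seed(123)  # the algorithm is randomized
r$train(train_data, out_model = file.path(tempdir(), "nmf_model.txt"),
        opts = opts)

# Retrieve the user (P) and item (Q) factor matrices as R objects
fac = r$output(out_memory(), out_memory())

# With nmf = TRUE the factors are expected to be non-negative
min(fac$P)
min(fac$Q)
```

Any option omitted from the list falls back to its default, so in practice only the values that differ from the defaults need to be supplied.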

The loss option may take the following values:

For real-valued matrix factorization,

"l2": Squared error (L2-norm)
"l1": Absolute error (L1-norm)
"kl": Generalized KL-divergence

For binary matrix factorization,

"log": Logarithmic error
"squared_hinge": Squared hinge loss
"hinge": Hinge loss

For one-class matrix factorization,

"row_log": Row-oriented pair-wise logarithmic loss
"col_log": Column-oriented pair-wise logarithmic loss

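The loss function is selected by name through the `loss` entry of `opts`. The sketch below fits the same made-up data with two of the real-valued losses and computes an in-sample error for each. The helper `fit_and_predict()` and the model file names are hypothetical, the ratings are random, and the resulting numbers carry no meaning beyond showing the mechanics.

```r
library(recosystem)

# Hypothetical toy data, 0-based indices
grid = expand.grid(user = 0:19, item = 0:14)
set.seed(123)
grid$rating = sample(1:5, nrow(grid), replace = TRUE)
train_data = data_memory(grid$user, grid$item, rating = grid$rating,
                         index1 = FALSE)
# Same (user, item) pairs without ratings, used for prediction
pred_data = data_memory(grid$user, grid$item, index1 = FALSE)

# Hypothetical helper: train with the given loss, then predict in memory
fit_and_predict = function(loss)
{
    r = Reco()
    set.seed(123)
    r$train(train_data,
            out_model = file.path(tempdir(), paste0("model_", loss, ".txt")),
            opts = list(loss = loss, dim = 5, niter = 20,
                        nthread = 1, verbose = FALSE))
    r$predict(pred_data, out_pred = out_memory())
}

pred_l2 = fit_and_predict("l2")
pred_l1 = fit_and_predict("l1")

# In-sample errors matching each loss
rmse_l2 = sqrt(mean((pred_l2 - grid$rating)^2))
mae_l1  = mean(abs(pred_l1 - grid$rating))
c(rmse_l2 = rmse_l2, mae_l1 = mae_l1)
```

The binary and one-class losses are chosen the same way, but they expect training data of the corresponding type rather than real-valued ratings.
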
Author(s)

Yixuan Qiu <https://statr.me>

References

W.-S. Chin, Y. Zhuang, Y.-C. Juan, and C.-J. Lin. A Fast Parallel Stochastic Gradient Method for Matrix Factorization in Shared Memory Systems. ACM TIST, 2015.

W.-S. Chin, Y. Zhuang, Y.-C. Juan, and C.-J. Lin. A Learning-rate Schedule for Stochastic Gradient Methods to Matrix Factorization. PAKDD, 2015.

W.-S. Chin, B.-W. Yuan, M.-Y. Yang, Y. Zhuang, Y.-C. Juan, and C.-J. Lin. LIBMF: A Library for Parallel Matrix Factorization in Shared-memory Systems. Technical report, 2015.

Examples

## Training model from a data file
train_set = system.file("dat", "smalltrain.txt", package = "recosystem")
train_data = data_file(train_set)
r = Reco()
set.seed(123) # This is a randomized algorithm
# The model will be saved to a file
r$train(train_data, out_model = file.path(tempdir(), "model.txt"),
        opts = list(dim = 20, costp_l2 = 0.01, costq_l2 = 0.01, nthread = 1)
)

## Training model from data in memory
train_df = read.table(train_set, sep = " ", header = FALSE)
train_data = data_memory(train_df[, 1], train_df[, 2], rating = train_df[, 3])
set.seed(123)
# The model will be stored in memory
r$train(train_data, out_model = NULL,
        opts = list(dim = 20, costp_l2 = 0.01, costq_l2 = 0.01, nthread = 1)
)

## Training model from data in a sparse matrix
if(require(Matrix))
{
    mat = Matrix::sparseMatrix(i = train_df[, 1], j = train_df[, 2], x = train_df[, 3],
                               repr = "T", index1 = FALSE)
    train_data = data_matrix(mat)
    r$train(train_data, out_model = NULL,
            opts = list(dim = 20, costp_l2 = 0.01, costq_l2 = 0.01, nthread = 1))
}

yixuan/recosystem documentation built on May 6, 2023, 9:26 a.m.