tune: Tuning Model Parameters

Description Arguments Value Parameters and Options Author(s) References See Also Examples

Description

This method is a member function of class "RecoSys" that uses cross validation to tune the model parameters.

The common usage of this method is

1
2
3
4
5
6
7
8
r = Reco()
r$tune(train_data, opts = list(dim      = c(10L, 20L),
                               costp_l1 = c(0, 0.1),
                               costp_l2 = c(0.01, 0.1),
                               costq_l1 = c(0, 0.1),
                               costq_l2 = c(0.01, 0.1),
                               lrate    = c(0.01, 0.1))
)

Arguments

r

Object returned by Reco().

train_data

An object of class "DataSource" that describes the source of training data, typically returned by function data_file() or data_memory().

opts

A number of candidate tuning parameter values and extra options in the model tuning procedure. See section Parameters and Options for details.

Value

A list with two components:

min

Parameter values with minimum cross validated loss. This is a list that can be passed to the opts argument in $train().

res

A data frame giving the supplied candidate values of tuning parameters, and one column showing the loss function value associated with each combination.

Parameters and Options

The opts argument should be a list that provides the candidate values of tuning parameters and some other options. For tuning parameters (dim, costp_l1, costp_l2, costq_l1, costq_l2, and lrate), users can provide a numeric vector for each one, so that the model will be evaluated on each combination of the candidate values. For other non-tuning options, users should give a single value. If a parameter or option is not set by the user, the program will use a default one.

See below for the list of available parameters and options:

dim

Tuning parameter, the number of latent factors. Can be specified as an integer vector, with default value c(10L, 20L).

costp_l1

Tuning parameter, the L1 regularization cost for user factors. Can be specified as a numeric vector, with default value c(0, 0.1).

costp_l2

Tuning parameter, the L2 regularization cost for user factors. Can be specified as a numeric vector, with default value c(0.01, 0.1).

costq_l1

Tuning parameter, the L1 regularization cost for item factors. Can be specified as a numeric vector, with default value c(0, 0.1).

costq_l2

Tuning parameter, the L2 regularization cost for item factors. Can be specified as a numeric vector, with default value c(0.01, 0.1).

lrate

Tuning parameter, the learning rate, which can be thought of as the step size in gradient descent. Can be specified as a numeric vector, with default value c(0.01, 0.1).

loss

Character string, the loss function. Default is "l2", see section Parameters and Options in $train() for details.

nfold

Integer, the number of folds in cross validation. Default is 5.

niter

Integer, the number of iterations. Default is 20.

nthread

Integer, the number of threads for parallel computing. Default is 1.

nbin

Integer, the number of bins. Must be greater than nthread. Default is 20.

nmf

Logical, whether to perform non-negative matrix factorization. Default is FALSE.

verbose

Logical, whether to show detailed information. Default is FALSE.

progress

Logical, whether to show a progress bar. Default is TRUE.

Author(s)

Yixuan Qiu <http://statr.me>

References

W.-S. Chin, Y. Zhuang, Y.-C. Juan, and C.-J. Lin. A Fast Parallel Stochastic Gradient Method for Matrix Factorization in Shared Memory Systems. ACM TIST, 2015.

W.-S. Chin, Y. Zhuang, Y.-C. Juan, and C.-J. Lin. A Learning-rate Schedule for Stochastic Gradient Methods to Matrix Factorization. PAKDD, 2015.

W.-S. Chin, B.-W. Yuan, M.-Y. Yang, Y. Zhuang, Y.-C. Juan, and C.-J. Lin. LIBMF: A Library for Parallel Matrix Factorization in Shared-memory Systems. Technical report, 2015.

See Also

$train()

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
## Not run: 
train_set = system.file("dat", "smalltrain.txt", package = "recosystem")
train_src = data_file(train_set)
r = Reco()
set.seed(123) # This is a randomized algorithm
res = r$tune(
    train_src,
    opts = list(dim = c(10, 20, 30),
                costp_l1 = 0, costq_l1 = 0,
                lrate = c(0.05, 0.1, 0.2), nthread = 2)
)
r$train(train_src, opts = res$min)

## End(Not run)

recosystem documentation built on Sept. 2, 2017, 9:03 a.m.