fit_models  R Documentation 
Models for collective matrix factorization (also known as multi-view or
multi-way). These models try to approximate a matrix 'X' as the product of two
lower-rank matrices 'A' and 'B' (that is:
\mathbf{X} \approx \mathbf{A} \mathbf{B}^T
)
by finding the values of 'A' and 'B' that minimize the squared error w.r.t. 'X',
optionally aided with side information matrices 'U' and 'I' about
rows and columns of 'X'.
The package documentation is built with recommendation systems in mind, for which it assumes that 'X' is a sparse matrix in which users represent rows, items represent columns, and the non-missing values denote interactions such as movie ratings from users to items. The idea behind it is to recommend the missing entries in 'X' that have the highest predicted value according to the approximation. For other domains, take any mention of users as rows and any mention of items as columns (e.g. when used for topic modeling, the "users" are documents and the "items" are word occurrences).
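As a rough illustration of the recommendation idea described above, the following base-R sketch uses made-up factor matrices (they are not fitted by the package, and the helper name 'recommend_top' is hypothetical) to score every entry and pick, for each user, the missing item with the highest predicted value:

```r
# Toy factor matrices (made-up values): 3 users, 4 items, k = 2 latent factors.
A <- matrix(c(1.0, 0.5,
              0.2, 1.0,
              0.8, 0.8), nrow = 3, byrow = TRUE)
B <- matrix(c(1.0, 0.0,
              0.0, 1.0,
              0.5, 0.5,
              1.0, 1.0), nrow = 4, byrow = TRUE)

# Observed interactions; NA marks a missing entry.
X <- matrix(c( 1, NA, NA, NA,
              NA,  1, NA,  1,
               1, NA,  1, NA), nrow = 3, byrow = TRUE)

# Low-rank approximation X_hat = A %*% t(B).
X_hat <- A %*% t(B)

# For a given user, rank only the items that are missing in 'X'.
recommend_top <- function(user, n = 1L) {
  scores <- X_hat[user, ]
  scores[!is.na(X[user, ])] <- -Inf   # exclude items already interacted with
  head(order(scores, decreasing = TRUE), n)
}
top_items <- vapply(seq_len(nrow(X)), recommend_top, integer(1))
```

Here 'top_items' holds one recommended column index per row of 'X'. A fitted model would supply 'A' and 'B' instead of the hand-written values.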
In the 'CMF' model (main functionality of the package and most flexible model type),
the 'A' and 'B' matrices are also used to jointly factorize the side information
matrices, that is:
\mathbf{U} \approx \mathbf{A}\mathbf{C}^T, \:\:\:
\mathbf{I} \approx \mathbf{B}\mathbf{D}^T
,
sharing the same components or latent factors across the factorizations.
Informally, this means that the obtained factors now need to explain both the
interactions data and the attributes data, making them generalize better to
the non-present entries of 'X' and to new data.
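The shared-factor idea can be sketched as a single objective in which 'A' appears in both the 'X' and 'U' terms and 'B' in both the 'X' and 'I' terms. This is only an illustration in base R: centering and biases are left out, and the exact scaling of each term inside the package (for instance any constant factor) is an assumption. The function names 'sq_err' and 'cmf_loss' are hypothetical.

```r
# Sum of squared errors over the non-missing entries only.
sq_err <- function(M, M_hat) sum((M - M_hat)^2, na.rm = TRUE)

# Joint objective: the same A is used for X and U, the same B for X and I.
cmf_loss <- function(X, U, I, A, B, C, D,
                     w_main = 1, w_user = 1, w_item = 1, lambda = 0) {
  w_main * sq_err(X, A %*% t(B)) +
    w_user * sq_err(U, A %*% t(C)) +
    w_item * sq_err(I, B %*% t(D)) +
    lambda * (sum(A^2) + sum(B^2) + sum(C^2) + sum(D^2))
}

set.seed(1)
A <- matrix(rnorm(6), 3, 2); B <- matrix(rnorm(8), 4, 2)
C <- matrix(rnorm(10), 5, 2); D <- matrix(rnorm(6), 3, 2)
X <- A %*% t(B); X[1, 2] <- NA      # one missing interaction
U <- A %*% t(C)                     # user attributes (3 users x 5 columns)
I <- B %*% t(D)                     # item attributes (4 items x 3 columns)

loss_exact <- cmf_loss(X, U, I, A, B, C, D)            # exact factors, no penalty
loss_perturbed <- cmf_loss(X, U, I, A + 0.1, B, C, D)  # shifting A hurts both X and U
```

Because 'A' enters two terms, a change to the user factors is penalized by both the interactions and the user attributes, which is the informal meaning of "explaining both" in the text above.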
In 'CMF' and the other non-implicit models, the 'X' data is by default centered beforehand by subtracting its mean (see parameter 'center'), and the model might optionally add user and item biases (which are model parameters, not pre-estimated).
The model might optionally generate so-called implicit features from the same 'X'
data, by factorizing binary matrices which tell which entries in 'X' are present,
i.e.: \mathbf{I}_x \approx \mathbf{A}\mathbf{B}_i^T , \:\:\:
\mathbf{I}_x^T \approx \mathbf{B}\mathbf{A}_i^T
,
where \mathbf{I}_x
is an indicator matrix which is treated as
full (no unknown values).
The 'CMF_implicit' model extends the collective factorization idea to the implicit-feedback case, based on reference [3]. While in 'CMF' the values of 'X' are taken at face value and the objective is to minimize squared error over the non-missing entries, in the implicit-feedback variants the matrix 'X' is assumed to be binary (all entries are zero or one, with no unknown values), with the positive entries (those which are not missing in the data) having a weight determined by 'X'.
'CMF' is intended for explicit feedback data (e.g. movie ratings, which contain both likes and dislikes), whereas 'CMF_implicit' is intended for implicit feedback data (e.g. number of times each user watched each movie/series, which do not contain dislikes and the values are treated as confidence scores).
The 'MostPopular' model is a simpler heuristic implemented for comparison purposes which is equivalent to either 'CMF' or 'CMF_implicit' with 'k=0' plus user/item biases. If a personalized model is not able to beat this heuristic under the evaluation metrics of interest, chances are that the personalized model needs better tuning.
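To give a feel for what a bias-only baseline computes, here is a simplified base-R sketch that fits item biases only (no user biases) around the global mean with an L2 penalty. It is not the package's implementation; the closed form used below assumes the objective sum((x - mu - b)^2) + lambda * b^2 for each item.

```r
X <- matrix(c( 5, NA,  3,
               4,  2, NA,
              NA,  1,  4), nrow = 3, byrow = TRUE)
lambda <- 1

mu <- mean(X, na.rm = TRUE)                 # global mean of non-missing entries
resid_sum <- colSums(X - mu, na.rm = TRUE)  # per-item sum of centered values
n_obs <- colSums(!is.na(X))                 # per-item number of observations

# Ridge-regularized item bias: argmin_b sum((x - mu - b)^2) + lambda * b^2
item_bias <- resid_sum / (n_obs + lambda)

# Every user gets the same ranking: items ordered by mu + item_bias.
popularity_order <- order(mu + item_bias, decreasing = TRUE)
```

The resulting ranking is the same for every user, which is why it serves as a non-personalized yardstick.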
The 'ContentBased' model offers a different alternative in which the latent factors
are determined directly from the user/item attributes (which are no longer optional),
that is: \mathbf{A} = \mathbf{U} \mathbf{C}, \:\:\:
\mathbf{B} = \mathbf{I} \mathbf{D}
, optionally adding per-column
intercepts, and is aimed at cold-start predictions (such a model is extremely
unlikely to perform better for new users in the presence of interactions data).
For this model, the package provides functionality for making predictions about
potential new entries in 'X' which involve both new rows and new columns at the
same time.
Unlike the others, it does not offer an implicit-feedback variant.
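The reason this model can score pairs of entirely new rows and columns is that the factors are a function of the attributes alone. A minimal base-R sketch, with randomly generated stand-ins for the fitted coefficient matrices 'C' and 'D' and with the optional intercepts omitted:

```r
set.seed(123)
p <- 4; q <- 3; k <- 2                 # user attributes, item attributes, latent factors
C <- matrix(rnorm(p * k), p, k)        # hypothetical fitted coefficients for users
D <- matrix(rnorm(q * k), q, k)        # hypothetical fitted coefficients for items

# Attributes for 2 new users and 5 new items, none of which appear in 'X'.
U_new <- matrix(rnorm(2 * p), 2, p)
I_new <- matrix(rnorm(5 * q), 5, q)

A_new <- U_new %*% C                   # A = U C  (2 x k)
B_new <- I_new %*% D                   # B = I D  (5 x k)
X_new_hat <- A_new %*% t(B_new)        # predicted entries (2 x 5)
```

No interaction data for the new users or items is needed at any step, only their attribute rows.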
The 'OMF_explicit' model extends the 'ContentBased' by adding a free offset determined
for each user and item according to 'X' data alone, that is:
\mathbf{A}_m = \mathbf{A} + \mathbf{U} \mathbf{C}, \:\:\:
\mathbf{B}_m = \mathbf{B} + \mathbf{I}\mathbf{D}, \:\:\:
\mathbf{X} \approx \mathbf{A}_m \mathbf{B}_m^T
,
and 'OMF_implicit' extends the idea to the implicit-feedback case.
Note that 'ContentBased' is equivalent to 'OMF_explicit' with 'k=0', 'k_main=0' and 'k_sec>0' (see documentation for details about these parameters). For a different formulation in which user factors are determined directly for item attributes (and same for items with user attributes), it's also possible to use 'OMF_explicit' with 'k=0' while passing 'k_sec' and 'k_main'.
('OMF_explicit' and 'OMF_implicit' were only implemented for research purposes for cold-start recommendations in cases in which there is side info about users but not about items or vice versa; it is not recommended to rely on them.)
Some extra considerations about the parameters here:
By default, the terms in the optimization objective are not scaled by the number of entries (see parameter 'scale_lam'), thus hyperparameters such as 'lambda' will require more tuning than in other software and will require trying a wider range of values.
The regularization applied to the matrices is the same for all users and for all items.
The default hyperparameters are not geared towards speed: for faster fitting times, use 'method="als"', 'use_cg=TRUE', 'finalize_chol=FALSE', 'precompute_for_predictions=FALSE', 'verbose=FALSE', and pass 'X' as a matrix (either sparse or dense).
The default hyperparameters are also very different from those in other software: for example, for 'CMF_implicit', in order to match the hyperparameters of the Python package 'implicit', one would have to use 'k=100', 'lambda=0.01', 'niter=15', 'use_cg=TRUE', 'finalize_chol=FALSE', and use single-precision floating point numbers (not supported in the R version of this package).
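The settings quoted above can be collected once and reused. The sketch below only builds the argument lists and a thin wrapper; it assumes the package is installed under the name 'cmfrec' (an assumption, since the package name is not stated in this section), and the wrapper name 'fit_implicit_like' is hypothetical. Every argument name comes from the 'CMF_implicit' usage shown in this page.

```r
# Settings quoted in the text for matching the 'implicit' Python package.
implicit_like_args <- list(
  k = 100L, lambda = 0.01, niter = 15L,
  use_cg = TRUE, finalize_chol = FALSE
)

# Speed-oriented extras from the text that 'CMF_implicit' also accepts.
speed_args <- list(precompute_for_predictions = FALSE, verbose = FALSE)

# Wrapper that fits the model; calling it requires the package to be installed.
fit_implicit_like <- function(X) {
  do.call(cmfrec::CMF_implicit, c(list(X = X), implicit_like_args, speed_args))
}
```

Defining the wrapper does not load the package; the model is only fitted when 'fit_implicit_like' is called with an 'X' matrix.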
CMF(
X,
U = NULL,
I = NULL,
U_bin = NULL,
I_bin = NULL,
weight = NULL,
k = 40L,
lambda = 10,
method = "als",
use_cg = TRUE,
user_bias = TRUE,
item_bias = TRUE,
center = TRUE,
add_implicit_features = FALSE,
scale_lam = FALSE,
scale_lam_sideinfo = FALSE,
scale_bias_const = FALSE,
k_user = 0L,
k_item = 0L,
k_main = 0L,
w_main = 1,
w_user = 1,
w_item = 1,
w_implicit = 0.5,
l1_lambda = 0,
center_U = TRUE,
center_I = TRUE,
maxiter = 800L,
niter = 10L,
parallelize = "separate",
corr_pairs = 4L,
max_cg_steps = 3L,
precondition_cg = FALSE,
finalize_chol = TRUE,
NA_as_zero = FALSE,
NA_as_zero_user = FALSE,
NA_as_zero_item = FALSE,
nonneg = FALSE,
nonneg_C = FALSE,
nonneg_D = FALSE,
max_cd_steps = 100L,
precompute_for_predictions = TRUE,
include_all_X = TRUE,
verbose = TRUE,
print_every = 10L,
handle_interrupt = TRUE,
seed = 1L,
nthreads = parallel::detectCores()
)
CMF_implicit(
X,
U = NULL,
I = NULL,
k = 40L,
lambda = 1,
alpha = 1,
use_cg = TRUE,
k_user = 0L,
k_item = 0L,
k_main = 0L,
w_main = 1,
w_user = 1,
w_item = 1,
l1_lambda = 0,
center_U = TRUE,
center_I = TRUE,
niter = 10L,
max_cg_steps = 3L,
precondition_cg = FALSE,
finalize_chol = FALSE,
NA_as_zero_user = FALSE,
NA_as_zero_item = FALSE,
nonneg = FALSE,
nonneg_C = FALSE,
nonneg_D = FALSE,
max_cd_steps = 100L,
apply_log_transf = FALSE,
precompute_for_predictions = TRUE,
verbose = TRUE,
handle_interrupt = TRUE,
seed = 1L,
nthreads = parallel::detectCores()
)
MostPopular(
X,
weight = NULL,
implicit = FALSE,
center = TRUE,
user_bias = ifelse(implicit, FALSE, TRUE),
lambda = 10,
alpha = 1,
NA_as_zero = FALSE,
apply_log_transf = FALSE,
nonneg = FALSE,
scale_lam = FALSE,
scale_bias_const = FALSE
)
ContentBased(
X,
U,
I,
weight = NULL,
k = 20L,
lambda = 100,
user_bias = FALSE,
item_bias = FALSE,
add_intercepts = TRUE,
maxiter = 3000L,
corr_pairs = 3L,
parallelize = "separate",
verbose = TRUE,
print_every = 100L,
handle_interrupt = TRUE,
start_with_ALS = TRUE,
seed = 1L,
nthreads = parallel::detectCores()
)
OMF_explicit(
X,
U = NULL,
I = NULL,
weight = NULL,
k = 50L,
lambda = 10,
method = "lbfgs",
use_cg = TRUE,
precondition_cg = FALSE,
user_bias = TRUE,
item_bias = TRUE,
center = TRUE,
k_sec = 0L,
k_main = 0L,
add_intercepts = TRUE,
w_user = 1,
w_item = 1,
maxiter = 10000L,
niter = 10L,
parallelize = "separate",
corr_pairs = 7L,
max_cg_steps = 3L,
finalize_chol = TRUE,
NA_as_zero = FALSE,
verbose = TRUE,
print_every = 100L,
handle_interrupt = TRUE,
seed = 1L,
nthreads = parallel::detectCores()
)
OMF_implicit(
X,
U = NULL,
I = NULL,
k = 50L,
lambda = 1,
alpha = 1,
use_cg = TRUE,
precondition_cg = FALSE,
add_intercepts = TRUE,
niter = 10L,
apply_log_transf = FALSE,
max_cg_steps = 3L,
finalize_chol = FALSE,
verbose = FALSE,
handle_interrupt = TRUE,
seed = 1L,
nthreads = parallel::detectCores()
)
X 
The main matrix with interactions data to factorize (e.g. movie ratings by users, bag-of-words representations of texts, etc.). The package is built with recommender systems in mind, and will assume that 'X' is a matrix in which users are rows, items are columns, and values denote interactions between a given user and item. Can be passed in the following formats:
If using the package 'softImpute', objects of type 'incomplete' from that package can be converted to 'Matrix' objects through e.g. 'as(X, "TsparseMatrix")'. Sparse matrices can be created through e.g. 'Matrix::sparseMatrix(..., repr="T")'. It is recommended for faster fitting times to pass the 'X' data as a matrix (either sparse or dense), as it will then avoid internal reindexing. Note that, generally, it's possible to pass partially disjoint sets of users/items between the different matrices (e.g. it's possible for both the 'X' and 'U' matrices to have rows that the other doesn't have). If any of the inputs has fewer rows/columns than the other(s) (e.g. 'U' has more rows than 'X', or 'I' has more rows than there are columns in 'X'), will assume that the rest of the rows/columns have only missing values. However, when having partially disjoint inputs, the order of the rows/columns matters for speed for the 'CMF' and 'CMF_implicit' models under the ALS method, as it might run faster when the 'U'/'I' inputs that do not have matching rows/columns in 'X' have those unmatched rows/columns at the end (last rows/columns) and the 'X' input is shorter. See also the parameter 'include_all_X' for info about predicting with mismatched 'X'. If passed as sparse/triplets, the non-missing values should not contain any 'NA'/'NaN's.
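For reference, the triplet layout mentioned in this page (user IDs in the first column, item IDs in the second, values after that) can be produced from a dense matrix with base R alone. The column names below are illustrative assumptions, not names required by the package:

```r
X_dense <- matrix(c( 5, NA,  3,
                    NA,  2, NA), nrow = 2, byrow = TRUE,
                  dimnames = list(c("u1", "u2"), c("i1", "i2", "i3")))

# Positions of the non-missing entries as (row, col) index pairs.
idx <- which(!is.na(X_dense), arr.ind = TRUE)

# One row per non-missing entry: user ID, item ID, value.
X_df <- data.frame(
  user  = rownames(X_dense)[idx[, "row"]],
  item  = colnames(X_dense)[idx[, "col"]],
  value = X_dense[idx],
  stringsAsFactors = FALSE
)
```

Only the non-missing entries are kept, so the values column contains no 'NA', in line with the note above.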
U 
User attributes information. Can be passed in the following formats:
If 'X' is a 'data.frame', should be either a 'data.frame' or 'matrix', containing row names matching the first column of 'X' (which denotes the user/row IDs of the non-zero entries). If 'U' is sparse, 'X' should be passed as a sparse or dense matrix (not a 'data.frame'). Note that, if 'U' is a 'matrix' or 'data.frame', it should have the same number of rows as 'X' in the 'ContentBased', 'OMF_explicit', and 'OMF_implicit' models. Be aware that 'CMF' and 'CMF_implicit' tend to perform better with dense and not-too-wide user/item attributes.
I 
Item attributes information. Can be passed in the following formats:
If 'X' is a 'data.frame', should be either a 'data.frame' or 'matrix', containing row names matching the second column of 'X' (which denotes the item/column IDs of the non-zero entries). If 'I' is sparse, 'X' should be passed as a sparse or dense matrix (not a 'data.frame'). Note that, if 'I' is a 'matrix' or 'data.frame', it should have the same number of rows as there are columns in 'X' in the 'ContentBased', 'OMF_explicit', and 'OMF_implicit' models. Be aware that 'CMF' and 'CMF_implicit' tend to perform better with dense and not-too-wide user/item attributes.
U_bin 
User binary columns/attributes (all values should be zero, one, or missing), for which a sigmoid transformation will be applied on the predicted values. If 'X' is a 'data.frame', should also be a 'data.frame', with row names matching the first column of 'X' (which denotes the user/row IDs of the non-zero entries). Cannot be passed as a sparse matrix. Note that 'U' and 'U_bin' are not mutually exclusive. Only supported with 'method="lbfgs"'.
I_bin 
Item binary columns/attributes (all values should be zero, one, or missing), for which a sigmoid transformation will be applied on the predicted values. If 'X' is a 'data.frame', should also be a 'data.frame', with row names matching the second column of 'X' (which denotes the item/column IDs of the non-zero entries). Cannot be passed as a sparse matrix. Note that 'I' and 'I_bin' are not mutually exclusive. Only supported with 'method="lbfgs"'.
weight 
(Optional and not recommended) Observation weights for entries in 'X'. Must have the same shape as 'X': that is, if 'X' is a sparse matrix, must be a vector with the same number of non-zero entries as 'X'; if 'X' is a dense matrix, 'weight' must also be a dense matrix. Alternatively, if 'X' is a sparse COO matrix, 'weight' may also be passed as a sparse COO matrix in the same format, but it will not check whether the indices match between the two. If 'X' is a 'data.frame', should be passed instead as its fourth column. Cannot have missing values. This is only supported for the explicit-feedback models, as the implicit-feedback ones determine the weights through 'X'.
k 
Number of latent factors to use (dimensionality of the low-rank factorization). These will be shared between the factorization of the 'X' matrix and the side info matrices in the 'CMF' and 'CMF_implicit' models, and will be determined jointly by interactions and side info in the 'OMF_explicit' and 'OMF_implicit' models. Additional non-shared components can also be specified through 'k_user', 'k_item', and 'k_main' (also 'k_sec' for 'OMF_explicit'). Typical values are 30 to 100.
lambda 
Regularization parameter to apply on the squared L2 norms of the matrices. Some models ('CMF', 'CMF_implicit', 'ContentBased', and 'OMF_explicit' with the L-BFGS method) can use different regularization for each matrix, in which case it should be an array with 6 entries (regardless of the model), corresponding, in this order, to: 'user_bias', 'item_bias', 'A', 'B', 'C', 'D'. Note that the default value for 'lambda' here is much higher than in other software, and that the loss/objective function is not divided by the number of entries anywhere, so this parameter needs good tuning. For example, a good value for the MovieLens10M would be 'lambda=35' (or 'lambda=0.05' with 'scale_lam=TRUE'), whereas for the LastFM360K, a good value would be 'lambda=5'. Typical values are 
method 
Optimization method used to fit the model. If passing 'lbfgs', will fit it through a gradient-based approach using an L-BFGS optimizer, and if passing 'als', will fit it through the ALS (alternating least-squares) method. L-BFGS is typically much slower and much less memory-efficient than 'als', but tends to reach better local optima and allows some variations of the problem which ALS doesn't, such as applying sigmoid transformations for binary side information. Note that not all models allow choosing the optimizer:

use_cg 
In the ALS method, whether to use a conjugate gradient method to solve the closed-form least squares problems. This is a faster and more memory-efficient alternative to the default Cholesky solver, but less exact, less numerically stable, and will require slightly more ALS iterations ('niter') to reach a good optimum. In general, better results are achieved with 'use_cg=FALSE' for the explicit-feedback models. Note that, if using this method, calculations after fitting which involve new data, such as factors, might produce slightly different results from the factors obtained inside the fitted model with the same data, due to differences in numerical precision. A workaround for this issue (factors on new data that might differ slightly) is to use 'finalize_chol=TRUE'. Even if passing 'TRUE' here, will use the Cholesky method in cases in which it is faster (e.g. dense matrices with no missing values), and will not use the conjugate gradient method on new data. This option is not available when using L1 regularization and/or non-negativity constraints. Ignored when using the L-BFGS method.
user_bias 
Whether to add user/row biases (intercepts) to the model. If using it for purposes other than recommender systems, including them is usually not suggested.
item_bias 
Whether to add item/column biases (intercepts) to the model. Be aware that using item biases with low regularization for them will tend to favor items with high average ratings regardless of the number of ratings the item has received. 
center 
Whether to center the "X" data by subtracting the mean value. For recommender systems, it's highly recommended to pass 'TRUE' here, the more so if the model has user and/or item biases. For 'MostPopular', if passing 'implicit=TRUE', this option will be ignored (assumed 'FALSE'). 
add_implicit_features 
Whether to automatically add so-called implicit features from the data, as in reference [5] and similar. If using this for recommender systems with small amounts of data, it's recommended to pass 'TRUE' here.
scale_lam 
Whether to scale (increase) the regularization parameter for each row of the model matrices (A, B, C, D) according to the number of non-missing entries in the data for that particular row, as proposed in reference [7]. For the A and B matrices, the regularization will only be scaled according to the number of non-missing entries in 'X' (see also the 'scale_lam_sideinfo' parameter). Note that, when using the options 'NA_as_zero_*', all entries are considered to be non-missing. If passing 'TRUE' here, the optimal value for 'lambda' will be much smaller (and likely below 0.1). This option tends to give better results, but requires more hyperparameter tuning. Only supported for the ALS method. For the 'MostPopular' model, this is not supported when passing 'implicit=TRUE', and it is not recommended to use for it, as it will tend to recommend items which have a single user interaction with the maximum possible value (e.g. 5-star movies from only 1 user). When generating factors based on side information alone, if passing 'scale_lam_sideinfo', will regularize assuming there was one observation present. Be aware that using this option without 'scale_lam_sideinfo=TRUE' can lead to bad cold-start recommendations as it will set a very small regularization for users who have no 'X' data. Warning: in smaller datasets, using this option can result in top-N recommendations having mostly items with very few interactions (see parameter 'scale_bias_const').
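A small base-R sketch of what per-row scaling means, under the assumption that the effective penalty for a row is simply 'lambda' multiplied by that row's count of non-missing entries (the exact formula used internally is not given in this page):

```r
X <- matrix(c( 5, NA,  3, NA,
               4,  2,  1,  5,
              NA, NA, NA, NA), nrow = 3, byrow = TRUE)
lambda <- 0.05

n_per_user <- rowSums(!is.na(X))    # non-missing entries behind each row of A
n_per_item <- colSums(!is.na(X))    # non-missing entries behind each row of B

lambda_A <- lambda * n_per_user     # effective penalty for each user's factors
lambda_B <- lambda * n_per_item     # effective penalty for each item's factors
```

Under this assumption the third user, who has no 'X' data, ends up with no penalty at all, which illustrates the cold-start caveat mentioned above.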
scale_lam_sideinfo 
Whether to scale (increase) the regularization parameter for each row of the "A" and "B" matrices according to the number of non-missing entries in both 'X' and the side info matrices 'U' and 'I'. If passing 'TRUE' here, 'scale_lam' will also be assumed to be 'TRUE'.
scale_bias_const 
When passing 'scale_lam=TRUE' and 'user_bias=TRUE' or 'item_bias=TRUE', whether to apply the same scaling to the regularization of the biases to all users and items, according to the average number of non-missing entries rather than to the number of entries for each specific user/item. While this tends to result in worse RMSE, it tends to make the top-N recommendations less likely to select items with only a few interactions from only a few users. Ignored when passing 'scale_lam=FALSE' or not using user/item biases.
k_user 
Number of factors in the factorizing 'A' and 'C' matrices which will be used only for the 'U' and 'U_bin' matrices, while being ignored for the 'X' matrix. These will be the first factors of the matrices once the model is fit. Will be counted in addition to those already set by 'k'. 
k_item 
Number of factors in the factorizing 'B' and 'D' matrices which will be used only for the 'I' and 'I_bin' matrices, while being ignored for the 'X' matrix. These will be the first factors of the matrices once the model is fit. Will be counted in addition to those already set by 'k'. 
k_main 
For the 'CMF' and 'CMF_implicit' models, this denotes the number of factors in the factorizing 'A' and 'B' matrices which will be used only for the 'X' matrix, while being ignored for the 'U', 'U_bin', 'I', and 'I_bin' matrices. For the 'OMF_explicit' model, this denotes the number of factors which are determined without the user/item side information. These will be the last factors of the matrices once the model is fit. Will be counted in addition to those already set by 'k'. 
w_main 
Weight in the optimization objective for the errors in the factorization of the 'X' matrix. 
w_user 
For the 'CMF' and 'CMF_implicit' models, this denotes the weight in the optimization objective for the errors in the factorization of the 'U' and 'U_bin' matrices. For the 'OMF_explicit' model, this denotes the multiplier for the effect of the user attributes in the final factor matrices. Ignored when passing neither 'U' nor 'U_bin'. 
w_item 
For the 'CMF' and 'CMF_implicit' models, this denotes the weight in the optimization objective for the errors in the factorization of the 'I' and 'I_bin' matrices. For the 'OMF_explicit' model, this denotes the multiplier for the effect of the item attributes in the final factor matrices. Ignored when passing neither 'I' nor 'I_bin'. 
w_implicit 
Weight in the optimization objective for the errors in the factorizations of the implicit 'X' matrices. Note that, depending on the sparsity of the data, the sum of errors from these factorizations might be much larger than for the original 'X' and a smaller value will perform better. It is recommended to tune this parameter carefully. Ignored when passing 'add_implicit_features=FALSE'. 
l1_lambda 
Regularization parameter to apply to the L1 norm of the model matrices. Can also pass different values for each matrix (see 'lambda' for details). Note that, when adding L1 regularization, the model will be fit through a coordinate descent procedure, which is significantly slower than the Cholesky method with L2 regularization. Only supported with the ALS method. Not recommended. 
center_U 
Whether to center the 'U' matrix column-by-column. Be aware that this is a simple mean centering without regularization. One might want to turn this option off when using 'NA_as_zero_user=TRUE'.
center_I 
Whether to center the 'I' matrix column-by-column. Be aware that this is a simple mean centering without regularization. One might want to turn this option off when using 'NA_as_zero_item=TRUE'.
maxiter 
Maximum L-BFGS iterations to perform. The procedure will halt if it has not converged after this number of updates. Note that the 'CMF' model is likely to require fewer iterations to converge compared to other models, whereas the 'ContentBased' model, which optimizes a highly non-linear function, will require more iterations and benefits from using more correction pairs. Using higher regularization values might also decrease the number of required iterations. Pass zero for no L-BFGS iterations limit. If the procedure is spending hundreds of iterations without any significant decrease in the loss function or gradient norm, it's highly likely that the regularization is too low. Ignored when using the ALS method.
niter 
Number of alternating least-squares iterations to perform. Note that one iteration denotes an update round for all the matrices rather than an update of a single matrix. In general, the more iterations, the better the end result. Ignored when using the L-BFGS method. Typical values are 6 to 30.
parallelize 
How to parallelize gradient calculations when using more than one thread with 'method="lbfgs"'. Passing 'separate' will iterate over the data twice (first by rows and then by columns), letting each thread calculate results for each row and column, whereas passing 'single' will iterate over the data only once, and then sum the obtained results from each thread. Passing 'separate' is much more memory-efficient and less prone to irreproducibility of random seeds, but might be slower for typical use-cases. Ignored when passing 'nthreads=1', or when using the ALS method, or when compiling without OpenMP support.
corr_pairs 
Number of correction pairs to use for the L-BFGS optimization routine. Recommended values are between 3 and 7. Note that higher values translate into higher memory requirements. Ignored when using the ALS method.
max_cg_steps 
Maximum number of conjugate gradient iterations to perform in an ALS round. Ignored when passing 'use_cg=FALSE' or using the L-BFGS method.
precondition_cg 
Whether to use Jacobi preconditioning for the conjugate gradient procedure. In general, this type of preconditioning is not beneficial (makes the algorithm slower) as the factor variables tend to be in the same scale, but it might help when using non-shared factors. Note that, when using preconditioning, the procedure will not check for convergence, taking instead a fixed number of steps (given by 'max_cg_steps') at each iteration regardless of whether it has reached the optimum already. Ignored when passing 'use_cg=FALSE' or using the L-BFGS method.
finalize_chol 
When passing 'use_cg=TRUE' and using the ALS method, whether to perform the last iteration with the Cholesky solver. This will make it slower, but will avoid the issue of potential mismatches between the resulting factors inside the model object and calls to factors or similar with the same data. 
NA_as_zero 
Whether to take missing entries in the 'X' matrix as zeros (only when the 'X' matrix is passed as a sparse matrix or as a 'data.frame') instead of ignoring them. This is a different model from the implicit-feedback version with weighted entries, and it's a much faster model to fit. Note that passing 'TRUE' will affect the results of the functions factors and factors_single (as it will assume zeros instead of missing). It is possible to obtain equivalent results to the implicit-feedback model if passing 'TRUE' here, and then passing an 'X' with all values set to one and weights corresponding to the actual values of 'X' multiplied by 'alpha', plus 1 ('W := 1 + alpha*X' to imitate the implicit-feedback model). If passing this option, be aware that the defaults are also to perform mean centering and add user/item biases, which might be undesirable to have together with this option. For the 'OMF_explicit' model, this option will only affect the data to which the model is fit, while being always assumed 'FALSE' for new data (e.g. when calling 'factors').
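The weighting recipe 'W := 1 + alpha*X' can be written out on a toy dense matrix as follows. This is an illustrative base-R sketch only; the variable names and the helper 'weighted_loss' are hypothetical, and a real dataset would normally be kept sparse rather than dense.

```r
# Raw implicit-feedback counts (e.g. number of plays); 0 means "no interaction".
counts <- matrix(c(3, 0, 10,
                   0, 1,  0), nrow = 2, byrow = TRUE)
alpha <- 1

X_binary <- (counts > 0) * 1        # every observed interaction becomes 1
W <- 1 + alpha * counts             # W := 1 + alpha * X; missing entries get weight 1

# Weighted squared error of a candidate approximation against the binary matrix.
weighted_loss <- function(X_hat) sum(W * (X_binary - X_hat)^2)
```

Entries with more interactions get a larger weight, so errors on them cost more, while the zeros still contribute with the baseline weight of 1.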
NA_as_zero_user 
Whether to take missing entries in the 'U' matrix as zeros (only when the 'U' matrix is passed as a sparse matrix) instead of ignoring them. Note that passing 'TRUE' will affect the results of the functions factors and factors_single if no data is passed there (as it will assume zeros instead of missing). This option is always assumed 'TRUE' for the 'ContentBased', 'OMF_explicit', and 'OMF_implicit' models. 
NA_as_zero_item 
Whether to take missing entries in the 'I' matrix as zeros (only when the 'I' matrix is passed as a sparse matrix) instead of ignoring them. This option is always assumed 'TRUE' for the 'ContentBased', 'OMF_explicit', and 'OMF_implicit' models. 
nonneg 
Whether to constrain the 'A' and 'B' matrices to be non-negative. In order for this to work correctly, the 'X' input data must also be non-negative. This constraint will also be applied to the 'Ai' and 'Bi' matrices if passing 'add_implicit_features=TRUE'. Important: be aware that the default options are to perform mean centering and to add user and item biases, which might be undesirable and hinder performance when having non-negativity constraints (especially mean centering). This option is not available when using the L-BFGS method. Note that, when determining non-negative factors, it will always use a coordinate descent method, regardless of the value passed for 'use_cg' and 'finalize_chol'. When used for recommender systems, one usually wants to pass 'FALSE' here. For better results, do not use centering alongside this option, and use a higher regularization coupled with more iterations.
nonneg_C 
Whether to constrain the 'C' matrix to be non-negative. In order for this to work correctly, the 'U' input data must also be non-negative. Note: by default, the 'U' data will be centered by columns, which doesn't play well with non-negativity constraints. One will likely want to pass 'center_U=FALSE' along with this.
nonneg_D 
Whether to constrain the 'D' matrix to be non-negative. In order for this to work correctly, the 'I' input data must also be non-negative. Note: by default, the 'I' data will be centered by columns, which doesn't play well with non-negativity constraints. One will likely want to pass 'center_I=FALSE' along with this.
max_cd_steps 
Maximum number of coordinate descent updates to perform per iteration. Pass zero for no limit. The procedure will only use coordinate descent updates when having L1 regularization and/or non-negativity constraints. This number should usually be larger than 'k'.
precompute_for_predictions 
Whether to precompute some of the matrices that are used when making predictions from the model. If 'FALSE', it will take longer to generate predictions or top-N lists, but will use less memory and will be faster to fit the model. If passing 'FALSE', can be recomputed later on-demand through function precompute.for.predictions. Note that for 'ContentBased', 'OMF_explicit', and 'OMF_implicit', this parameter will always be assumed to be 'TRUE', due to requiring the original matrices for the precomputations.
include_all_X 
When passing an input 'X' which has fewer columns than rows in 'I', whether to still make calculations about the items which are in 'I' but not in 'X'. This has three effects: (a) the top-N functionality may recommend such items, (b) the precomputed matrices will be less usable as they will include all such items, (c) it will be possible to pass 'X' data to the new factors or top-N functions that include such columns (rows of 'I'). This option is ignored when using 'NA_as_zero', and is only relevant for the 'CMF' model as all the other models will have the equivalent of 'TRUE' here.
verbose 
Whether to print informational messages about the optimization routine used to fit the model. Be aware that, if passing 'FALSE' and using the L-BFGS method, the optimization routine will not respond to interrupt signals.
print_every 
Print L-BFGS convergence messages every n iterations. Ignored when not using the L-BFGS method.
handle_interrupt 
When receiving an interrupt signal, whether the model should stop early and leave a usable object with the parameters obtained up to the point when it was interrupted (when passing 'TRUE'), or raise an interrupt exception without producing a fitted model object (when passing 'FALSE'). 
seed 
Seed to use for random number generation. If passing 'NULL', will draw a non-reproducible random integer to use as seed.
nthreads 
Number of parallel threads to use. Note that, the more threads that are used, the higher the memory consumption. 
alpha 
Weighting parameter for the non-zero entries in the implicit-feedback model. See [3] for details. Note that, while the author's suggestion for this value is 40, other software such as the Python package 'implicit' use a value of 1, whereas Spark uses a value of 0.01 by default, and values higher than 10 are unlikely to improve results. If the data has very high values, it might even be beneficial to put a very low value here: for example, for the LastFM360K, values below 1 might give better results.
apply_log_transf 
Whether to apply a logarithm transformation on the values of 'X' (i.e. 'X := log(X)') 
implicit 
(Only selectable for the 'MostPopular' model) Whether to use the implicit-feedback model, in which the 'X' matrix is assumed to have only binary entries, each of them having a weight in the loss function given by the observed user-item interactions and other parameters.
add_intercepts 
(Only for 'ContentBased', 'OMF_explicit', 'OMF_implicit') Whether to add intercepts/biases to the user/item attribute matrices. 
start_with_ALS 
(Only for 'ContentBased') Whether to determine the initial coefficients through an ALS procedure. This might help to speed up the procedure by starting closer to an optimum. This option is not available when the side information is passed as sparse matrices. 
k_sec 
(Only for 'OMF_explicit') Number of factors in the factorizing matrices which are determined exclusively from user/item attributes. These will be at the beginning of the 'C' and 'D' matrices once the model is fit. If there are no attributes for a given matrix (user/item), then that matrix will have an extra 'k_sec' factors (e.g. if passing user side info but not item side info, then the 'B' matrix will have an extra 'k_sec' factors). Will be counted in addition to those already set by 'k'. Not supported when using 'method="als"'. For a different model having only 'k_sec' with 'k=0' and 'k_main=0', see the 'ContentBased' model. 
In more detail, the models predict the values of 'X' as follows:
'CMF':
\mathbf{X} \approx \mathbf{A} \mathbf{B}^T + \mu + \mathbf{b}_u + \mathbf{b}_i
, where \mu
is the global mean for the non-missing
entries in 'X', and \mathbf{b}_u , \mathbf{b}_i
are the user and
item biases (column and row vector, respectively). In addition, the other matrices are
predicted as \mathbf{U} \approx \mathbf{A} \mathbf{C}^T + \mu_U
and \mathbf{I} \approx \mathbf{B} \mathbf{D}^T + \mu_I
, where
\mu_U , \mu_I
are the column means from the side info matrices,
which are determined as a simple average with no regularization (these are row
vectors), and if having binary variables, also
\mathbf{U}_{bin} \approx \sigma(\mathbf{A} \mathbf{C}_{bin}^T)
and
\mathbf{I}_{bin} \approx \sigma(\mathbf{B} \mathbf{D}_{bin}^T)
, where \sigma
is a sigmoid function (
\sigma(x) = \frac{1}{1 + e^{-x}}
). Under the options 'NA_as_zero_*',
the mean(s) for that matrix are not added into the model for simplicity.
For the implicit features option, the other matrices are predicted simply as
\mathbf{I}_x \approx \mathbf{A} \mathbf{B}_i^T , \:\:
\mathbf{I}_x^T \approx \mathbf{B} \mathbf{A}_i^T
.
If using 'k_user', 'k_item', 'k_main', then for 'A', only columns '1' through
'k+k_user' are used in the approximation of 'U', and only columns 'k_user+1' through
'k_user+k+k_main' are used for the approximation of 'X' (similar thing for 'B' with
'k_item'). The implicit factors matrices (\mathbf{A}_i, \mathbf{B}_i
)
always use the same components/factors as 'X'.
Be aware that the functions for determining new factors will by default omit the bias term in the output.
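As a rough illustration of the 'CMF' formulas above, the following base-R sketch builds small random factor matrices and forms the approximations of 'X' and 'U' by hand, including the column ranges used under 'k_user' and 'k_main'. All sizes and variable names here are made up for the example, 'k_item' is assumed to be zero to keep it short, and the matrices are kept in the orientation used in the formulas, which is not how the fitted object stores them.

```r
set.seed(1)
m <- 6; n <- 5; p <- 3            # users, items, user attributes (hypothetical sizes)
k_user <- 1; k <- 2; k_main <- 1  # hypothetical factor counts, with k_item = 0

A <- matrix(rnorm(m * (k_user + k + k_main)), nrow = m)
B <- matrix(rnorm(n * (k + k_main)), nrow = n)
C <- matrix(rnorm(p * (k_user + k)), nrow = p)

mu   <- 3.5        # global mean of the non-missing entries of X
b_u  <- rnorm(m)   # user biases (one per row of X)
b_i  <- rnorm(n)   # item biases (one per column of X)
mu_U <- rnorm(p)   # column means of U

# Columns of A taking part in each factorization
cols_U <- 1:(k_user + k)                      # used to approximate U
cols_X <- (k_user + 1):(k_user + k + k_main)  # used to approximate X

# X ~ A B^T + mu + b_u + b_i
X_hat <- A[, cols_X] %*% t(B) + mu + outer(b_u, b_i, "+")
# U ~ A C^T + mu_U (mu_U is added to every row)
U_hat <- sweep(A[, cols_U] %*% t(C), 2, mu_U, "+")

dim(X_hat)  # m x n
dim(U_hat)  # m x p
```

The 'k' shared columns appear in both 'cols_U' and 'cols_X', which is what ties the two factorizations together.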
'CMF_implicit': \mathbf{X} \approx \mathbf{A} \mathbf{B}^T
,
while 'U' and 'I' remain the same as for 'CMF', and the ordering of the non-shared
factors is the same. Note that there is no mean centering or user/item biases in the
implicit-feedback model, but if desired, the 'CMF' model can be made to mimic
'CMF_implicit' while still accommodating for mean centering and biases.
'MostPopular': \mathbf{X} \approx \mu + \mathbf{b}_u + \mathbf{b}_i
(when using 'implicit=FALSE') or
\mathbf{X} \approx \mathbf{b}_i
(when using 'implicit=TRUE').
'ContentBased': \mathbf{X} \approx \mathbf{A}_m \mathbf{B}_m^T
, where \mathbf{A}_m = \mathbf{U} \mathbf{C} + \mathbf{b}_C
and \mathbf{B}_m = \mathbf{I} \mathbf{D} + \mathbf{b}_D
; the \mathbf{b}_C, \mathbf{b}_D
are
per-column/factor intercepts (these are row vectors).
'OMF_explicit':
\mathbf{X} \approx \mathbf{A}_m \mathbf{B}_m^T + \mu + \mathbf{b}_u + \mathbf{b}_i
, where
\mathbf{A}_m = w_u (\mathbf{U} \mathbf{C} + \mathbf{b}_C) + \mathbf{A}
and
\mathbf{B}_m = w_i (\mathbf{I} \mathbf{D} + \mathbf{b}_D) + \mathbf{B}
. If passing 'k_sec' and/or 'k_main', then columns
'1' through 'k_sec' of \mathbf{A}_m, \mathbf{B}_m
are determined
as those same columns from
\mathbf{U} \mathbf{C} + \mathbf{b}_C, \mathbf{I} \mathbf{D} + \mathbf{b}_D
, while \mathbf{A}, \mathbf{B}
will be shorter by 'k_sec'
columns (alternatively, can be thought of as having those columns artificially set to
zeros), and columns 'k_sec+k+1' through 'k_sec+k+k_main' of
\mathbf{A}_m, \mathbf{B}_m
are determined as those last 'k_main' columns of
\mathbf{A}, \mathbf{B}
,
while \mathbf{U} \mathbf{C} + \mathbf{b}_C, \mathbf{I} \mathbf{D} + \mathbf{b}_D
will be shorter by 'k_main' columns
(alternatively, can be thought of as having those columns artificially set to
zeros). If one of \mathbf{U}
or \mathbf{I}
is missing,
then the corresponding \mathbf{A}
or \mathbf{B}
matrix will
be extended by 'k_sec' columns (which will not be zeros) and the corresponding
prediction matrix (\mathbf{A}_m, \mathbf{B}_m
) will be equivalent
to that matrix (which was the free offset in the presence of side information).
'OMF_implicit': \mathbf{X} \approx \mathbf{A}_m \mathbf{B}_m^T
,
with \mathbf{A}_m, \mathbf{B}_m
remaining the same as for 'OMF_explicit'.
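To make the 'ContentBased' and 'OMF' formulas more concrete, here is a minimal base-R sketch that assembles the prediction matrices from random inputs. It assumes 'k_sec = 0' and 'k_main = 0' (so every matrix has the same 'k' columns), and all sizes, values and variable names are hypothetical; the item attributes are called 'I_attr' only to avoid masking base R's 'I' function.

```r
set.seed(3)
m <- 4; n <- 3; p <- 5; q <- 2; k <- 2     # hypothetical sizes
U      <- matrix(rnorm(m * p), nrow = m)   # user attributes
I_attr <- matrix(rnorm(n * q), nrow = n)   # item attributes ('I' in the text)
C <- matrix(rnorm(p * k), nrow = p)        # maps user attributes to factors
D <- matrix(rnorm(q * k), nrow = q)        # maps item attributes to factors
b_C <- rnorm(k); b_D <- rnorm(k)           # per-factor intercepts
A <- matrix(rnorm(m * k), nrow = m)        # free offsets for users
B <- matrix(rnorm(n * k), nrow = n)        # free offsets for items
w_u <- 0.5; w_i <- 2                       # hypothetical scaling weights
mu <- 3.5; b_u <- rnorm(m); b_i <- rnorm(n)

# Attribute-based parts: U C + b_C and I D + b_D
A_attr <- sweep(U %*% C, 2, b_C, "+")
B_attr <- sweep(I_attr %*% D, 2, b_D, "+")

# Offsets added on top of the weighted attribute-based parts
A_m <- w_u * A_attr + A
B_m <- w_i * B_attr + B

X_hat_content  <- A_attr %*% t(B_attr)                        # 'ContentBased'
X_hat_explicit <- A_m %*% t(B_m) + mu + outer(b_u, b_i, "+")  # 'OMF_explicit'
X_hat_implicit <- A_m %*% t(B_m)                              # 'OMF_implicit'
```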
When calling the prediction functions, new data is always transposed or deep copied before being passed to the underlying C functions. As such, for the 'ContentBased' model, it might be faster to use the matrices directly instead (all these matrices will be under 'model$matrices', but will be transposed).
The precomputed matrices, when they are square, will contain only the lower triangle,
as they are symmetric. For 'CMF' and 'CMF_implicit', one might also see
variations of a new matrix called 'Be' (extended 'B' matrix), which is from
reference [1] and defined as
\mathbf{B}_e = [[\mathbf{0}, \mathbf{B}_s, \mathbf{B}_m], [\mathbf{C}_a, \mathbf{C}_s, \mathbf{0}]]
, where \mathbf{B}_s
are columns 'k_item+1' through 'k_item+k' from 'B',
\mathbf{B}_m
are columns 'k_item+k+1' through 'k_item+k+k_main' from 'B',
\mathbf{C}_a
are columns '1' through 'k_user' from 'C', and
\mathbf{C}_s
are columns 'k_user+1' through 'k_user+k' from 'C'.
This matrix is used for the closed-form solution of a given vector of 'A' in
the functions for predicting on new data (see reference [1] for details or if
you would like to use your own solver with the fitted matrices from this package),
as long as there are no binary columns to which to apply a transformation, in
which case it will always solve them with the L-BFGS method.
When using user biases, the precomputed matrices will have an extra column, which is derived by adding an extra column to 'B' (at the end) consisting of all ones (this is how the user biases are calculated).
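The block structure of 'Be' can be sketched in base R as follows, together with a plain ridge least-squares solve for the factors of one new user. This is only an illustration under simplifying assumptions that are not taken from the package: 'k_item' is zero, every item is observed for the new user, both factorizations are weighted equally, there are no biases, and the inputs are already centered. The names 'x_new', 'u_new' and 'lambda' are hypothetical, and the package's own solver handles missing entries, weights and other options that this sketch ignores.

```r
set.seed(4)
n <- 6; p <- 4                     # items, user attributes (hypothetical sizes)
k_user <- 1; k <- 2; k_main <- 1
k_item <- 0                        # assumed zero to keep the sketch short

B <- matrix(rnorm(n * (k_item + k + k_main)), nrow = n)
C <- matrix(rnorm(p * (k_user + k)), nrow = p)

B_s <- B[, (k_item + 1):(k_item + k), drop = FALSE]
B_m <- B[, (k_item + k + 1):(k_item + k + k_main), drop = FALSE]
C_a <- C[, 1:k_user, drop = FALSE]
C_s <- C[, (k_user + 1):(k_user + k), drop = FALSE]

# B_e = [[0, B_s, B_m], [C_a, C_s, 0]]
B_e <- rbind(
  cbind(matrix(0, n, k_user), B_s, B_m),
  cbind(C_a, C_s, matrix(0, p, k_main))
)

x_new  <- rnorm(n)   # centered ratings of the new user for all n items
u_new  <- rnorm(p)   # centered attributes of the new user
lambda <- 0.1        # hypothetical L2 regularization

# Minimizes ||c(x_new, u_new) - B_e a||^2 + lambda * ||a||^2
a_new <- solve(crossprod(B_e) + lambda * diag(ncol(B_e)),
               crossprod(B_e, c(x_new, u_new)))
dim(B_e)  # (n + p) x (k_user + k + k_main)
```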
For the implicit-feedback models, the weights of the positive entries (defined
as the non-missing entries in 'X') will be given by
W = 1 + \alpha \mathbf{X}
.
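To get a feel for how strongly 'alpha' scales these weights, the short base-R sketch below evaluates the formula for a few made-up interaction counts, using the three 'alpha' values mentioned in the description of that parameter. The counts are hypothetical.

```r
x_counts <- c(1, 3, 250)   # hypothetical non-missing values of X

weights_for <- function(alpha, x) 1 + alpha * x

w_40   <- weights_for(40, x_counts)    # 41, 121, 10001
w_1    <- weights_for(1, x_counts)     # 2, 4, 251
w_0.01 <- weights_for(0.01, x_counts)  # 1.01, 1.03, 3.5
```

With large values in 'X', a high 'alpha' lets a few entries dominate the loss, which is in line with the suggestion of trying low values for such data.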
For the 'OMF' models, the 'ALS' method will first find a solution for the
equivalent 'CMF' problem with no side information, and will then try to predict
the resulting matrices given the user/item attributes, assigning the residuals
as the free offsets. While this might sound reasonable, in practice it tends to
give rather different results than when fit through the L-BFGS method. Strictly
speaking, the regularization parameter in this case is applied to the
\mathbf{A}_m, \mathbf{B}_m
matrices, and the prediction functions
for new data will offer an option 'exact' for determining whether to apply the
regularization to the \mathbf{A}, \mathbf{B}
matrices instead.
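The two-step idea behind the 'ALS' method for the 'OMF' models can be sketched in base R as below. This is an illustration only: it uses an unregularized least-squares fit, a random matrix as a stand-in for the user factors of a model without side information, and no scaling weights, so it is not the package's actual procedure. All names are hypothetical.

```r
set.seed(5)
m <- 50; p <- 3; k <- 2                   # hypothetical sizes
U     <- matrix(rnorm(m * p), nrow = m)   # user attributes
A_cmf <- matrix(rnorm(m * k), nrow = m)   # stand-in for factors fit without side info

# Step 2: predict the factors from the attributes (with intercepts)
U1    <- cbind(1, U)                      # column of ones for the intercepts b_C
coefs <- qr.solve(U1, A_cmf)              # least-squares fit, (p + 1) x k
b_C   <- coefs[1, ]
C     <- coefs[-1, , drop = FALSE]

A_pred   <- sweep(U %*% C, 2, b_C, "+")   # part explained by the attributes
A_offset <- A_cmf - A_pred                # residuals become the free offsets 'A'

# The two parts add back up to the original factors
max(abs((A_pred + A_offset) - A_cmf))
```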
For reproducibility, the initializations of the model matrices (always initialized as '~ Normal(0, 1)') can be controlled through 'set.seed', but if using parallelizations, there are potential sources of irreproducibility of random seeds due to parallelized aggregations and/or BLAS function calls, which is especially problematic for the L-BFGS method with 'parallelize="single"'.
In order to further avoid potential decimal differences in the factors obtained when fitting the model and when calling the prediction functions on new data, when the data is sparse, it's necessary to sort it beforehand by columns/items and also pass the data with item indices sorted beforehand to the prediction functions. The package does not perform any index sorting or deduplication of entries of sparse matrices.
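For data held as triplets, one way of doing this sorting beforehand is sketched below in base R. The data frame and its column names ('user', 'item', 'rating') are hypothetical; the same idea applies to the index vectors of a sparse matrix.

```r
# Hypothetical ratings in triplet (COO) form, in arbitrary order
ratings <- data.frame(
  user   = c(2L, 1L, 2L, 1L, 3L),
  item   = c(3L, 2L, 1L, 1L, 2L),
  rating = c(4, 5, 3, 2, 1)
)

# Sort by item (column) index, breaking ties by user index
ratings_sorted <- ratings[order(ratings$item, ratings$user), ]

# Duplicated (user, item) pairs are not merged by the package, so check for them
has_duplicates <- any(duplicated(ratings_sorted[, c("user", "item")]))
```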
Returns a model object (class named just like the function that produced it, plus general class 'cmfrec') on which methods such as topN and factors can be called. The returned object will have the following fields:
'info': will contain the hyperparameters, problem dimensions, and other information such as the number of threads, as passed to the function that produced the model. The number of threads ('nthreads') might be modified after the fact. If 'X' is a 'data.frame', will also contain the re-indexing of users and items under 'user_mapping' and 'item_mapping', respectively. For the L-BFGS method, will also contain the number of function evaluations ('nfev') and number of updates ('nupd') that were performed.
'matrices': will contain the fitted model matrices (see section 'Description' for the naming and for details on what they represent), but note that they will be transposed (due to R's column-major representation of matrices) and it is recommended to use the package's prediction functionality instead of taking the matrices directly.
'precomputed': will contain some precomputed calculations based on the model matrices which might help speed up predictions on new data.
Metrics for implicit-feedback recommendations or model quality can be calculated using the recometrics package.
It is recommended to have the RhpcBLASctl package installed for better performance: if available, it will be used to control the number of internal BLAS threads before entering a multi-threaded region, in order to avoid oversubscription of threads. This can become an issue when using OpenBLAS if it is the 'pthreads' variant.
This package relies heavily on BLAS and LAPACK functions. For better performance, it is recommended to use an optimized backend for them, such as MKL or OpenBLAS.
In Windows, the easiest way of getting MKL is to use Microsoft's MRAN distribution of R, while OpenBLAS can be obtained by following this tutorial (no new R installation required).
In Linux, these can be installed through the system's package manager. In Debian and Debian-based distributions such as Ubuntu, the default BLAS and LAPACK can be configured through the alternatives system (see the Debian docs or this post for MKL).
By default, in a regular x86-64 CPU, R will compile all packages with generic options '-msse2' and '-O2', which misses lots of performance optimizations, and in particular, 'cmfrec' will not be able to achieve its maximum performance with them.
It is recommended to use compilation options '-O3', '-march=native', '-fno-math-errno', '-fno-trapping-math', and '-std=c99' or '-std=gnu99'. These can be activated in multiple ways:
(On Linux) Creating an empty text file '~/.R/Makevars' and adding this line there: 'CFLAGS += -O3 -march=native -fno-math-errno -fno-trapping-math' (plus an empty line at the end), then installing the usual way with 'install.packages("cmfrec")'.
Installing 'cmfrec' from source, but modifying the 'Makevars' file (it has lines that can be uncommented in order to enable these optimizations).
Modifying the global 'Makeconf' file. This is a file which defines the default compilation options for all R packages, so be careful about it. In Debian, this file will typically be under '/etc/R/', but this can vary in other operating systems. In this file, replace all occurrences of '-O2' with '-O3', and all occurrences of '-msse2' with '-march=native -fno-math-errno -fno-trapping-math' (e.g. open it in some text editor or in RStudio and use the 'Replace All' functionality). Editing this global file is not recommended; prefer editing the local user Makevars instead.
Cortes, David. "Cold-start recommendations in Collective Matrix Factorization." arXiv preprint arXiv:1809.00366 (2018).
Singh, Ajit P., and Geoffrey J. Gordon. "Relational learning via collective matrix factorization." Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining. 2008.
Hu, Yifan, Yehuda Koren, and Chris Volinsky. "Collaborative filtering for implicit feedback datasets." 2008 Eighth IEEE International Conference on Data Mining. IEEE, 2008.
Takacs, Gabor, Istvan Pilaszy, and Domonkos Tikk. "Applications of the conjugate gradient method for implicit feedback collaborative filtering." Proceedings of the fifth ACM conference on Recommender systems. 2011.
Rendle, Steffen, Li Zhang, and Yehuda Koren. "On the difficulty of evaluating baselines: A study on recommender systems." arXiv preprint arXiv:1905.01395 (2019).
Franc, Vojtech, Vaclav Hlavac, and Mirko Navara. "Sequential coordinate-wise algorithm for the non-negative least squares problem." International Conference on Computer Analysis of Images and Patterns. Springer, Berlin, Heidelberg, 2005.
Zhou, Yunhong, et al. "Large-scale parallel collaborative filtering for the netflix prize." International conference on algorithmic applications in management. Springer, Berlin, Heidelberg, 2008.
### See the package vignette for an extended version of this example
library(cmfrec)
if (require("recommenderlab") && require("MatrixExtra")) {
    ### Load the ML-100K dataset (movie ratings)
    ### (users are rows, items are columns)
    data("MovieLense")
    X <- as.coo.matrix(MovieLense@data)
    ### Will add basic side information about the users
    U <- MovieLenseUser
    U$id <- NULL
    U$zipcode <- NULL
    U <- model.matrix(~.-1, data=U)
    ### Will additionally use the item genres as side info
    I <- MovieLenseMeta
    I$title <- NULL
    I$year <- NULL
    I$url <- NULL
    I <- as.coo.matrix(I)
    ### Fit a factorization model
    ### (it's recommended to change the hyperparameters
    ### and use multiple threads)
    model <- CMF(X=X, U=U, I=I, k=10L, niter=5L,
                 NA_as_zero_item=TRUE,
                 verbose=FALSE, nthreads=1L)
    ### Predict rating for entries X[1,3], X[2,5], X[10,9]
    ### (first ID is the user, second is the movie)
    predict(model, user=c(1,2,10), item=c(3,5,9))
    ### Recommend top-5 for user ID = 10
    ### (Note that 'MatrixExtra' makes this return a 'sparseVector')
    seen_by_user <- MovieLense@data[10, , drop=TRUE]@i
    rec <- topN(model, user=10, n=5, exclude=seen_by_user)
    rec
    ### Print them in a more understandable format
    movie_names <- colnames(X)
    n_ratings <- colSums(as.csc.matrix(X, binary=TRUE))
    avg_ratings <- colSums(as.csc.matrix(X)) / n_ratings
    print_recommended <- function(rec, txt) {
        cat(txt, ":\n",
            paste(paste(1:length(rec), ". ", sep=""),
                  movie_names[rec],
                  " - Avg rating:", round(avg_ratings[rec], 2),
                  ", #ratings: ", n_ratings[rec],
                  collapse="\n", sep=""),
            "\n", sep="")
    }
    print_recommended(rec, "Recommended for user_id=10")
    ### Recommend assuming it is a new user,
    ### based on its data (ratings + side info)
    x_user <- X[10, , drop=TRUE] ## <- this is a 'sparseVector'
    u_user <- U[10, ]
    rec_new <- topN_new(model, n=5, X=x_user, U=u_user, exclude=seen_by_user)
    cat("lists are identical: ", identical(rec_new, rec), "\n")
    ### Recommend based on side information alone
    ### (a.k.a. cold-start recommendation)
    rec_cold <- topN_new(model, n=5, U=u_user)
    print_recommended(rec_cold, "Recommended based on side info")
    ### Obtain factors for the user
    factors_user <- model$matrices$A[, 10, drop=TRUE]
    ### Re-calculate them based on the data
    factors_new <- factors_single(model, X=x_user, U=u_user)
    ### Should be very close, but due to numerical precision,
    ### might not be exactly equal (see section 'Details')
    cat("diff: ", factors_user - factors_new, "\n")
    ### Can also calculate them in batch
    ### (slicing is provided by package "MatrixExtra")
    Xslice <- as.csr.matrix(X)[1:10, , drop=FALSE]
    Uslice <- U[1:10, , drop=FALSE]
    factors_multiple <- factors(model, X=Xslice, U=Uslice)
    cat("diff: ", factors_multiple[10, , drop=TRUE] - factors_new, "\n")
    ### Can make cold-start predictions, e.g.
    ### predict how users [1,2,3] would rate a new item,
    ### given its side information (here it's item ID = 5)
    predict_new_items(model, user=c(1,2,3), item=c(1,1,1), I=I[5, ])
}