implicitcf: Collaborative Filtering for Implicit Feedback Datasets
In andland/implicitcf: Collaborative Filtering for Implicit Feedback Datasets

Collaborative Filtering for Implicit Feedback Datasets

implicitcf(R, alpha = 1, C1 = alpha * R, P = (R > 0) * 1, f = 10,
  lambda = 0, init_stdv = ifelse(lambda == 0, 0.01, 1/sqrt(2 * lambda)),
  max_iters = 10, parallel = FALSE, quiet = TRUE)

`R`	A sparse implicit feedback matrix, where the rows typically represent users and the columns typically represent items. The elements of the matrix represent the number of times that the users have interacted with the items
`alpha`	Used to calculate cost matrix `C` = 1 + `alpha` * `R` if `C1` is not specified
`C1`	Equals the cost matrix (`C`) minus 1; it should be sparse
`P`	A binary matrix, indicating whether or not the users interacted with the items
`f`	The rank of the matrix factorization
`lambda`	The squared L2 norm penalty on the latent row and column features
`init_stdv`	Standard deviation to initialize the latent row and column features
`max_iters`	How many iterations to run the algorithm for
`parallel`	Whether to use the `foreach` package to parallelize the computation. See the examples for how to use it.
`quiet`	Whether or not to print out progress
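
The defaults for `C1` and `P` follow directly from `R` and `alpha`. A minimal sketch of how they relate is given here, using a small dense base-R matrix purely for illustration (the toy values are made up; in practice `R` would be a sparse matrix, and this is not the package's internal code):

```r
# Toy interaction counts: 3 users (rows) x 4 items (columns).
R <- matrix(c(0, 2, 0, 1,
              5, 0, 0, 0,
              0, 0, 3, 0), nrow = 3, byrow = TRUE)
alpha <- 1

C1 <- alpha * R      # default for C1: zero wherever R is zero
P  <- (R > 0) * 1    # default for P: 1 where an interaction occurred, else 0
C  <- 1 + C1         # implied cost matrix: every entry is at least 1
```

Because `C` is dense (its smallest entry is 1), the function takes `C1` instead, which is zero wherever `R` is zero and so can stay sparse.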

This function implements the algorithm of Hu et al. (2008) in R using sparse matrices. It solves for X and Y by minimizing the loss function:

∑_{u, i} c_{ui} (p_{ui} - x_u^Ty_i)^2 + λ (||X||_F^2 + ||Y||_F^2)

It does this by iteratively solving for x_u, u = 1, ...,nrow(R) and y_i, i = 1, ...,ncol(R), holding everything else constant.
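
The alternating updates can be sketched in a few lines of dense base R. This is an illustrative sketch only, not the package's implementation: the helper `solve_row`, the toy data, and the use of dense matrices are all assumptions made for the example. Each update is a weighted ridge regression with the closed-form solution x_u = (Y^T C^u Y + λI)^{-1} Y^T C^u p_u, where C^u is the diagonal matrix of user u's costs.

```r
set.seed(1)
n_users <- 6; n_items <- 5; f <- 2; lambda <- 0.1

R <- matrix(rpois(n_users * n_items, 0.7), n_users, n_items)  # toy counts
C <- 1 + R                   # cost matrix with alpha = 1
P <- (R > 0) * 1             # binary preference matrix
X <- matrix(rnorm(n_users * f, 0, 0.1), n_users, f)
Y <- matrix(rnorm(n_items * f, 0, 0.1), n_items, f)

# The loss function stated above.
loss <- function(X, Y) {
  sum(C * (P - tcrossprod(X, Y))^2) + lambda * (sum(X^2) + sum(Y^2))
}

# Hypothetical helper: solve for one latent vector, holding the other
# factor matrix B fixed. c_vec and p_vec are the matching row (or column)
# of C and P.
solve_row <- function(B, c_vec, p_vec) {
  lhs <- crossprod(B, c_vec * B) + lambda * diag(ncol(B))  # B' diag(c) B + lambda I
  rhs <- crossprod(B, c_vec * p_vec)                       # B' diag(c) p
  drop(solve(lhs, rhs))
}

loss_trace <- numeric(5)
for (it in seq_along(loss_trace)) {
  for (u in seq_len(n_users)) X[u, ] <- solve_row(Y, C[u, ], P[u, ])
  for (i in seq_len(n_items)) Y[i, ] <- solve_row(X, C[, i], P[, i])
  loss_trace[it] <- loss(X, Y)
}
```

Since every row and column update exactly minimizes the loss with the other factors held fixed, `loss_trace` cannot increase from one iteration to the next.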

Since implicit feedback data is typically sparse, the algorithm and this code are optimized to take advantage of the sparsity. That said, the algorithm involves looping over the rows and columns of the matrix, which is slow in R.

To mitigate this, I have implemented a parallel option using the foreach package. It speeds up the computation when the matrix has a reasonably large number of rows or columns (e.g. more than 100).

This algorithm also should not have memory issues, because the only matrix that is inverted is f × f and sparse matrices are used throughout.
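
One way sparsity helps, following the rearrangement described by Hu et al. (2008), is that Y^T C^u Y = Y^T Y + Y^T (C^u - I) Y. The first term is shared by all users, and the second only involves items with a nonzero entry in that user's row of `C1`. The sketch below checks this on one toy user. The variable names are illustrative, it assumes `P` is zero wherever `C1` is zero (as with the defaults), and whether the package organizes the computation exactly this way is an assumption.

```r
set.seed(2)
n_items <- 8; f <- 3; lambda <- 0.1
Y <- matrix(rnorm(n_items * f), n_items, f)

r_u  <- c(0, 3, 0, 0, 1, 0, 0, 2)  # one user's row of R (toy values)
c1_u <- 1 * r_u                    # matching row of C1 with alpha = 1
p_u  <- (r_u > 0) * 1              # matching row of P

# Direct form: touches every item.
lhs_full <- crossprod(Y, (1 + c1_u) * Y) + lambda * diag(f)
rhs_full <- crossprod(Y, (1 + c1_u) * p_u)

# Sparse form: Y'Y is computed once; only the nonzero items add a correction.
YtY  <- crossprod(Y)
nz   <- which(c1_u != 0)
Y_nz <- Y[nz, , drop = FALSE]
lhs_sparse <- YtY + crossprod(Y_nz, c1_u[nz] * Y_nz) + lambda * diag(f)
rhs_sparse <- crossprod(Y_nz, (1 + c1_u[nz]) * p_u[nz])

x_full   <- solve(lhs_full, rhs_full)      # f x f system in both cases
x_sparse <- solve(lhs_sparse, rhs_sparse)
```

Both forms give the same latent vector, and in both the linear system being solved is only f × f regardless of the number of items.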

An S3 object of class implicitcf which is a list with the following components:

`X`	the rank-`f` latent features for the users
`Y`	the rank-`f` latent features for the items
`loss_trace`	the loss function after each iteration. It should be non-increasing
`f`	the rank used
`lambda`	the penalty parameter used
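
As a rough illustration of how the returned factors might be used, the sketch below scores every user-item pair and picks the top unseen items for one user. It assumes `X` has one row per row of `R` and `Y` one row per column of `R`, which is not stated explicitly above. Random stand-in matrices are used in place of `icf$X` and `icf$Y` so the snippet runs on its own.

```r
set.seed(3)
X <- matrix(rnorm(4 * 2), 4, 2)  # stand-in for icf$X (4 users, f = 2)
Y <- matrix(rnorm(5 * 2), 5, 2)  # stand-in for icf$Y (5 items, f = 2)
R <- matrix(0, 4, 5)             # toy interaction counts
R[1, 2] <- 3
R[1, 5] <- 1

scores <- tcrossprod(X, Y)       # users x items matrix of x_u' y_i

u <- 1
s <- scores[u, ]
s[R[u, ] > 0] <- -Inf            # mask items the user already interacted with
top2 <- order(s, decreasing = TRUE)[1:2]  # indices of the two best unseen items
```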

Hu, Y., Koren, Y., Volinsky, C., 2008. Collaborative filtering for implicit feedback datasets. In Data Mining, 2008. ICDM'08. Eighth IEEE International Conference on (pp. 263-272). IEEE.

 rows <- 20
 cols <- 10
 X <- matrix(rnorm(rows * 2, 0, 1), rows, 2)
 Y <- matrix(rnorm(cols * 2, 0, 2), cols, 2)
 noise <- matrix(rnorm(rows * cols, 0, 0.5), rows, cols)
 R <- round(pmax(tcrossprod(X, Y) + noise, 0))

 icf <- implicitcf(R, f = 2, alpha = 1, lambda = 0.1, quiet = FALSE)

 # the loss should be non-increasing across iterations
 plot(icf$loss_trace)

 ## Not run: 
 # to use parallel on Mac/Linux
 library(doMC)
 registerDoMC(cores = parallel::detectCores())
 icf <- implicitcf(R, f = 2, alpha = 1, lambda = 0.1, quiet = FALSE, parallel = TRUE)

 # to use parallel on Windows
 library(doParallel)
 cl <- makeCluster(parallel::detectCores())
 registerDoParallel(cl)
 icf <- implicitcf(R, f = 2, alpha = 1, lambda = 0.1, quiet = FALSE, parallel = TRUE)
 stopCluster(cl)
 
## End(Not run)