Collaborative Filtering for Implicit Feedback Datasets
R | A sparse implicit feedback matrix, where the rows typically represent users and the columns typically represent items. The elements of the matrix represent the number of times that the users have interacted with the items
alpha | Used to calculate cost matrix
C1 | Equal the cost matrix (
P | A binary matrix, indicating whether or not the users interacted with the items
f | The rank of the matrix factorization
lambda | The L2 squared norm penalty on the latent row and column features
init_stdv | Standard deviation to initialize the latent row and column features
max_iters | How many iterations to run the algorithm for
parallel | Whether to use
quiet | Whether or not to print out progress
This function implements the algorithm of Hu et al. (2008) in R using sparse matrices.
It solves for X and Y by minimizing the loss function:
∑_{u, i} c_{ui} (p_{ui} - x_u^T y_i)^2 + λ (||X||_F^2 + ||Y||_F^2)
It does this by iteratively solving for x_u, u = 1, ..., nrow(R) and y_i, i = 1, ..., ncol(R), holding everything else constant.
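To make the loss and the alternating updates concrete, here is a minimal dense sketch in base R. It is not the package's code: it assumes the confidence weights are c_{ui} = 1 + alpha * r_{ui} and the preferences are p_{ui} = 1 when r_{ui} > 0, as in Hu et al. (2008), and the helper names `loss_fn` and `update_row` are hypothetical.

```r
set.seed(1)

# Toy count matrix (hypothetical data): 4 users x 3 items
R <- matrix(c(2, 0, 0,
              0, 1, 0,
              3, 0, 1,
              0, 0, 4), nrow = 4, byrow = TRUE)
alpha  <- 1
lambda <- 0.1
f      <- 2

P <- (R > 0) * 1      # assumed: binary preference matrix
C <- 1 + alpha * R    # assumed: confidence weights from Hu et al. (2008)

# Random starting values for the latent features
X <- matrix(rnorm(nrow(R) * f, 0, 0.1), nrow(R), f)
Y <- matrix(rnorm(ncol(R) * f, 0, 0.1), ncol(R), f)

# Weighted squared error plus the L2 penalty on X and Y
loss_fn <- function(X, Y) {
  sum(C * (P - tcrossprod(X, Y))^2) + lambda * (sum(X^2) + sum(Y^2))
}

# Closed-form ridge solve for one row x_u, holding Y fixed
update_row <- function(u, X, Y) {
  Cu <- diag(C[u, ], nrow = ncol(R))
  A  <- t(Y) %*% Cu %*% Y + lambda * diag(f)
  b  <- t(Y) %*% Cu %*% P[u, ]
  X[u, ] <- solve(A, b)
  X
}

loss_before <- loss_fn(X, Y)
for (u in seq_len(nrow(R))) X <- update_row(u, X, Y)
loss_after <- loss_fn(X, Y)
```

Each row update minimizes the loss exactly with respect to x_u, so `loss_after` cannot exceed `loss_before`. The columns y_i are updated in the same way with the roles of X and Y swapped.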
Since implicit feedback data is typically sparse, the algorithm and this code are optimized to take advantage of the sparsity. That being said, the algorithm involves looping over the rows and columns of the matrix, which R is slow at. To mitigate this, I have implemented a parallel option using the foreach package. It speeds up calculations when there are a decent number of rows or columns (e.g. > 100).
This algorithm also should not have any memory issues because the only inversion is of an f-dimensional matrix and sparse matrices are used throughout.
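The sketch below illustrates why only an f-dimensional system has to be solved per row and how sparsity helps. It uses the identity Y^T C^u Y = Y^T Y + Y^T (C^u - I) Y from Hu et al. (2008), so that only the items a user actually interacted with enter the per-user term. Whether the package computes it in exactly this form is an assumption, as is the weighting c_{ui} = 1 + alpha * r_{ui}.

```r
set.seed(2)

n_items <- 6
f       <- 2
alpha   <- 1
lambda  <- 0.1

Y   <- matrix(rnorm(n_items * f), n_items, f)  # item features
r_u <- c(0, 3, 0, 0, 1, 0)                     # one user's counts (toy data)

# Dense version: builds an n_items x n_items diagonal matrix
c_u <- 1 + alpha * r_u
p_u <- as.numeric(r_u > 0)
A_dense <- t(Y) %*% diag(c_u) %*% Y + lambda * diag(f)
b_dense <- t(Y) %*% (c_u * p_u)

# Sparse version: Y^T Y is computed once and shared by all users;
# the per-user correction only touches the nonzero entries of r_u
YtY  <- crossprod(Y)
nz   <- which(r_u > 0)
Y_nz <- Y[nz, , drop = FALSE]
A_sparse <- YtY + crossprod(Y_nz, (alpha * r_u[nz]) * Y_nz) + lambda * diag(f)
b_sparse <- crossprod(Y_nz, 1 + alpha * r_u[nz])

# Both are f x f systems and give the same solution
x_dense  <- solve(A_dense, b_dense)
x_sparse <- solve(A_sparse, b_sparse)
```

In both versions the system passed to `solve()` is f x f regardless of the number of items; the sparse version additionally avoids forming any object whose size depends on the zero entries.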
An S3 object of class implicitcf, which is a list with the following components:
X | the rank-
Y | the rank-
loss_trace | the loss function after each iteration. It should be non-increasing
f | the rank used
lambda | the penalty parameter used
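As a hedged illustration of how the returned factors might be used, the sketch below scores every user-item pair and picks the top unseen items per user. It assumes X has one row per row of R and Y has one row per column of R, each with f columns; the descriptions above do not confirm these shapes. Random matrices stand in for `icf$X` and `icf$Y` so the block runs without the package.

```r
set.seed(3)

# Stand-ins for icf$X and icf$Y (assumed shapes: users x f, items x f)
X <- matrix(rnorm(5 * 2), 5, 2)
Y <- matrix(rnorm(4 * 2), 4, 2)

# Toy interaction counts for the same 5 users and 4 items
R <- matrix(0, 5, 4)
R[1, 2] <- 3
R[2, 1] <- 1

# Predicted preference for every user-item pair: x_u^T y_i
scores <- tcrossprod(X, Y)

# Exclude items the user has already interacted with
scores[R > 0] <- -Inf

# Indices of the top_n highest-scoring remaining items for each user
top_n <- 2
recs <- t(apply(scores, 1, function(s) {
  order(s, decreasing = TRUE)[seq_len(top_n)]
}))
```

Each row of `recs` holds the column indices of R recommended to that user, best first.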
Hu, Y., Koren, Y., Volinsky, C., 2008. Collaborative filtering for implicit feedback datasets. In Data Mining, 2008. ICDM'08. Eighth IEEE International Conference on (pp. 263-272). IEEE.
rows <- 20
cols <- 10
X <- matrix(rnorm(rows * 2, 0, 1), rows, 2)
Y <- matrix(rnorm(cols * 2, 0, 2), cols, 2)
noise <- matrix(rnorm(rows * cols, 0, 0.5), rows, cols)
R <- round(pmax(tcrossprod(X, Y) + noise, 0))
icf <- implicitcf(R, f = 2, alpha = 1, lambda = 0.1, quiet = FALSE)
# should be decreasing
plot(icf$loss_trace)
## Not run:
# to use parallel on Mac/Linux
library(doMC)
registerDoMC(cores = parallel::detectCores())
icf <- implicitcf(R, f = 2, alpha = 1, lambda = 0.1, quiet = FALSE, parallel = TRUE)
# to use parallel on Windows
library(doParallel)
cl <- makeCluster(parallel::detectCores())
registerDoParallel(cl)
icf <- implicitcf(R, f = 2, alpha = 1, lambda = 0.1, quiet = FALSE, parallel = TRUE)
stopCluster(cl)
## End(Not run)