implicitcf: Collaborative Filtering for Implicit Feedback Datasets


Description

Collaborative Filtering for Implicit Feedback Datasets

Usage

implicitcf(R, alpha = 1, C1 = alpha * R, P = (R > 0) * 1, f = 10,
  lambda = 0, init_stdv = ifelse(lambda == 0, 0.01, 1/sqrt(2 * lambda)),
  max_iters = 10, parallel = FALSE, quiet = TRUE)

Arguments

R

A sparse implicit feedback matrix in which the rows typically represent users and the columns typically represent items. Each element is the number of times the user interacted with the item.

alpha

Scaling factor used to compute the cost matrix C = 1 + alpha * R when C1 is not specified.

C1

The cost matrix C minus 1 (i.e., C - 1), which should be sparse. Defaults to alpha * R.

P

A binary matrix indicating whether or not each user interacted with each item. Defaults to (R > 0) * 1.

f

The rank of the matrix factorization

lambda

The penalty on the squared L2 (Frobenius) norms of the latent row and column features.

init_stdv

Standard deviation used to randomly initialize the latent row and column features.

max_iters

Number of iterations to run the algorithm.

parallel

Whether to use the foreach package to parallelize the computation. A parallel backend must be registered first; see the examples.

quiet

Whether or not to print out progress
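The defaults for C1 and P are both derived from R. As a small illustration (base R only, with a made-up 3 x 3 count matrix), the lines below evaluate the same default expressions shown in Usage and form the cost matrix C = 1 + C1. The variable names simply mirror the arguments; this is not code taken from the package.

```r
# Toy implicit feedback matrix: rows = users, columns = items,
# entries = number of interactions (made-up data for illustration)
R <- matrix(c(0, 3, 0,
              1, 0, 0,
              0, 0, 5), nrow = 3, byrow = TRUE)

alpha <- 1
C1 <- alpha * R      # default C1: C - 1, zero wherever R is zero (stays sparse)
P  <- (R > 0) * 1    # default P: 1 if the user interacted with the item, else 0
C  <- 1 + C1         # cost matrix C = 1 + alpha * R (dense, never needed in full)

print(P)
print(C)
```

Only C1 needs to be stored: every entry of C that corresponds to a zero in R equals 1, so passing C - 1 keeps the sparsity of R.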

Details

This function implements the algorithm of Hu et al. (2008) in R using sparse matrices. It solves for X and Y by minimizing the loss function:

$$\sum_{u,i} c_{ui}\,\left(p_{ui} - x_u^\top y_i\right)^2 + \lambda\left(\|X\|_F^2 + \|Y\|_F^2\right)$$

It does this by iteratively solving for each x_u (u = 1, ..., nrow(R)) and each y_i (i = 1, ..., ncol(R)) in turn, holding everything else constant.
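Each of these subproblems has a closed form. Holding Y fixed, the loss is a weighted ridge-regression problem in x_u; setting its gradient to zero gives the update below, which is the standard derivation from Hu et al. (2008). Here C^u is the diagonal matrix with entries c_{ui}, p(u) is row u of P written as a column vector, and I_f is the f x f identity.

$$
\begin{aligned}
-2\sum_i c_{ui}\,\left(p_{ui} - x_u^\top y_i\right) y_i + 2\lambda x_u &= 0 \\
\left(Y^\top C^u Y + \lambda I_f\right) x_u &= Y^\top C^u p(u) \\
x_u &= \left(Y^\top C^u Y + \lambda I_f\right)^{-1} Y^\top C^u p(u) \\
Y^\top C^u Y &= Y^\top Y + Y^\top \left(C^u - I\right) Y
\end{aligned}
$$

The update for y_i is symmetric, with X in place of Y, C^i = diag(c_{1i}, c_{2i}, ...) and p(i) the i-th column of P. In the last line, C^u - I is the diagonal of row u of C1, so it is nonzero only for the items user u interacted with.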

Since implicit feedback data are typically sparse, the algorithm and this code are optimized to take advantage of the sparsity. That said, the algorithm involves looping over the rows and columns of the matrix, and loops are slow in R.
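To make the sparsity argument concrete, here is a base R sketch of a single row update x_u = (Y'C^uY + lambda I)^(-1) Y'C^u p(u). It uses Y'C^uY = Y'Y + Y' diag(c1_u) Y, so Y'Y is computed once and shared, and each user only touches the rows of Y for items with nonzero c1_u or p_u. The function name update_row and its arguments are hypothetical; this is an illustration of the technique, not the package's internal code.

```r
# Hypothetical helper: one alternating-least-squares update for user u.
#   YtY    : crossprod(Y), the f x f matrix Y'Y (shared by all users)
#   Y      : n_items x f matrix of item features
#   c1_u   : row u of C1 = C - 1 (mostly zeros)
#   p_u    : row u of P (binary preferences)
#   lambda : L2 penalty
update_row <- function(YtY, Y, c1_u, p_u, lambda) {
  f <- ncol(Y)
  nz <- which(c1_u != 0)                      # items with extra cost weight
  Ynz <- Y[nz, , drop = FALSE]
  # Y' C^u Y = Y'Y + Y' diag(c1_u) Y, using only the nonzero rows of Y
  A <- YtY + crossprod(Ynz * c1_u[nz], Ynz) + diag(lambda, f)
  pos <- which(p_u != 0)                      # only p_ui != 0 contributes to Y' C^u p(u)
  b <- crossprod(Y[pos, , drop = FALSE], (1 + c1_u[pos]) * p_u[pos])
  drop(solve(A, b))                           # solve an f x f linear system
}

# Usage on made-up data: 4 items, rank f = 2
set.seed(42)
Y <- matrix(rnorm(8), nrow = 4, ncol = 2)
c1_u <- c(0, 2, 0, 5)
p_u <- (c1_u > 0) * 1
x_u <- update_row(crossprod(Y), Y, c1_u, p_u, lambda = 0.1)
```

A full sweep applies this update to every row of X with Y fixed, then the analogous update to every row of Y with X fixed. Calling solve(A, b) rather than forming the inverse explicitly is the usual numerically safer choice, and each call involves only an f x f system.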

To mitigate this, I have implemented a parallel option using the foreach package. It speeds up the calculations when the matrix has a moderately large number of rows or columns (e.g., more than 100).

Memory use should also stay modest, because the only matrices that are inverted are f x f and sparse matrices are used throughout.
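For small problems, the loss function in Details can be evaluated directly with dense matrices, which is a quick way to check that a fit's loss values are non-increasing. The base R sketch below simply transcribes that formula; the name icf_loss is hypothetical, and it is not necessarily how the package computes loss_trace.

```r
# Hypothetical helper: evaluate
#   sum_{u,i} c_ui (p_ui - x_u' y_i)^2 + lambda (||X||_F^2 + ||Y||_F^2)
# using dense matrices (only sensible for small examples).
icf_loss <- function(X, Y, C1, P, lambda) {
  C <- 1 + as.matrix(C1)                     # cost weights c_ui = 1 + (C - 1)
  resid <- as.matrix(P) - tcrossprod(X, Y)   # p_ui - x_u' y_i
  sum(C * resid^2) + lambda * (sum(X^2) + sum(Y^2))
}
```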

Value

An S3 object of class implicitcf which is a list with the following components:

X

the rank-f latent features for the users

Y

the rank-f latent features for the items

loss_trace

the value of the loss function after each iteration; it should be non-increasing

f

the rank used

lambda

the penalty parameter used
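Because the model approximates p_ui by x_u'y_i, predicted preference scores for all user-item pairs are tcrossprod(X, Y). A common follow-up is to rank, for each user, the items they have not interacted with yet. The helper below is a hypothetical base R sketch, not part of the package, and it assumes the rows of X correspond to the rows of R and the rows of Y to its columns.

```r
# Hypothetical helper: indices of the top-n unseen items for each user.
#   X : n_users x f,  Y : n_items x f,  R : n_users x n_items counts
# If a user has fewer than n unseen items, already-seen items (score -Inf)
# fill the remaining slots.
top_n_items <- function(X, Y, R, n = 3) {
  scores <- tcrossprod(X, Y)            # predicted preferences x_u' y_i
  scores[as.matrix(R) > 0] <- -Inf      # exclude items already interacted with
  idx <- apply(scores, 1, function(s) order(s, decreasing = TRUE)[seq_len(n)])
  matrix(idx, nrow = nrow(scores), ncol = n, byrow = TRUE)  # one row per user
}
```

With a fitted object this might be called as top_n_items(icf$X, icf$Y, R), assuming X and Y are laid out as described above.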

References

Hu, Y., Koren, Y., Volinsky, C., 2008. Collaborative filtering for implicit feedback datasets. In Data Mining, 2008. ICDM'08. Eighth IEEE International Conference on (pp. 263-272). IEEE.

Examples

 library(implicitcf)
 set.seed(1)  # for reproducibility

 # simulate a small count matrix from a rank-2 model plus noise
 rows <- 20
 cols <- 10
 X <- matrix(rnorm(rows * 2, 0, 1), rows, 2)
 Y <- matrix(rnorm(cols * 2, 0, 2), cols, 2)
 noise <- matrix(rnorm(rows * cols, 0, 0.5), rows, cols)
 R <- round(pmax(tcrossprod(X, Y) + noise, 0))

 icf <- implicitcf(R, f = 2, alpha = 1, lambda = 0.1, quiet = FALSE)

 # should be non-increasing
 plot(icf$loss_trace)

 ## Not run: 
 # to use parallel on Mac/Linux
 library(doMC)
 registerDoMC(cores = parallel::detectCores())
 icf <- implicitcf(R, f = 2, alpha = 1, lambda = 0.1, quiet = FALSE, parallel = TRUE)

 # to use parallel on Windows
 library(doParallel)
 cl <- parallel::makeCluster(parallel::detectCores())
 registerDoParallel(cl)
 icf <- implicitcf(R, f = 2, alpha = 1, lambda = 0.1, quiet = FALSE, parallel = TRUE)
 parallel::stopCluster(cl)

## End(Not run)

andland/implicitcf documentation built on May 10, 2019, 10:29 a.m.