poismf: Factorization of Sparse Counts Matrices through Poisson...


View source: R/poismf.R

Description

Creates a low-rank non-negative factorization of a sparse counts matrix by maximizing Poisson likelihood minus L1/L2 regularization, using gradient-based optimization procedures.

The model idea is: X ~ Poisson(A*t(B))

Ideal for usage in recommender systems, in which the 'X' matrix would consist of interactions (e.g. clicks, views, plays), with users representing the rows and items representing the columns.
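As a rough illustration of the model idea, the sketch below simulates a small counts matrix from non-negative factors and evaluates the Poisson log-likelihood that the fitting procedure tries to increase. It uses base R only. The toy sizes and variable names are assumptions made for this example, and the regularization terms are left out.

```r
set.seed(1)
m <- 4; n <- 6; k <- 2   # toy sizes (assumptions for this example)

# Non-negative factor matrices: rows of A are users, rows of B are items
A <- matrix(rgamma(m * k, shape = 1), nrow = m)
B <- matrix(rgamma(n * k, shape = 1), nrow = n)

# Poisson rates, one per entry of X
lambda <- A %*% t(B)

# Draw X ~ Poisson(A %*% t(B)), entry by entry
X <- matrix(rpois(m * n, lambda), nrow = m)

# Poisson log-likelihood of X under the rates implied by A and B
loglik <- sum(dpois(X, lambda, log = TRUE))
loglik
```

Better-fitting factors give a higher (less negative) value of 'loglik'.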

Usage

poismf(
  X,
  k = 50,
  method = "tncg",
  l2_reg = "auto",
  l1_reg = 0,
  niter = "auto",
  maxupd = "auto",
  limit_step = TRUE,
  initial_step = 1e-07,
  weight_mult = 1,
  init_type = "gamma",
  seed = 1,
  handle_interrupt = TRUE,
  nthreads = parallel::detectCores()
)

Arguments

X

The counts matrix to factorize. Can be:

  • A 'data.frame' with 3 columns, containing in this order: row index or user ID, column index or item ID, count value. The first two columns will be converted to factors to enumerate them internally, and 'topN' will return those same original values. To avoid this internal re-enumeration, 'X' can be passed as a sparse COO matrix instead.

  • A sparse matrix from package 'Matrix' in triplets (COO) format (that is: 'Matrix::dgTMatrix') (recommended). Such a matrix can be created from row/column indices through function 'Matrix::sparseMatrix' (with 'repr="T"'). Will also accept them in CSC format ('Matrix::dgCMatrix'), but will be converted along the way (so it will be slightly slower).

  • A sparse matrix in COO format from the 'SparseM' package. Such a matrix can be created from row/column indices through 'new("matrix.coo", ra=values, ja=col_ix, ia=row_ix, dim=as.integer(c(m,n)))'. Will also accept them in CSR and CSC format, but will be converted along the way (so it will be slightly slower).

  • A full matrix (of class 'base::matrix') - this is not recommended though.

Passing sparse matrices is faster, as it will not need to re-enumerate the rows and columns. Full matrices will be converted to sparse.
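As a minimal sketch of the recommended input format, the code below builds a small COO matrix from row/column indices with 'Matrix::sparseMatrix', and also converts a CSC matrix to COO ahead of time. The toy triplets are made up for the example, and 'repr="T"' assumes a version of 'Matrix' recent enough to support that argument.

```r
library(Matrix)

# Toy triplets: row index, column index, count (made up for the example)
df <- data.frame(
  row_ix = c(1, 1, 2, 3),
  col_ix = c(1, 3, 2, 3),
  count  = c(2, 1, 4, 1)
)

# Triplets (COO) representation, the recommended input type
X_coo <- sparseMatrix(i = df$row_ix, j = df$col_ix, x = df$count, repr = "T")
class(X_coo)

# The default representation is CSC; it can be converted to COO beforehand
X_csc  <- sparseMatrix(i = df$row_ix, j = df$col_ix, x = df$count)
X_coo2 <- as(X_csc, "TsparseMatrix")
class(X_coo2)
```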

k

Number of latent factors to use (dimensionality of the low-rank factorization). If 'k' is small (e.g. 'k=5'), it's recommended to use 'method="pg"'. For large values (e.g. 'k=100'), it's recommended to use 'method="tncg"'. For medium values (e.g. 'k=50'), it's recommended to use either 'method="tncg"' or 'method="cg"'.

method

Optimization method to use. Options are:

  • '"tncg"' : will use a truncated Newton-CG method. This is the slowest option, but tends to find better local optima, and if run for many iterations, tends to produce sparse latent factor matrices.

  • '"cg"' : will use a Conjugate Gradient method, which is faster than the truncated Newton-CG, but tends not to reach the best local optima. Usually, with this method and the default hyperparameters, the latent factor matrices will not be very sparse.

  • '"pg"' : will use a proximal gradient method, which is a lot faster than the other two and more memory-efficient, but tends to only work with very large regularization values, and doesn't find as good local optima, nor tends to result in sparse factors.

l2_reg

Strength of L2 regularization. It is recommended to use small values along with 'method="tncg"', very large values along with 'method="pg"', and medium to large values with 'method="cg"'. If passing '"auto"', will set it to '10^3' for TNCG, '10^5' for CG, and '10^9' for PG.

l1_reg

Strength of L1 regularization. Not recommended.

niter

Number of alternating iterations to perform. One iteration denotes an update over both matrices. If passing '"auto"', will set it to 50 for TNCG, 25 for CG, and 10 for PG.

maxupd

Maximum number of updates to each user/item vector within an iteration. Note: for 'method="tncg"', this means maximum number of function evaluations rather than number of updates, so it should be higher. You might also want to try decreasing this while increasing 'niter'. For 'method="pg"', this will be taken as the actual number of updates, as it does not perform a line search like the other methods. If passing '"auto"', will set it to '5*k' for 'method="tncg"', 25 for 'method="cg"', and 10 for 'method="pg"'. If using 'method="cg"', you might also try instead setting 'maxupd=1' and 'niter=100'.
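The '"auto"' settings described for 'l2_reg', 'niter' and 'maxupd' can be summarized as a small lookup. The helper below is hypothetical (it is not a function of the package) and only mirrors the documented defaults, to make it easier to see what a given 'method' and 'k' resolve to.

```r
# Hypothetical helper (not part of the package): mirrors the documented
# "auto" defaults for each optimization method.
auto_defaults <- function(method = c("tncg", "cg", "pg"), k = 50) {
  method <- match.arg(method)
  switch(method,
    tncg = list(l2_reg = 1e3, niter = 50, maxupd = 5 * k),
    cg   = list(l2_reg = 1e5, niter = 25, maxupd = 25),
    pg   = list(l2_reg = 1e9, niter = 10, maxupd = 10)
  )
}

# Show what "auto" would resolve to for each method with k = 50
str(auto_defaults("tncg", k = 50))
str(auto_defaults("cg",   k = 50))
str(auto_defaults("pg",   k = 50))
```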

limit_step

When passing 'method="cg"', whether to limit the step sizes in each update so as to drive at most one variable to zero each time, as prescribed in [2]. If running the procedure for many iterations, it's recommended to set this to 'TRUE'. You might also set 'method="cg"' plus 'maxupd=1' and 'limit_step=FALSE' to reduce the algorithm to simple gradient descent with a line search.
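To illustrate what limiting a step so that at most one variable reaches zero can look like for non-negative variables, the sketch below caps the step size at the smallest value that brings any coordinate to zero. This is a generic illustration under that assumption, not the package's internal code, and 'max_feasible_step' is a hypothetical name.

```r
# Largest step alpha such that x + alpha * d stays non-negative
# (hypothetical helper, for illustration only).
max_feasible_step <- function(x, d) {
  shrinking <- d < 0                     # only decreasing coordinates can hit zero
  if (!any(shrinking)) return(Inf)       # no coordinate moves toward zero
  min(-x[shrinking] / d[shrinking])      # first coordinate to reach zero sets the cap
}

x <- c(0.5, 2.0, 1.0)    # current non-negative factor values (toy numbers)
d <- c(-1.0, -1.0, 0.5)  # search direction (toy numbers)

alpha <- max_feasible_step(x, d)
x_new <- x + alpha * d   # the first coordinate lands at zero, the rest stay positive
x_new
```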

initial_step

Initial step size to use for proximal gradient updates. Larger step sizes converge faster, but are more likely to result in failed optimization. Ignored when passing 'method="tncg"' or 'method="cg"', as those will perform a line search instead.

weight_mult

Extra multiplier for the weight of the positive entries over the missing entries in the matrix to factorize. Be aware that Poisson likelihood will implicitly put more weight on the non-missing entries already. Passing larger values will make the factors have larger values (which might be desirable), and can help with instability and failed optimization cases. If passing this, it's recommended to try very large values (e.g. 10^2), and might require adjusting the other hyperparameters.

init_type

How to initialize the model parameters. One of '"gamma"' (will initialize them '~ Gamma(1, 1)') or '"unif"' (will initialize them '~ Unif(0, 1)').
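For reference, the two initialization distributions can be drawn in base R as shown below. This only illustrates the distributions themselves. The package performs its own initialization internally, so these draws are not expected to match its starting values for a given 'seed'.

```r
set.seed(1)
k <- 5; nrows <- 3   # toy sizes (assumptions for this example)

# Gamma(1, 1): shape 1 and rate 1, strictly positive and unbounded above
init_gamma <- matrix(rgamma(nrows * k, shape = 1, rate = 1), nrow = nrows)

# Unif(0, 1): values between 0 and 1
init_unif <- matrix(runif(nrows * k, min = 0, max = 1), nrow = nrows)

range(init_gamma)
range(init_unif)
```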

seed

Random seed to use for starting the factorizing matrices.

handle_interrupt

When receiving an interrupt signal, whether the model should stop early and leave a usable object with the parameters obtained up to the point when it was interrupted (when passing 'TRUE'), or raise an interrupt exception without producing a fitted model object (when passing 'FALSE').

nthreads

Number of parallel threads to use.

Details

In order to obtain sparse latent factor matrices, you need to pass 'method="tncg"' and a large 'niter', such as 'niter=50' or 'niter=100'. The L1 regularization approach is not recommended, even though it might also produce relatively sparse results with the other optimization methods.

When using the proximal gradient method, this model is prone to numerical instability, and can end up producing all NaNs or all zeros in the fitted parameters. The conjugate gradient and Newton-CG methods are not prone to such failed optimizations.
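A quick check for the failure mode described above might look like the sketch below. The helper name 'looks_degenerate' is hypothetical. With a fitted object it would be applied to the 'A' and 'B' fields (e.g. 'looks_degenerate(model$A)'), but here it is demonstrated on plain vectors so the snippet runs on its own.

```r
# Hypothetical helper: TRUE when the parameters contain NaN/NA values
# or are all exactly zero.
looks_degenerate <- function(f) {
  anyNA(f) || all(f == 0)
}

looks_degenerate(c(0.2, NaN, 1.3))  # contains NaN
looks_degenerate(c(0, 0, 0))        # all zeros
looks_degenerate(c(0, 0.7, 0))      # sparse but usable
```

If the check flags a fit obtained with 'method="pg"', options mentioned in this page include trying a larger 'weight_mult', adjusting the regularization, or switching to 'method="cg"' or 'method="tncg"'.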

Value

An object of class 'poismf' with the following fields of interest:

Fields

A

The user/document/row-factor matrix, stored as a vector in row-major order. It has to be reshaped to (k, nrows) and then transposed to obtain an R matrix.

B

The item/word/column-factor matrix, stored as a vector in row-major order. It has to be reshaped to (k, ncols) and then transposed to obtain an R matrix.

levels_A

A vector indicating which user/row ID corresponds to each row position in the 'A' matrix. This will only be generated when passing 'X' as a 'data.frame', otherwise will not remap them.

levels_B

A vector indicating which item/column ID corresponds to each row position in the 'B' matrix. This will only be generated when passing 'X' as a 'data.frame', otherwise will not remap them.
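The reshaping described for the 'A' and 'B' fields can be sketched with a plain vector standing in for the stored factors. The toy sizes and values are assumptions for this example. With a fitted object, the same steps would be applied to 'model$A' with the model's own 'k'.

```r
k <- 2; nrows <- 3   # toy sizes (assumptions for this example)

# Row-major storage of a 3 x 2 matrix: the factors of row 1, then row 2, then row 3
A_vec <- c(1, 2,
           3, 4,
           5, 6)

# R fills matrices column by column, so reshape to (k, nrows) and transpose
A <- t(matrix(A_vec, nrow = k, ncol = nrows))
A
dim(A)      # nrows x k
A[2, ]      # latent factors of the second row/user
```

The Examples section obtains the factors directly as matrices through 'get.factor.matrices', which avoids doing this by hand.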

References

See Also

predict.poismf, topN, factors, get.factor.matrices, get.model.mappings

Examples

library(poismf)

### create a random sparse data frame in COO format
nrow <- 10^2 ## <- users
ncol <- 10^3 ## <- items
nnz  <- 10^4 ## <- events (agg)
set.seed(1)
X <- data.frame(
        row_ix = sample(nrow, size=nnz, replace=TRUE),
        col_ix = sample(ncol, size=nnz, replace=TRUE),
        count  = rpois(nnz, 1) + 1
     )
X <- X[!duplicated(X[, c("row_ix", "col_ix")]), ]

### can also pass X as sparse matrix - see below
### X <- Matrix::sparseMatrix(
###          i=X$row_ix, j=X$col_ix, x=X$count,
###          repr="T")
### the indices can also be characters or other types:
### X$row_ix <- paste0("user", X$row_ix)
### X$col_ix <- paste0("item", X$col_ix)

### factorize the randomly-generated sparse matrix
### good speed (proximal gradient)
model <- poismf(X, k=5, method="pg", nthreads=1)

### good quality, but slower (conjugate gradient)
model <- poismf(X, k=5, method="cg", nthreads=1)

### better quality, but much slower (truncated Newton-CG)
model <- poismf(X, k=5, method="tncg", nthreads=1)


### for getting sparse factors
model <- poismf(X, k=50, method="tncg")
mean(model$A == 0.)


### predict functionality (chosen entries in X)
### predict entry [1, 10] (row 1, column 10)
predict(model, 1, 10)
### predict entries [1,4], [1,5], [1,6]
predict(model, c(1, 1, 1), c(4, 5, 6))

### ranking functionality (for recommender systems)
topN(model, user=2, n=5, exclude=X$col_ix[X$row_ix==2])
topN.new(model, X=X[X$row_ix==2, c("col_ix","count")],
    n=5, exclude=X$col_ix[X$row_ix==2])

### obtaining latent factors
a_vec  <- factors.single(model,
            X[X$row_ix==2, c("col_ix","count")])
A_full <- factors(model, X)
A_orig <- get.factor.matrices(model)$A

### (note that newly-obtained factors will differ slightly)
sqrt(mean((A_full["2",] - A_orig["2",])^2))

poismf documentation built on Jan. 13, 2021, 6:46 a.m.
