factors: Calculate latent factors on new data

Description

Determine latent factors for new user(s)/row(s), given either 'X' data (a.k.a. "warm-start"), or 'U' data (a.k.a. "cold-start"), or both.

If passing both types of data ('X' and 'U') and the number of rows in them differs, it will be assumed that the shorter matrix has only missing values for the rows that are present only in the other matrix.

Note: this function will not perform any internal re-indexing of the data. If the 'X' to which the model was fit was a 'data.frame', the numeration of the items will be under 'model$info$item_mapping'. There is also a function factors_single, which will let the model do the appropriate re-indexing.
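
A minimal sketch of doing that re-indexing by hand. It assumes 'model$info$item_mapping' is a vector of the original item IDs whose positions correspond to the column numbers of 'X'; that layout is an assumption here and should be checked against a fitted model. The 'item_mapping' vector and the 'new_ratings' data below are hypothetical stand-ins, so the snippet runs without a fitted model:

```r
library(Matrix)

# Hypothetical stand-in for model$info$item_mapping
# (assumed layout: position in the vector = column number in 'X')
item_mapping <- c("item_a", "item_b", "item_c", "item_d", "item_e")

# Hypothetical ratings from one new user, keyed by original item ID
new_ratings <- data.frame(
    ItemId = c("item_d", "item_b", "item_zzz"),
    Rating = c(4.5, 2.0, 3.0),
    stringsAsFactors = FALSE
)

# Translate item IDs into column numbers; unknown items become NA
col_idx <- match(new_ratings$ItemId, item_mapping)
keep <- !is.na(col_idx)

# Sort by column number before building the sparse vector
ord <- order(col_idx[keep])
idx_sorted <- col_idx[keep][ord]
val_sorted <- new_ratings$Rating[keep][ord]

# One-row sparse vector spanning all model items (class 'dsparseVector')
x_new <- sparseVector(
    x = val_sorted,
    i = idx_sorted,
    length = length(item_mapping)
)
```

The resulting 'x_new' is in one of the formats accepted for 'X'. Items that the model never saw (here 'item_zzz') are dropped rather than mapped.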

For example usage, see the main section fit_models.

Usage

factors(model, ...)

## S3 method for class 'CMF'
factors(
  model,
  X = NULL,
  U = NULL,
  U_bin = NULL,
  weight = NULL,
  output_bias = FALSE,
  nthreads = model$info$nthreads,
  ...
)

## S3 method for class 'CMF_implicit'
factors(model, X = NULL, U = NULL, nthreads = model$info$nthreads, ...)

## S3 method for class 'ContentBased'
factors(model, U, nthreads = model$info$nthreads, ...)

## S3 method for class 'OMF_explicit'
factors(
  model,
  X = NULL,
  U = NULL,
  weight = NULL,
  output_bias = FALSE,
  output_A = FALSE,
  exact = FALSE,
  nthreads = model$info$nthreads,
  ...
)

## S3 method for class 'OMF_implicit'
factors(
  model,
  X = NULL,
  U = NULL,
  output_A = FALSE,
  nthreads = model$info$nthreads,
  ...
)

Arguments

model

A collective matrix factorization model from this package - see fit_models for details.

...

Not used.

X

New 'X' data, with rows denoting new users. Can be passed in the following formats:

  • A sparse COO/triplets matrix, either from package 'Matrix' (class 'dgTMatrix'), or from package 'SparseM' (class 'matrix.coo').

  • A sparse matrix in CSR format, either from package 'Matrix' (class 'dgRMatrix'), or from package 'SparseM' (class 'matrix.csr'). Passing the input as CSR is faster than passing it as COO, since COO input will be converted internally.

  • A sparse row vector from package 'Matrix' (class 'dsparseVector').

  • A dense matrix from base R (class 'matrix'), with missing entries set as 'NA'/'NaN'.

  • A dense row vector from base R (class 'numeric'), with missing entries set as 'NA'/'NaN'.

Dense 'X' data is not supported for 'CMF_implicit' or 'OMF_implicit'.
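
As an illustration, the 'Matrix' and base R formats listed above could be constructed as follows. This is only a sketch with made-up values for two new users and five items; the variable names are hypothetical:

```r
library(Matrix)

# Two new users, five items; only observed entries are stored
X_csc <- sparseMatrix(
    i = c(1, 1, 2),
    j = c(2, 4, 5),
    x = c(3.0, 5.0, 1.5),
    dims = c(2, 5)
)
X_coo <- as(X_csc, "TsparseMatrix")   # COO/triplets, class 'dgTMatrix'
X_csr <- as(X_csc, "RsparseMatrix")   # CSR, class 'dgRMatrix'

# A single user as a sparse row vector, class 'dsparseVector'
x_sv <- sparseVector(x = c(3.0, 5.0), i = c(2, 4), length = 5)

# Dense equivalents mark unobserved entries as NA
X_dense <- matrix(NA_real_, nrow = 2, ncol = 5)
X_dense[cbind(c(1, 1, 2), c(2, 4, 5))] <- c(3.0, 5.0, 1.5)
x_dense <- X_dense[1, ]               # plain numeric row vector
```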

U

New 'U' data, with rows denoting new users. Can be passed in the same formats as 'X', or additionally as a 'data.frame', which will be internally converted to a matrix.

U_bin

New binary columns of 'U'. Must be passed as a dense matrix from base R or as a 'data.frame'.

weight

Associated observation weights for the entries in 'X'. If passed, must have the same shape as 'X': if 'X' is a sparse matrix, should be a numeric vector with length equal to the number of non-missing elements (or a sparse matrix in the same format, in which case no checks will be made on the indices); if 'X' is a dense matrix, should also be a dense matrix with the same number of rows and columns.
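
A small sketch of weights that match each kind of 'X', using made-up values. It assumes that a weight vector for sparse 'X' follows the order of the stored entries (an assumption, since the indices are not checked); all variable names are hypothetical:

```r
library(Matrix)

# Sparse 'X' in COO format with three observed entries
X_coo <- as(
    sparseMatrix(i = c(1, 1, 2), j = c(2, 4, 5),
                 x = c(3.0, 5.0, 1.5), dims = c(2, 5)),
    "TsparseMatrix"
)

# One weight per stored entry (assumed to follow the order of X_coo@x)
w_sparse <- c(1.0, 0.5, 2.0)
stopifnot(length(w_sparse) == length(X_coo@x))

# Dense 'X' with NA for missing entries
# (valid here because none of the stored values is zero)
X_dense <- as.matrix(X_coo)
X_dense[X_dense == 0] <- NA_real_

# Dense weights with the same number of rows and columns as 'X'
W_dense <- matrix(1.0, nrow = nrow(X_dense), ncol = ncol(X_dense))
stopifnot(identical(dim(W_dense), dim(X_dense)))
```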

output_bias

Whether to also return the user bias determined by the model given the data in 'X'.

nthreads

Number of parallel threads to use.

output_A

Whether to return the raw 'A' factors (the free offset).

exact

(In the 'OMF_explicit' model) Whether to calculate 'A' and 'Am' with the regularization applied to 'A' instead of to 'Am' (if using the L-BFGS method, this is how the model was fit). This is usually a slower procedure. Only relevant when passing 'X' data.

Details

Note that, regardless of whether the model was fit with the L-BFGS or ALS method with CG or Cholesky solver, the new factors will be determined through the Cholesky method or through the precomputed matrices (e.g. a simple matrix-matrix multiply for the 'ContentBased' model), unless 'U_bin' is passed, in which case they will be determined through the same L-BFGS method with which the model was fit.

Value

If passing 'output_bias=FALSE' and 'output_A=FALSE' (and for the implicit-feedback models), will return a matrix with the obtained latent factors for each row/user given the 'X' and/or 'U' data (the number of rows is 'max(nrow(X), nrow(U), nrow(U_bin))'). If any of those options is set to 'TRUE', will return a list with the following elements:

  • 'factors': The obtained latent factors (a matrix).

  • 'bias': (If passing 'output_bias=TRUE') A vector with the obtained biases for each row/user.

  • 'A': (If passing 'output_A=TRUE') The raw 'A' factors matrix (which is added to the factors determined from user attributes in order to obtain the factorization parameters).
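
The two return shapes can be seen in a sketch along these lines. It uses randomly generated data rather than a real dataset, assumes the 'cmfrec' package is installed (the code is skipped otherwise), and the data sizes and hyperparameters are arbitrary choices for illustration:

```r
if (requireNamespace("cmfrec", quietly = TRUE)) {
    library(cmfrec)
    set.seed(123)

    # Made-up training data: dense ratings with NAs, plus user attributes
    n_users <- 60; n_items <- 25; n_feat <- 4
    X <- matrix(rnorm(n_users * n_items), nrow = n_users)
    X[sample(length(X), 600)] <- NA_real_
    U <- matrix(rnorm(n_users * n_feat), nrow = n_users)

    model <- CMF(X, U = U, k = 5L, niter = 5L,
                 verbose = FALSE, nthreads = 1L)

    # Three new users, with both 'X' and 'U' data
    X_new <- matrix(rnorm(3 * n_items), nrow = 3)
    X_new[sample(length(X_new), 30)] <- NA_real_
    U_new <- matrix(rnorm(3 * n_feat), nrow = 3)

    # Default: a matrix with one row of factors per new user
    f_mat <- factors(model, X = X_new, U = U_new)

    # With output_bias=TRUE: a list with 'factors' and 'bias'
    f_lst <- factors(model, X = X_new, U = U_new, output_bias = TRUE)
    str(f_lst)
}
```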

See Also

factors_single


cmfrec documentation built on April 11, 2023, 6 p.m.