sar: Fit a SAR model

View source: R/sar.R

Description

Fit a SAR model

Usage

sar(...)

## S3 method for class 'data.frame'
sar(
  x,
  user = "user",
  item = "item",
  time = "time",
  event = "event",
  weight = "weight",
  ...
)

## Default S3 method:
sar(
  user,
  item,
  time,
  event = NULL,
  weight = NULL,
  support_threshold = 1,
  allowed_items = NULL,
  allowed_events = c(Click = 1, RecommendationClick = 2, AddShopCart = 3,
    RemoveShopCart = -1, Purchase = 4),
  by_user = TRUE,
  similarity = c("jaccard", "lift", "count"),
  half_life = 30,
  catalog_data = NULL,
  catalog_formula = item ~ .,
  cold_to_cold = FALSE,
  cold_item_model = NULL,
  ...
)

## S3 method for class 'sar'
print(x, ...)

Arguments

...

For sar(), further arguments to pass to the cold-items feature model.

x

For the data.frame method, a data frame containing the transaction data. For the print method, a SAR model object.

user, item, time, event, weight

For the default method, vectors to use as the user IDs, item IDs, timestamps, event types, and transaction weights for SAR. For the data.frame method, the names of the columns in the data frame x to use for these variables.

support_threshold

The SAR support threshold. Items that do not occur at least this many times in the data will be considered "cold".

allowed_items

A character or factor vector of allowed item IDs to use in the SAR model. If supplied, this will be used to categorise the item IDs in the data.

allowed_events

The allowed values for events, if that argument is supplied. Other values will be discarded.

by_user

Should the analysis be by user ID, or by user ID and timestamp? Defaults to user ID only.

similarity

Similarity metric to use; defaults to Jaccard.

half_life

The decay period to use when weighting transactions by age.

catalog_data

A dataset to use for building the cold-items feature model.

catalog_formula

A formula for the feature model used to compute similarities for cold items.

cold_to_cold

Whether the cold-items feature model should include the cold items themselves in the training data, or only warm items.

cold_item_model

The type of model to use for cold item features.
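As a rough illustration of the half_life argument, the sketch below applies exponential decay to transaction weights, so that a transaction's weight halves every half_life units of age. The decay form, the use of days as the time unit, and the variable names age_days and decayed are assumptions made for illustration; the package's internal weighting may differ in detail.

```r
# Hypothetical ages of four transactions, in days before the reference date.
age_days <- c(0, 15, 30, 60)

# Unit weights, as if no explicit weight column had been supplied.
weight <- rep(1, length(age_days))

# Assumed half-life in days (matches the default value shown in Usage).
half_life <- 30

# Exponential decay: the weight halves for every half_life days of age.
decayed <- weight * 2^(-age_days / half_life)

print(round(decayed, 3))
```

Under this assumption, a transaction that is 30 days old counts half as much as one made today, and one that is 60 days old counts a quarter as much.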

Details

Smart Adaptive Recommendations (SAR) is a fast, scalable, adaptive algorithm for personalized recommendations based on user transaction history and item descriptions. It produces easily explainable/interpretable recommendations and handles "cold item" and "semi-cold user" scenarios.

Central to how SAR works is an item-to-item co-occurrence matrix, which is based on how many times two items occur for the same users. For example, if a given user buys items i_1 and i_2, then the cell (i_1, i_2) is incremented by 1. From this, an item similarity matrix can be obtained by rescaling the co-occurrences according to a given metric. Options for the metric include Jaccard (the default), lift, and counts (which means no rescaling).
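The following base R sketch illustrates the idea on a toy dataset. It is not the package's implementation: the data are made up, and the Jaccard and lift formulas are the commonly used definitions, assumed here rather than taken from the package source.

```r
# Hypothetical transactions: one row per (user, item) interaction.
trans <- data.frame(
  user = factor(c("u1", "u1", "u2", "u2", "u2", "u3")),
  item = factor(c("i1", "i2", "i1", "i2", "i3", "i3"))
)

users <- levels(trans$user)
items <- levels(trans$item)

# Binary user-by-item matrix: 1 if the user transacted the item at all.
incidence <- matrix(0, length(users), length(items),
                    dimnames = list(users, items))
incidence[cbind(as.integer(trans$user), as.integer(trans$item))] <- 1

# Item-to-item co-occurrence: cell (i, j) counts users who have both items.
# The diagonal holds the number of users per item.
cooccur <- crossprod(incidence)

# Rescale co-occurrences into similarities (assumed standard definitions).
occ <- diag(cooccur)
jaccard <- cooccur / (outer(occ, occ, "+") - cooccur)
lift <- cooccur / outer(occ, occ)
counts <- cooccur  # "count" similarity: no rescaling

print(cooccur)
print(round(jaccard, 3))
```

Here items i1 and i2 are held by exactly the same users, so their Jaccard similarity is 1, while i1 and i3 share only one of their three combined users.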

Note that the similarity matrix in SAR thus only includes information on which users transacted which items. It does not include any other information such as item ratings or features, which may be used by other recommender algorithms.

The SAR implementation in R should be usable on datasets with up to a few million rows and several thousand items. The main constraint is the size of the similarity matrix, which in turn depends (quadratically) on the number of unique items. The implementation has been successfully tested on the MovieLens 20M dataset, which contains about 138,000 users and 27,000 items. For larger datasets, it is recommended to use the Azure web service API.
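To get a feel for the quadratic growth, the sketch below computes the size a fully dense item-to-item matrix of doubles would have for a few item counts. This is an upper bound under the assumption of 8 bytes per cell; the model stores the matrix in sparse format, so the real footprint depends on how many item pairs actually co-occur.

```r
# Hypothetical numbers of unique items; 27000 roughly matches MovieLens 20M.
n_items <- c(1000, 5000, 27000)

# Dense storage: one 8-byte double per (item, item) cell, reported in GiB.
dense_gib <- n_items^2 * 8 / 2^30

print(data.frame(n_items = n_items, dense_gib = round(dense_gib, 2)))
```

At 27,000 items the dense bound is already above 5 GiB, which is why the number of unique items, rather than the number of transactions, tends to be the limiting factor.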

Value

An S3 object representing the SAR model. This is essentially the item-to-item similarity matrix in sparse format, along with the original transaction data used to fit the model.

Cold items

SAR has the ability to handle cold items, meaning those which have not been seen by any user, or which have only been seen by a number of users less than support_threshold. This is done by using item features to predict similarities. The method used for this is set by the cold_item_model argument.

The data frame and features used for cold items are given by the catalog_data and catalog_formula arguments. catalog_data should be a data frame whose first column is item ID. catalog_formula should be a one-sided formula (no LHS).

This feature is currently experimental, and subject to change.
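As a minimal sketch of what "cold" means here, the base R code below flags items seen by fewer distinct users than support_threshold, including catalog items that never appear in the transactions. The data, the catalog_items vector, and the counting by distinct users are illustrative assumptions, not the package's internal logic.

```r
# Hypothetical transactions.
trans <- data.frame(
  user = c("u1", "u2", "u3", "u1", "u2", "u1"),
  item = c("a", "a", "a", "b", "b", "c")
)

# Hypothetical catalog of item IDs; "d" never appears in the transactions.
catalog_items <- c("a", "b", "c", "d")

support_threshold <- 2

# Count distinct users per item, keeping catalog items with zero users.
pairs <- unique(trans[c("user", "item")])
n_users <- table(factor(pairs$item, levels = catalog_items))

# Items below the threshold are treated as cold.
cold_items <- names(n_users)[as.vector(n_users) < support_threshold]

print(cold_items)
```

In this toy data, item "c" is cold because only one user has it, and item "d" is cold because no user has it.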

See Also

Description of SAR at the Product Recommendations API repo on GitHub

Examples

data(ms_usage)

## all of these fit the same model:

# fit a SAR model from a series of vectors
mod1 <- sar(user=ms_usage$user, item=ms_usage$item, time=ms_usage$time)

# fit a model from a data frame, naming the variables to use
mod2 <- sar(ms_usage, user="user", item="item", time="time")

# fit a model from a data frame, using default variable names
mod3 <- sar(ms_usage)

SAR documentation built on Oct. 23, 2020, 7:55 p.m.
