cosDist: Cosine-Based Analysis

View source: R/cosine.R

cosDist,formUserData    R Documentation

Cosine-Based Analysis

Description

Similarity-based analysis via inner products of user ratings, and possibly other variables.

Usage

formUserData(ratingsIn, usrCovs = NULL, itmCats = NULL, fileOut = "") 
cosDist(x, y, wtcovs, wtcats)
predict.usrData(origData, newData, newItem, k, wtcovs = NULL, wtcats = NULL)

Arguments

ratingsIn

Input data frame serving as the training set. Each row has the form (UserID, ItemID, rating).

usrCovs

Data frame of user covariates, e.g. gender and age, one row per user. User i must be in row i.

itmCats

Data frame of item categories, e.g. movie genre, one row per item.

x

Object of class usrDatum.

y

Object of class usrDatum.

wtcovs

Weight to be placed on covariates, relative to ratings variables. Must be positive if covariates are present.

wtcats

Weight to be placed on categories, relative to ratings variables.

origData

Object of class 'usrData', serving as the training set.

newData

Object of class 'usrDatum', to be predicted.

newItem

The item ID of the rating to be predicted.

k

Number of nearest neighbors.

fileOut

A file name.

Details

The function formUserData takes the usual (user ID, item ID, rating) data as input and returns an R list of class usrData, with one element per user ID. Each element, of class usrDatum, has the following components:

  • userID: User ID.

  • itms: Vector of IDs for items rated by this user.

  • ratings: Vector of ratings for those items.

  • usrCovs: Vector of values of covariates, e.g. gender and age, for this user.

  • itmCats: Vector of proportions for the item categories (need not sum to 1) for this user. The j-th one is the proportion of items rated by this user that fall in item category j.
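
The per-user structure described above can be illustrated with a small base-R sketch. This is not the rectools implementation: the helper name toyFormUserData and the toy data frame are hypothetical, and covariates and item categories are left out for brevity.

```r
# Hypothetical toy ratings in (UserID, ItemID, rating) row format
ratings <- data.frame(
  userID = c(1, 1, 2, 2, 2),
  itemID = c(10, 20, 10, 30, 40),
  rating = c(5, 2, 4, 3, 1)
)

# Illustrative helper (NOT the rectools code): builds one list element
# per user, each holding that user's ID, rated items, and ratings
toyFormUserData <- function(ratingsIn) {
  byUser <- split(ratingsIn, ratingsIn[[1]])   # one data frame per user ID
  out <- lapply(byUser, function(rows) {
    datum <- list(userID = rows[[1]][1],
                  itms = rows[[2]],
                  ratings = rows[[3]])
    class(datum) <- "usrDatum"
    datum
  })
  class(out) <- "usrData"
  out
}

toyUD <- toyFormUserData(ratings)
toyUD[[2]]$itms      # IDs of the items rated by the second user
toyUD[[2]]$ratings   # the corresponding ratings
```

Keeping one list element per user means that all of a user's ratings can be retrieved together when two users are compared.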

There is no training code; to perform prediction, the only preparation is calling formUserData, which produces a kind of "training set" of class usrData for input into the predict method predict.usrData. The latter predicts (at present) a single new case at a time, based on the data "nearest" the new case, as follows.

In cosDist, the "distance" (not actually a mathematical metric; it is the cosine of the angle between the vectors, so larger values indicate greater similarity) between numeric vectors u and v is defined to be (u,v) / sqrt((u,u) (v,v)), where ( , ) means inner product. The function cosDist finds this for two objects of class 'usrDatum', with the inner product being taken on the ratings contained in each of these objects, as well as the covariates and category data if any.
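
The formula above, and the nearest-neighbor idea behind the predict method, can be sketched in base R as follows. The names cosSim, toyCosDist and toyPredict are hypothetical and are not part of rectools. The sketch also makes two assumptions that this page does not state: that two users are compared only on the items both have rated, and that the prediction is the mean rating of the new item among the k most similar users who rated it. Covariate and category weights are ignored.

```r
# Cosine of the angle between two equal-length numeric vectors:
# (u,v) / sqrt((u,u) (v,v))
cosSim <- function(u, v) {
  sum(u * v) / sqrt(sum(u * u) * sum(v * v))
}

# ASSUMPTION: two users are compared on the items both have rated
toyCosDist <- function(x, y) {
  common <- intersect(x$itms, y$itms)
  if (length(common) == 0) return(NA_real_)
  xr <- x$ratings[match(common, x$itms)]
  yr <- y$ratings[match(common, y$itms)]
  cosSim(xr, yr)
}

# ASSUMPTION: the prediction is the mean rating of newItem among the
# k most similar users who have rated it
toyPredict <- function(origData, newData, newItem, k) {
  rated <- Filter(function(d) newItem %in% d$itms, origData)
  sims <- vapply(rated, function(d) toyCosDist(d, newData), numeric(1))
  itemRatings <- vapply(rated,
                        function(d) d$ratings[match(newItem, d$itms)],
                        numeric(1))
  # most similar users first; users with no common items (NA) are dropped
  nearest <- head(order(sims, decreasing = TRUE, na.last = NA), k)
  mean(itemRatings[nearest])
}

# Hypothetical users, written as plain lists with usrDatum-like components
u1   <- list(userID = 1, itms = c(10, 20, 30), ratings = c(5, 1, 4))
u2   <- list(userID = 2, itms = c(10, 20, 30), ratings = c(1, 5, 2))
newU <- list(userID = 3, itms = c(10, 20),     ratings = c(4, 1))

cosSim(c(1, 2, 3), c(2, 4, 6))   # parallel vectors: 1
cosSim(c(1, 0), c(0, 1))         # orthogonal vectors: 0
toyPredict(list(u1, u2), newU, newItem = 30, k = 1)   # 4, the rating by u1
```

With k = 1, the new user is matched to u1, whose ratings on items 10 and 20 point in nearly the same direction, so the predicted rating for item 30 is the one given by u1.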

Author(s)

Norm Matloff and Vishal Chakraborty

Examples

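library(rectools)
# InstEval (instructor evaluations) is supplied by the lme4 package
data(InstEval, package = "lme4")
ivl <- InstEval
# convert student (s) and instructor (d) IDs from factors to numeric codes
ivl$s <- as.numeric(ivl$s)
ivl$d <- as.numeric(ivl$d)
# keep columns s, d, y, i.e. (user ID, item ID, rating)
ivl <- ivl[,c(1,2,7)]
# small subset for illustration
ivl10 <- ivl[1:10,]
ivl10ud <- formUserData(ivl10)
ivl10ud[[1]]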
# output of last is
# $userID  
# [1] 1  
#   
# $itms  
# [1]  525  560  832 1068  
#   
# $ratings  
# [1] 5 2 5 3  
#   
# attr(,"class")  
# [1] "usrDatum"  

matloff/rectools documentation built on March 31, 2022, 12:09 p.m.