cosDist,formUserDat | R Documentation |
Similarity-based analysis via inner products of user ratings, and possibly other variables.
formUserData(ratingsIn, usrCovs = NULL, itmCats = NULL, fileOut = "")
cosDist(x, y, wtcovs, wtcats)
predict.usrData(origData, newData, newItem, k, wtcovs = NULL, wtcats = NULL)
ratingsIn |
Input data frame, training set. Within-row format is (UserID, ItemID, rating). |
usrCovs |
Data frame of user covariates, e.g. gender and age, one row per user. User i must be in row i. |
itmCats |
Data frame of item categories, e.g. movie genre, one row per item. |
x |
Object of class usrDatum. |
y |
Object of class usrDatum. |
wtcovs |
Weight to be placed on covariates, relative to ratings variables. Must be positive if covariates are present. |
wtcats |
Weight to be placed on categories, relative to ratings variables. |
origData |
Object of class usrData, as produced by formUserData. |
newData |
Object of class usrDatum, the new case to be predicted. |
newItem |
The item ID of the rating to be predicted. |
k |
Number of nearest neighbors. |
fileOut |
A file name. |
The function formUserData inputs the usual (user ID, item ID, rating) data, and outputs an R list, of class usrData, that has one element per user ID. That element, of class usrDatum, has the following components:
userID:
User ID.
itms:
Vector of IDs for items rated by this user.
ratings:
Vector of ratings for those items.
usrCovs:
Vector of values of covariates, e.g. gender and
age, for this user.
itmCats:
Vector of proportions for the item categories
(need not sum to 1) for this user. The j-th one is the proportion
of items rated by this user that fall in item category j.
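The list structure described above can be illustrated in base R. This is a minimal sketch, not the package's implementation: the helper name makeUserList and the toy data frame are hypothetical, and the usrCovs and itmCats components are omitted.

```r
# Toy ratings in (UserID, ItemID, rating) format; the values are made up.
ratings <- data.frame(
  userID = c(1, 1, 2, 2, 2),
  itemID = c(10, 20, 10, 30, 40),
  rating = c(5, 2, 4, 3, 1)
)

# Hypothetical helper: builds one list element per user, mimicking the
# userID / itms / ratings components described above.
makeUserList <- function(df) {
  byUser <- split(df, df[[1]])          # one data frame per user ID
  out <- lapply(byUser, function(d) {
    structure(
      list(userID = d[[1]][1], itms = d[[2]], ratings = d[[3]]),
      class = "usrDatum"
    )
  })
  class(out) <- "usrData"
  out
}

ud <- makeUserList(ratings)
ud[["2"]]$itms      # IDs of the items rated by user 2
ud[["2"]]$ratings   # the corresponding ratings
```

Elements are named by user ID here because split() names its output by the grouping values; the real function may index users differently.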
There is no training code; to perform prediction, the only preparation is calling formUserData, which produces a kind of "training set" of class usrData for input into the predict method predict.usrData. The latter predicts (at present) a single new case at a time, based on the data "nearest" the new case, as follows.
In cosDist, the "distance" (not actually a mathematical metric) between numeric vectors u and v is defined to be (u,v) / sqrt((u,u) (v,v)), where ( , ) means inner product. The function cosDist finds this for two objects of class usrDatum, with the inner product being taken on the ratings contained in each of these objects, as well as the covariates and category data if any.
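As a rough illustration of the formula above, and of how it could drive a nearest-neighbor prediction, here is a base R sketch. It is not the package's code: cosSim, simOnCommon and knnPredict are hypothetical names, restricting the inner product to items rated by both users is an assumption, averaging the neighbors' ratings is an assumption, and covariates and categories are ignored. Note that larger values of this quantity mean the two vectors are more alike.

```r
# (u,v) / sqrt((u,u) (v,v)) for numeric vectors u and v
cosSim <- function(u, v) sum(u * v) / sqrt(sum(u * u) * sum(v * v))

# Assumption: two users are compared only on items both have rated.
simOnCommon <- function(a, b) {
  common <- intersect(a$itms, b$itms)
  if (length(common) == 0) return(NA_real_)
  cosSim(a$ratings[match(common, a$itms)],
         b$ratings[match(common, b$itms)])
}

# Assumption: the prediction is the mean rating of newItem among the
# k most similar users who have rated it.
knnPredict <- function(users, newUser, newItem, k) {
  rated <- Filter(function(u) newItem %in% u$itms, users)
  sims <- vapply(rated, simOnCommon, numeric(1), b = newUser)
  keep <- order(sims, decreasing = TRUE, na.last = NA)  # drops NA values
  keep <- keep[seq_len(min(k, length(keep)))]
  if (length(keep) == 0) return(NA_real_)
  mean(vapply(rated[keep],
              function(u) u$ratings[match(newItem, u$itms)],
              numeric(1)))
}

# Made-up users in the userID / itms / ratings layout
users <- list(
  list(userID = 1, itms = c(10, 20, 30), ratings = c(5, 1, 4)),
  list(userID = 2, itms = c(10, 20, 30), ratings = c(1, 5, 2)),
  list(userID = 3, itms = c(10, 20, 30), ratings = c(4, 2, 5))
)
newUser <- list(userID = 4, itms = c(10, 20), ratings = c(5, 1))

knnPredict(users, newUser, newItem = 30, k = 2)   # 4.5 for these data
```

With these data, users 1 and 3 are the two closest to the new user on items 10 and 20, so the sketch averages their ratings of item 30 (4 and 5).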
Norm Matloff and Vishal Chakraborty
library(lme4)  # provides the InstEval data set
ivl <- InstEval
ivl$s <- as.numeric(ivl$s)
ivl$d <- as.numeric(ivl$d)
ivl <- ivl[,c(1,2,7)]
ivl10 <- ivl[1:10,]
ivl10ud <- formUserData(ivl10)
ivl10ud[[1]]
# output of last is
# $userID
# [1] 1
#
# $itms
# [1] 525 560 832 1068
#
# $ratings
# [1] 5 2 5 3
#
# attr(,"class")
# [1] "usrDatum"