findYdots: Covariate-Based, Latent-Factor Recommender Systems

Tools to incorporate user and item information into latent-factor recommender system methodology, and to add parallel computation capability. Various plots can be displayed.





ratingsIn: Input data frame. Each row has the format UserID, ItemID, Rating, followed by optional covariates.


userCovsStartCol: Column number in ratingsIn at which the user-specific covariates begin.


itemCovsStartCol: Column number in ratingsIn at which the item-specific covariates begin.


An R parallel cluster.


minN: If a prediction is requested for a user with fewer than this number of ratings, and covariates are available, predict from the covariates.


ydotsObj: An object of class 'ydotsMM' or 'ydotsMLE'.


A data frame consisting of cases to be predicted. Format is the same as ratingsIn, except that there is no ratings column.


Note: This software assumes that user and item ID numbers are consecutive integers, starting with 1.
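If the raw data use arbitrary IDs, they can be remapped before training. The following is a minimal base-R sketch, not part of the package; the data frame raw and its column names are hypothetical.

```r
# Hypothetical raw data with non-consecutive user IDs and character item IDs
raw <- data.frame(user   = c(101, 205, 205, 101),
                  item   = c("b", "a", "b", "a"),
                  rating = c(1, 2, 1, 4))

# Lookup tables: position in the sorted unique values becomes the new ID
userKey <- sort(unique(raw$user))
itemKey <- sort(unique(raw$item))

# Build a data frame in UserID, ItemID, Rating order with IDs 1, 2, ...
ratingsIn <- data.frame(UserID = match(raw$user, userKey),
                        ItemID = match(raw$item, itemKey),
                        Rating = raw$rating)
ratingsIn
```

Keeping userKey and itemKey around makes it possible to translate predictions back to the original IDs, e.g. userKey[ratingsIn$UserID].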

All functions here implement latent-factor models for recommender systems. They add the capability of using covariates, e.g. age and gender, and in some cases enable parallel computation. MLE and Method of Moments approaches are offered.

The basic model, without covariates, is

mean rating = overall mean + user effect + item effect
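As a rough illustration of this additive model, the sketch below estimates the three terms by sample means on the toy data from the examples. It is written in base R and is not the package's own code; the column names and the helper predictBasic are hypothetical.

```r
# Toy ratings (same values as the toy example): user, item, rating
rts <- data.frame(user   = c(1, 4, 4, 1),
                  item   = c(3, 2, 3, 2),
                  rating = c(1, 2, 1, 4))

overall <- mean(rts$rating)                              # overall mean
usrEff  <- tapply(rts$rating, rts$user, mean) - overall  # user effects
itmEff  <- tapply(rts$rating, rts$item, mean) - overall  # item effects

# Predicted mean rating = overall mean + user effect + item effect
predictBasic <- function(user, item) {
  as.numeric(overall + usrEff[as.character(user)] + itmEff[as.character(item)])
}

predictBasic(c(1, 4, 4), c(2, 3, 2))  # 3.5 0.5 2.5
```

These values agree with the predictions shown for the toy example.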

Adding covariates, this becomes

mean rating = linear covariates effect + user effect + item effect

This is the model we use in the MLE case. But for the Method of Moments, we essentially have an identifiability problem: The estimated effect for user i would be the sample mean of all of that user's ratings, minus the overall sample mean of all ratings, minus the linear covariates effect. The latter would then get added right back into the sum, so the covariates would have no impact on the prediction. (This is not a direct issue with MLE, due to the additional assumption that the user effects are normally distributed.) So, in the MM case, our model is

mean rating = linear covariates effect + item effect

So, by offering both MLE and MM approaches, the package provides not only two different estimation methods but also two different models. The MLE structural model is more general than that of MM, but has more restrictive distributional assumptions.
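The cancellation behind the identifiability problem can be checked numerically. The sketch below is base R, not package code. It assumes a single user-level covariate (age) and writes the linear covariate term as the overall mean plus a centered slope term; the function predictWithUserEffects, the column names and the slope values are all hypothetical.

```r
# Toy ratings with one user-level covariate
rts <- data.frame(user   = c(1, 1, 2, 2, 3, 3),
                  item   = c(1, 2, 1, 2, 1, 2),
                  rating = c(4, 2, 5, 3, 1, 3),
                  age    = c(20, 20, 35, 35, 50, 50))

# Moment-style fit that keeps BOTH a covariate term and a user effect
predictWithUserEffects <- function(rts, slope) {
  overall  <- mean(rts$rating)
  covEff   <- slope * (rts$age - mean(rts$age))   # covariate part beyond the overall mean
  usrMeans <- tapply(rts$rating, rts$user, mean)
  itmMeans <- tapply(rts$rating, rts$item, mean)
  # user effect = user mean - overall mean - covariate effect
  usrEff <- as.numeric(usrMeans[as.character(rts$user)]) - overall - covEff
  itmEff <- as.numeric(itmMeans[as.character(rts$item)]) - overall
  # the covariate effect is added back here and cancels
  overall + covEff + usrEff + itmEff
}

p0 <- predictWithUserEffects(rts, slope = 0)
p1 <- predictWithUserEffects(rts, slope = 0.7)
isTRUE(all.equal(p0, p1))  # TRUE: the slope has no effect on the fitted values
```

Whatever slope is supplied, the fitted values are unchanged, which is why the MM model drops the user effect when covariates are present.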

The covariates are assumed to begin in column 4 of ratingsIn, with the user-related ones, if any, coming first, and then the item-related ones, if any.
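A small sketch of the expected column layout may help. The data and the covariate names age and genre are hypothetical; the commented call simply mirrors the form used in the examples and is not run here.

```r
# Columns 1-3: UserID, ItemID, Rating; user covariates next, item covariates last
ratingsIn <- data.frame(
  UserID = c(1, 1, 2, 2, 3),
  ItemID = c(1, 2, 1, 3, 2),
  Rating = c(4, 3, 5, 2, 4),
  age    = c(23, 23, 41, 41, 35),  # user covariate: constant within each user
  genre  = c(0, 1, 0, 1, 1)        # item covariate: constant within each item
)

userCovsStartCol <- 4  # first user covariate column
itemCovsStartCol <- 5  # first item covariate column

# Sanity check: a covariate must take a single value within each ID
constantWithin <- function(x, id) all(tapply(x, id, function(v) length(unique(v))) == 1)
constantWithin(ratingsIn$age, ratingsIn$UserID)    # TRUE
constantWithin(ratingsIn$genre, ratingsIn$ItemID)  # TRUE

# With the package loaded, the call would have the shape
# trainMM(ratingsIn, userCovsStartCol = 4, itemCovsStartCol = 5)
```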

The functions trainMM and trainMLE work on a training set, returning objects that later can be used to predict new cases. The former is much faster than the latter and has a smaller memory footprint, though both shortcomings of MLE are ameliorated to some extent via parallel computation.

The trainMLE function is primarily a wrapper that sets up Maximum Likelihood Estimation (assuming normal user and item effects) for a crossed-effects model in the lme4 function lmer. As the computation for that function can be lengthy and memory-intensive, trainMLE also enables parallelizing the computation.
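For orientation, a crossed-effects model of this kind can be fitted directly with lme4 along the following lines. This is a sketch on simulated data, not the internals of trainMLE; the simulation settings and variable names are assumptions, and the block does nothing if lme4 is not installed.

```r
if (requireNamespace("lme4", quietly = TRUE)) {
  set.seed(1)
  nUsers <- 40
  nItems <- 15
  # Simulated ratings: every user rates every item
  sim <- expand.grid(UserID = 1:nUsers, ItemID = 1:nItems)
  alpha <- rnorm(nUsers, sd = 0.8)   # true user effects
  beta  <- rnorm(nItems, sd = 0.5)   # true item effects
  sim$Rating <- 3 + alpha[sim$UserID] + beta[sim$ItemID] +
    rnorm(nrow(sim), sd = 0.5)

  # Crossed random effects for users and items, normal by assumption
  fit <- lme4::lmer(Rating ~ 1 + (1 | UserID) + (1 | ItemID), data = sim)

  mu     <- lme4::fixef(fit)[["(Intercept)"]]  # estimated overall mean
  usrEff <- lme4::ranef(fit)$UserID            # predicted user effects
  itmEff <- lme4::ranef(fit)$ItemID            # predicted item effects

  # Predicted rating of item 2 by user 5
  pred <- mu + usrEff["5", "(Intercept)"] + itmEff["2", "(Intercept)"]
  print(pred)
} else {
  message("lme4 is not installed; skipping the sketch")
}
```

Covariates would enter such a model as additional fixed-effect terms on the right-hand side of the formula.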

Plotting: Calling plot(ydotsObj,ratingsIn) invokes plot.ydotsMM. Several plots are displayed, including density estimates for the user and item random effects, and a smoothed scatter plot for the joint density of those effects.


The functions trainMM and trainMLE return objects of class 'ydotsMM' and 'ydotsMLE', respectively.

The functions predict.ydotsMM and predict.ydotsMLE return a vector of predicted ratings.


Norm Matloff and Pooja Rajkumar


# toy example
rts <- rbind(c(1,3,1),c(4,2,2),c(4,3,1),c(1,2,4))
rts <- data.frame(rts)
ydots <- trainMM(rts)
ydots  # e.g. usrMeans['4'] = 1.5
predict(ydots,rbind(c(1,2),c(4,3),c(4,2)))  # 3.5, 0.5, 2.5 
# instructor evaluation data
# run the training data, no covariates
ydout <- trainMLE(ivl[,1:3]) 
# form a test set to illustrate prediction
testSet <- ivl[c(3,8),]
# say we want to predict how well students 1 and 3 would like instructor 12
testSet[1,2] <- 12
testSet[2,2] <- 12
# predict
predict(ydout,testSet[,1:2])  # 4.272660 4.410612
# MM without covariates
ydout <- trainMM(ivl[,1:3])
predict(ydout,testSet[,-3])  # 5.141009 5.137111 
# try using the covariates
ydout <- trainMM(ivl,userCovsStartCol=4,itemCovsStartCol=5)
predict(ydout,testSet[,-3],minN=5)  # 5.141009 5.137111 

