The Jester joke recommender system

knitr::opts_chunk$set(echo = TRUE)

(This file creates the objects jester and jester_maxp, which are datasets available in the hyper2 package and documented in jester.Rd. It takes quite a long time to run.)

Goldberg et al. (2001) present a dataset in which respondents rated a number of jokes. Here, I analyse a small portion of this dataset using the hyper2 package. This document illustrates an extremely challenging application of the hyper2 package and, without caching, takes a long time to process. Goldberg's dataset has 24,938 lines, one per respondent, and 101 columns: the first column gives the number of jokes rated by each respondent, and the remaining 100 columns hold one rating per joke. Here I use 150 lines and 99 jokes (the 100th joke was not funny).

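a <- read.csv("jester-data-3.csv", header=FALSE)  # file is 150 lines only
a <- as.matrix(a[,-c(1,100)])  # drop column 1 (number of jokes rated) and column 100
a[a==99] <- NA                 # 99 codes "not rated"
colnames(a) <- paste("joke",sprintf("%02d",seq_len(ncol(a))),sep="")  # joke01, joke02, ...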
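The effect of these four lines can be checked on a small made-up matrix laid out like the raw file: column 1 holds the number of jokes each respondent rated, the remaining columns hold one rating per joke, and 99 marks a joke that was not rated. The values are invented for illustration, and raw and a_toy are hypothetical stand-ins for the file contents and for a. Because the toy has only four joke columns, only the count column is dropped.

```r
# Toy stand-in for the raw Jester layout (made-up values, NOT real data):
# column 1 = number of jokes rated; other columns = one rating per joke;
# 99 means "not rated".
raw <- rbind(
  c(3,  2.5, 99, -1.2,  7.1),
  c(2, 99,  4.0, 99,  -3.3),
  c(3,  0.5, 1.5, 8.0, 99)
)
a_toy <- raw[, -1]            # drop the count column
a_toy[a_toy == 99] <- NA      # recode "not rated" as NA
colnames(a_toy) <- paste("joke", sprintf("%02d", seq_len(ncol(a_toy))), sep = "")
# Sanity check: the count column agrees with the number of non-NA ratings
stopifnot(all(raw[, 1] == rowSums(!is.na(a_toy))))
print(a_toy)
```

The same row-sum check is a cheap way to confirm that the 99 codes were recoded correctly.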

Displaying row 1 of a (with a[1,]) shows that most entries are NA, signifying that respondent 1 did not rate that particular joke; the first respondent rated joke 5 at -1.65, joke 7 at -0.78, and so on. We can compute some summary statistics of a:


The above plot shows how many jokes each of the 150 respondents rated.
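Counts of this kind are row and column sums of the logical matrix !is.na(a). The sketch below computes both on a small made-up matrix (a_toy is a hypothetical stand-in for a; the ratings are invented) and then keeps only the jokes rated by more than one respondent, which is the colSums(...) > 1 rule used to filter a.

```r
# Made-up ratings (NA = not rated), standing in for the matrix a
a_toy <- matrix(c( 2.5,  NA, -1.2,  NA,
                    NA, 4.0,   NA, 7.0,
                   0.5, 1.5,  8.0,  NA),
                nrow = 3, byrow = TRUE,
                dimnames = list(NULL, c("joke01", "joke02", "joke03", "joke04")))
per_respondent <- rowSums(!is.na(a_toy))  # jokes rated by each respondent
per_joke <- colSums(!is.na(a_toy))        # respondents who rated each joke
print(per_respondent)
print(per_joke)
# Keep only jokes rated by more than one respondent
a_kept <- a_toy[, per_joke > 1, drop = FALSE]
print(colnames(a_kept))
```

Here joke04 is rated by a single respondent, so the > 1 rule drops it even though it was rated.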

plot(colSums(!is.na(a)),xlab="joke index")

The above plot shows how many respondents rated each joke. It makes sense to remove jokes that were rated by at most one respondent:

a <- a[,colSums(!is.na(a))>1]  # keep jokes rated by more than one respondent

This leaves 91 jokes, each rated by more than one respondent. We now need to transform each respondent's ratings into ranks:

f <- function(x){
    x <- x[!is.na(x)]                            # keep only the jokes this respondent rated
    x[order(x,decreasing=TRUE)] <- seq_along(x)  # highest rating gets rank 1
    x
}

For the first respondent, f(a[1,]) shows that joke08 was rated the funniest, with rank 1.
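To see what the ranking function does, here is a self-contained copy of f() applied to a made-up rating vector (the ratings and the name x are invented for illustration). Unrated jokes are discarded and the remaining ratings are replaced by ranks, with rank 1 for the highest rating; the joke names are kept.

```r
# Self-contained copy of the ranking function f(): drop unrated jokes,
# then give rank 1 to the highest rating, rank 2 to the next, and so on.
f <- function(x){
    x <- x[!is.na(x)]
    x[order(x, decreasing = TRUE)] <- seq_along(x)
    x
}
# Made-up ratings for one respondent (NA = not rated)
x <- c(joke01 = 2.5, joke02 = NA, joke03 = -1.2, joke04 = 7.1)
r <- f(x)
print(r)              # joke01 = 2, joke03 = 3, joke04 = 1
print(names(sort(r))) # jokes from funniest to least funny
```

When there are no tied ratings, base R's rank(-x) on the non-missing ratings gives the same ranks.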

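We now build the likelihood function by adding one ranking term per respondent:

jester <- hyper2()  # start from an empty hyper2 likelihood
system.time(for(i in seq_len(nrow(a))){jester <- jester + rank_likelihood(f(a[i,]))})  # add each respondent's ranking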


system.time(jester_maxp <- maxp(jester,n=1))  # maximum likelihood estimate of the joke strengths
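hyper2 does this work internally on the full jester object. To make the idea concrete, here is an independent base-R sketch, not hyper2 code, on a made-up three-item example. It assumes the Plackett-Luce form, which is my understanding of what rank_likelihood() builds: at each stage the next item in a ranking is chosen with probability proportional to its strength among the items not yet placed. It also assumes, as with maxp(), that strengths are normalized to sum to one. pl_loglik and pl_mle are hypothetical helpers, and optim() with a softmax parametrization is a swapped-in choice of optimizer.

```r
# Log-likelihood of one ranking (character vector, best first) under
# Plackett-Luce with named positive strengths p. Only the ranked items
# enter the normalizing sums, so partial rankings are handled.
pl_loglik <- function(ranking, p){
    s <- p[ranking]
    # term k: log strength of item k minus log total strength of items k..n
    sum(log(s) - log(rev(cumsum(rev(s)))))
}

# Hypothetical helper: maximum likelihood strengths from a list of rankings.
# The unit-sum constraint is imposed through a softmax parametrization.
pl_mle <- function(rankings, items){
    negll <- function(theta){
        p <- exp(c(0, theta))   # first item's log-strength fixed at 0
        p <- p / sum(p)
        names(p) <- items
        -sum(vapply(rankings, pl_loglik, numeric(1), p = p))
    }
    fit <- optim(rep(0, length(items) - 1), negll, method = "BFGS")
    p <- exp(c(0, fit$par))
    setNames(p / sum(p), items)
}

# Made-up rankings of three items
rankings <- c(rep(list(c("A", "B", "C")), 3),
              list(c("B", "A", "C")),
              list(c("A", "C", "B")))
p_hat <- pl_mle(rankings, c("A", "B", "C"))
print(round(p_hat, 3))
```

On this toy data A, ranked first in four of the five rankings, gets the largest strength and C the smallest; the real computation works the same way but over 91 jokes and 150 partial rankings, which is why it is slow.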




References

Goldberg, K., Roeder, T., Gupta, D., and Perkins, C. (2001). Eigentaste: a constant time collaborative filtering algorithm. Information Retrieval, 4(2), 133-151.

Package dataset

The following lines create jester.rda, which resides in the data/ directory of the package.


