This vignette describes how to use cMDS to produce interactive plots.
In this example we'll use data from a personality study, "bfi" from package psych. People answered questions such as "I am full of ideas" with a rating of how well that describes their personality (from 1 to 6).
We load the package:
if (suppressWarnings(require(psych))==FALSE) { install.packages("psych") } library(psych) data(bfi)
The questions asked are coded in the data.frame as "A1","A2", etc. The help file for the dataset gives us the corresponding questions, which we encode as a list :
source('personality_questions.R') questions[["A1"]]
The data include the answers of around 3,000 participants. Participants' responses to a certain question can be grouped into a (long) response vector. If two questions measure a similar aspect of personality, then the responses will tend to correlate: presumably, if you're cocky enough to declare that you "know how to captivate people", you'll probably also declare that you are "make friends easily". To start with, we'll measure the similarity of two response vectors using simple Euclidean distance.
First we get rid of the rows that contain NAs
nas <- apply(bfi,1,function(v) any(is.na(v))) sub <- bfi[!nas,]
then we compute a distance matrix between all the response vectors:
X <- t(sub[,1:25]) #Questions only, no covariates D <- as.matrix(dist(X)) image(t(D))
We can cluster the response vectors:
plot(hclust(dist(X)))
the highest-level cluster is pretty straightforward to interpret:
head(questions[which(cutree(hclust(dist(X)),2)==1)],3)
vs:
head(questions[which(cutree(hclust(dist(X)),2)==2)],3)
This seems to suggest there are questions that people prefer to answer positively and others not.
The bfi dataset also comes with a column vector that gives us the age of the participants.
hist(bfi$age)
We'll use cMDS to see if the relationship between questions changes over time.
We define a function that gives a distance matrix between questions specific to a certain age group:
dmat <- function(age.med) { sub.age <- subset(sub,abs(age-age.med)<=2) X <- sub.age[,1:25] as.matrix(dist(t(X)))/sqrt(nrow(X)) }
we collect distance matrices over a succession of age groups:
ages <- seq(16,60,l=30) DL <- llply(ages,dmat) #Will output a list of distance matrices names(DL) <- round(ages) #This will be used later by the plotting function
and run cMDS to get a 2d embedding:
library(cmdsr) res <- cmds(DL,k=2,l=1)
The most important bit in the output returned by cMDS is XL, a list of configurations with the same length as DL, the list of distances. Each XL is a k*n matrix, where k is the embedding dimension (here, k=2) and n is the number of points.
plot(res$XL[[1]][1,],res$XL[[1]][2,],xlab="Dimension 1",ylab="Dimension 2", pch=19,main="cMDS configuration at the initial timestep")
Each point represents a question, located so that its distance to the other points reflects the initial distance matrix. We have what looks like clustering, and sure enough if we label the points according to the roughest hierarchical clustering division we have:
cols <- c('red','blue')[cutree(hclust(dist(X)),2)] plot(res$XL[[1]][1,],res$XL[[1]][2,],xlab="Dimension 1",ylab="Dimension 2", pch=19,main="cMDS configuration at the initial timestep",col=cols)
The embedding can be displayed using the motion widget in the GoogleVis package:
#Create a data.frame with some basic information on what's being displayed df <- data.frame(id=names(questions),question=do.call("c",questions)) print(googleVis.cmds(res,df),"chart")
Due to a limitation in GoogleVis, the ages appear as years: 1920 actually means people around 20.
Interactive graphs can also be produced using other libraries. Here's an example using the https://github.com/tdhock/animint|animint package:
library(animint) library(stringr) df$group <- str_sub(df$id,1,1) #Questions are grouped thematically, we'll plot themes using different colours #Turn cMDS output into a data.frame embed <- ldply(1:length(ages),function(ind) { df <- as.data.frame(t(res$XL[[ind]])) names(df) <- c("cmds.x1","cmds.x2") df[,"age"] <- ages[ind] df[,"id"] <- names(questions) df }) embed <- merge(embed,df) #Add info on the questions #animint is based on ggplot. see animint documentation for info pa <- ggplot(df,aes(rep(0,25),25:1,col=group))+geom_text(aes(label=question,clickSelects=id)) + theme(axis.line=element_blank(), axis.text=element_blank(), axis.ticks=element_blank(), axis.title=element_blank(),legend.position="none") pb <- ggplot(embed)+make_text(embed,0,2,label="age",format="Age %d +/- 2 years")+geom_text(aes(cmds.x1,cmds.x2,label=id,showSelected=id,showSelected2=age),size=2)+geom_text(aes(cmds.x1,cmds.x2,label=id,showSelected=age,col=group),alpha=.8) gg2animint_knitr(list(pa=pa,pb=pb,time=list(variable="age", ms=500)))
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.