Incorporating Set Metadata


For all examples the movies data set contained in the package will be used.

library(UpSetR)
movies <- read.csv(system.file("extdata", "movies.csv", package = "UpSetR"), header = TRUE, sep = ";")


set.metadata Parameter Breakdown

The set.metadata parameter is broken up into 3 fields: data, ncols, and plots.
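As a rough sketch of how these fields fit together, the list below mirrors the structure used in the examples that follow. The data frame contents and the helper check_set_metadata() are hypothetical and only for illustration; the field names data, plots, type, column, and assign are taken from the calls in this vignette.

```r
# Hypothetical set metadata. As in the examples here, the first column
# holds the set names and the remaining columns hold set attributes.
example_metadata <- data.frame(
  sets = c("Action", "Comedy", "Drama"),
  score = c(61, 48, 75),
  stringsAsFactors = FALSE
)

# The list handed to the set.metadata argument of upset().
example_set_metadata <- list(
  data = example_metadata,
  plots = list(
    list(type = "hist", column = "score", assign = 20)
  )
)

# Illustrative check (not part of UpSetR): every plot specification
# must name a column that exists in the metadata data frame.
check_set_metadata <- function(md) {
  stopifnot(is.data.frame(md$data), is.list(md$plots))
  cols <- vapply(md$plots, function(p) p$column, character(1))
  all(cols %in% names(md$data))
}

check_set_metadata(example_set_metadata)
```

Running a check like this before calling upset() can catch a misspelled column name early.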

Example 1: Set Metadata Bar Plot

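In this example, an average Rotten Tomatoes rating for each set is used as the set metadata. The scores are randomly generated with runif() for illustration, so they are not real ratings and will differ on every run. With real data, this kind of metadata shows how professional movie reviewers typically rate movies in each category, which can help us draw more conclusions from the visualization.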

sets <- names(movies[3:19])
avgRottenTomatoesScore <- round(runif(17, min=0, max = 90))
metadata <- as.data.frame(cbind(sets, avgRottenTomatoesScore))
names(metadata) <- c("sets", "avgRottenTomatoesScore")

When generating a bar plot from set metadata, it is important to make sure the specified column is numeric.

is.numeric(metadata$avgRottenTomatoesScore)

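The column is not numeric! Because cbind() combined the set names and scores into a character matrix, the column is a factor (or a character vector in R 4.0.0 and later, where stringsAsFactors defaults to FALSE). We must therefore coerce it to character and then to numeric.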

metadata$avgRottenTomatoesScore <- as.numeric(as.character(metadata$avgRottenTomatoesScore))
upset(movies, set.metadata = list(data = metadata, plots = list(list(type="hist", column="avgRottenTomatoesScore", assign=20))))
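The coercion through character matters because calling as.numeric() directly on a factor returns the internal level codes rather than the printed values. The sketch below uses made-up scores and a hypothetical subset of set names to show the difference. It also shows one way to sidestep the problem: building the data frame with data.frame() instead of cbind(), so the numeric column is never converted to text.

```r
scores_as_factor <- factor(c("72", "8", "45"))

# Level codes, not the scores: the levels sort as "45", "72", "8".
as.numeric(scores_as_factor)

# Going through character recovers the actual values.
as.numeric(as.character(scores_as_factor))

# Alternative: data.frame() keeps each column's own type, whereas
# cbind() on plain vectors first builds a character matrix.
set.seed(1)  # arbitrary seed so the made-up scores are repeatable
example_sets <- c("Action", "Comedy", "Drama")  # hypothetical subset of genres
example_metadata <- data.frame(
  sets = example_sets,
  avgRottenTomatoesScore = round(runif(length(example_sets), min = 0, max = 90)),
  stringsAsFactors = FALSE
)
is.numeric(example_metadata$avgRottenTomatoesScore)
```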

Example 2: Set Metadata Heat Map

In this example we will make up our own data on which major cities these genres were most popular in. Since this is categorical and not ordinal, we must make sure the column is character (in versions of R before 4.0.0 it is a factor again). To assign a specific color to each category, name each category in the color vector, as shown below. If you don't care which color is assigned to each category, you can leave the names out, and the colors will be applied to the categories in the order they occur. If you don't supply anything for the colors parameter, a default color palette is used.

Cities <- sample(c("Boston", "NYC", "LA"), 17, replace = TRUE)
metadata <- cbind(metadata, Cities)
metadata$Cities <- as.character(metadata$Cities)
metadata[which(metadata$sets %in% c("Drama", "Comedy", "Action", "Thriller", "Romance")), ]
upset(movies, set.metadata = list(data = metadata, plots = list(list(type = "heat", column = "Cities", assign = 10, colors = c("Boston" = "green", "NYC" = "navy", "LA" = "purple")))))
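To illustrate the difference between a named and an unnamed color vector, here is a small base R sketch. It is not UpSetR's internal code; the variable names are made up, and the unnamed case simply assumes that colors are paired with categories in order of first appearance.

```r
example_cities <- c("NYC", "Boston", "LA", "NYC")

# Named vector: each category is looked up by name, so the order of
# the colors in the vector is irrelevant.
named_colors <- c("Boston" = "green", "NYC" = "navy", "LA" = "purple")
unname(named_colors[example_cities])

# Unnamed vector: pair colors with categories in order of first appearance.
unnamed_colors <- c("green", "navy", "purple")
categories <- unique(example_cities)
by_order <- setNames(unnamed_colors[seq_along(categories)], categories)
unname(by_order[example_cities])
```

With the named vector, NYC is always navy. With the unnamed vector, NYC receives the first color only because it happens to appear first in this sample, so a different sample could change the pairing.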

Now let's also use our numeric critic values!

upset(movies, set.metadata = list(data = metadata, plots = list(list(type = "heat", column = "Cities", assign = 10, colors = c("Boston" = "green", "NYC" = "navy", "LA" = "purple")), list(type = "heat", column = "avgRottenTomatoesScore", assign = 10))))

As a side note, the way the numerical heat map is handled is similar to how the ordinal heat maps are handled.

Example 3: Set Metadata Boolean Heat Map

Now suppose we have metadata that tells us whether or not these genres are well accepted overseas. This could be used as a categorical column with only two categories, but for this example we will assume that your data is coded in 1's and 0's. Keep in mind that if you use type "heat" instead of "bool" for a column of 0's and 1's, the binary data will be treated as numerical values, and a color gradient will be used to show the relative differences.

accepted <- round(runif(17, min = 0, max = 1))
metadata <- cbind(metadata, accepted)
metadata[which(metadata$sets %in% c("Drama", "Comedy", "Action", "Thriller", "Romance")), ]
upset(movies, set.metadata = list(data = metadata, plots = list(list(type="bool", column= "accepted", assign = 5, colors = c("#FF3333", "#006400")))))

Let's see what happens when we choose a "heat" instead of a "bool" for our binary data column.

upset(movies, set.metadata = list(data = metadata, plots = list(list(type="heat", column= "accepted", assign = 5, colors = c("red", "green")))))
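Before choosing between "bool" and "heat", it can help to confirm that a column really contains only 0's and 1's. The helper below is a hypothetical convenience written in base R, not an UpSetR function.

```r
# Hypothetical helper: TRUE when a column is numeric, has no missing
# values, and holds only 0 and 1.
is_binary_column <- function(x) {
  is.numeric(x) && !anyNA(x) && all(x %in% c(0, 1))
}

is_binary_column(c(0, 1, 1, 0))      # a candidate for type = "bool"
is_binary_column(c(12, 40, 87))      # numeric but not binary
is_binary_column(c("Boston", "LA"))  # categorical text, not binary
```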

Example 4: Set Metadata Text

Let's say we prefer to show text instead of a heat map for the cities these genres were most popular in.

upset(movies, set.metadata = list(data = metadata, plots = list(list(type = "text", column = "Cities", assign = 10, colors = c("Boston" = "green", "NYC" = "navy", "LA" = "purple")))))

Example 5: Applying Metadata to the Matrix

In some cases we may just want to incorporate categorical set metadata directly into the UpSet plot to easily identify characteristics of the sets via the matrix. To do this we need to specify the type as "matrix_rows", what column we're using to categorize the sets, and the colors to apply to each category. There is also an option to change the opacity of the matrix background using alpha. To change the opacity of the matrix background without applying set metadata see the shade.alpha parameter in the upset() function documentation.

upset(movies, set.metadata = list(data = metadata, plots = list(list(type="hist", column="avgRottenTomatoesScore", assign=20),list(type="matrix_rows", column = "Cities", colors = c("Boston" = "green", "NYC" = "navy", "LA" = "purple"), alpha = 0.5))))

Example 6: Multiple Metadata Plots At Once

Now let's combine all of our metadata on one plot!

upset(movies, set.metadata = list(data = metadata, plots = list(list(type="hist", column="avgRottenTomatoesScore", assign=20),list(type="bool", column= "accepted", assign = 5, colors = c("#FF3333", "#006400")), list(type = "text", column = "Cities", assign = 5, colors = c("Boston" = "green", "NYC" = "navy", "LA" = "purple")))))
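When several metadata plots are combined, the nested list() calls become hard to read. One option, sketched below, is to store each plot specification in its own variable and then combine them. The variable names are illustrative; the specifications themselves repeat the ones in the call above.

```r
city_colors <- c("Boston" = "green", "NYC" = "navy", "LA" = "purple")

# One variable per plot specification.
score_plot    <- list(type = "hist", column = "avgRottenTomatoesScore", assign = 20)
accepted_plot <- list(type = "bool", column = "accepted", assign = 5,
                      colors = c("#FF3333", "#006400"))
city_plot     <- list(type = "text", column = "Cities", assign = 5,
                      colors = city_colors)

metadata_plots <- list(score_plot, accepted_plot, city_plot)

# With UpSetR loaded and `movies` and `metadata` defined as above:
# upset(movies, set.metadata = list(data = metadata, plots = metadata_plots))
```

Keeping the color vector in one variable also avoids retyping it when the same column is shown in more than one plot.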

Example 7: Metadata Plots, Queries, and Attribute Plots

Finally, let's include functionality discussed in all of the other UpSetR vignettes! This gives us a very in-depth look at our sets, intersections, and specific elements.

upset(movies,
      set.metadata = list(data = metadata, plots = list(
        list(type = "hist", column = "avgRottenTomatoesScore", assign = 20),
        list(type = "bool", column = "accepted", assign = 5, colors = c("#FF3333", "#006400")),
        list(type = "text", column = "Cities", assign = 5, colors = c("Boston" = "green", "NYC" = "navy", "LA" = "purple")),
        list(type = "matrix_rows", column = "Cities", colors = c("Boston" = "green", "NYC" = "navy", "LA" = "purple"), alpha = 0.5))),
      queries = list(
        list(query = intersects, params = list("Drama"), color = "red", active = FALSE),
        list(query = intersects, params = list("Action", "Drama"), active = TRUE),
        list(query = intersects, params = list("Drama", "Comedy", "Action"), color = "orange", active = TRUE)),
      attribute.plots = list(gridrows = 45, plots = list(
        list(plot = scatter_plot, x = "ReleaseDate", y = "AvgRating", queries = TRUE),
        list(plot = scatter_plot, x = "AvgRating", y = "Watches", queries = FALSE)), ncols = 2),
      query.legend = "bottom")


UpSetR documentation built on May 23, 2019, 1:03 a.m.