We are going to investigate a reduced version of the IMDB (movies) data set. Movies were selected for inclusion if they had a known length, had been rated by at least one IMDB user and had an MPAA rating. This gives $4847$ films, where each film has $24$ associated variables. The data can be loaded and viewed using:

data(bmov, package = "BristolVis")
head(bmov)

Scatter plots (20 minutes)

Let's start with some simple scatter plots using the bmov data:

  1. Plot rating against length.
plot(bmov$Length, bmov$Rating)
  1. Use the xlab and ylab arguments to specify suitable axis labels.
plot(bmov$Length, bmov$Rating, xlab="Length", ylab="Rating")
  1. Use the ylim argument to specify a y-axis range from 1 to 10.
plot(bmov$Length, bmov$Rating, xlab="Length", ylab="Rating", ylim = c(1,10))
  1. Use the col argument to change the colour of the points.
plot(bmov$Length, bmov$Rating, xlab="Length", ylab="Rating",
     ylim = c(1,10), col = 2)
  1. Use the main argument to give the plot a suitable title.
plot(bmov$Length, bmov$Rating, xlab="Length", ylab="Rating",
     ylim = c(1,10), col = 2, main = "Movie rating against length")
  1. If we altered the default plot parameters using:
op = par(mar=c(3,3,2,1), tck=-.01, las=1, cex.axis=0.4)

and generated the plot from (5) again, what would happen? Can you figure out what mar, tck, las and cex.axis do?

  1. Reset your plot device using:
par(op)

and generate the last plot again. Can you see that the original settings have been restored?
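
The save-and-restore pattern from the last two steps can also be checked without looking at the plot. The following is a minimal sketch under a few assumptions: `len` and `rating` are simulated stand-ins (not the bmov values), and `pdf(NULL)` opens a null device so the code runs without a display. The comments summarise what `?par` says about each setting.

```r
# Simulated stand-in data (hypothetical; not taken from bmov)
set.seed(1)
len    <- round(rnorm(200, mean = 100, sd = 20))
rating <- pmin(pmax(rnorm(200, mean = 6, sd = 1.5), 1), 10)

pdf(NULL)  # null device: nothing is written to disk

# par() returns the previous values of the settings it changes
op <- par(mar      = c(3, 3, 2, 1),  # margin lines: bottom, left, top, right
          tck      = -0.01,          # tick length; negative draws outwards
          las      = 1,              # axis labels always horizontal
          cex.axis = 0.4)            # shrink the axis annotation
changed <- par("mar", "tck", "las", "cex.axis")

plot(len, rating, xlab = "Length", ylab = "Rating", ylim = c(1, 10))

par(op)  # put the saved values back
restored <- par("mar", "tck", "las", "cex.axis")
dev.off()
```

Comparing `changed` with `restored` shows which settings were altered and that `par(op)` returns them to their earlier values.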

Histograms (20 minutes)

We will now investigate the distribution of movie years using histograms.

  1. Use the hist function to plot a histogram of the movie years.
hist(bmov$Year)
  1. The default method for determining the number of bins isn't great. Use the breaks argument to ask for about 15 bins.
hist(bmov$Year, breaks = 15)
  1. Use the xlab and ylab arguments to specify suitable axis labels.
hist(bmov$Year, breaks = 15, xlab = "Year", ylab = "Counts")
  1. Use the col argument to change the colour of the histogram.
hist(bmov$Year, breaks = 15, xlab = "Year", ylab = "Counts", col = 3)
  1. Use the main argument to give the plot a suitable title.
hist(bmov$Year, breaks = 15, xlab = "Year", ylab = "Counts", col = 3,
     main = "Histogram for years of the movies")
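
When `breaks` is a single number, `hist` treats it as a suggestion and picks "pretty" bin edges, so the number of bins drawn may not be exactly 15. The sketch below shows one way to inspect what was actually used, and how to force exactly 15 bins by supplying the edges. It assumes a simulated `year` vector as a stand-in for `bmov$Year`; the range 1930 to 2005 is made up for illustration.

```r
# Simulated stand-in for bmov$Year (hypothetical values)
set.seed(1)
year <- sample(1930:2005, size = 1000, replace = TRUE)

# plot = FALSE returns the histogram object without drawing it
h <- hist(year, breaks = 15, plot = FALSE)
length(h$counts)  # number of bins R actually used
h$breaks          # the bin edges it chose

# Supplying 16 edges explicitly gives exactly 15 bins
edges <- seq(min(year), max(year), length.out = 16)
h15 <- hist(year, breaks = edges, plot = FALSE)
length(h15$counts)
```

The same `edges` vector can be passed to the plotting calls above in place of `breaks = 15` if exact control over the bins is needed.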

Boxplots (10 minutes)

  1. Generate a boxplot for the ratings data.
boxplot(bmov$Rating)
  1. Separate the data by whether the movie is a romantic movie.
boxplot(bmov$Rating ~ bmov$Romance)
  1. Try generating a similar boxplot, but for other variables. What happens when you condition on more than one variable, e.g. Romance and Action?
boxplot(bmov$Rating ~ bmov$Romance + bmov$Action)
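
Conditioning on two variables draws one box per combination of their values. As a rough sketch, the example below uses the `data` argument so that the column names do not need the `bmov$` prefix, and captures the return value to see how the groups are labelled. The `toy` data frame is simulated, and it assumes that Romance and Action are 0/1 indicator columns, which should be checked against the real data.

```r
# Simulated stand-in data (hypothetical; assumes 0/1 genre indicators)
set.seed(1)
n <- 300
toy <- data.frame(
  Rating  = round(runif(n, 1, 10), 1),
  Romance = rbinom(n, 1, 0.3),
  Action  = rbinom(n, 1, 0.3)
)

pdf(NULL)  # null device so the sketch runs without a display
b <- boxplot(Rating ~ Romance + Action, data = toy,
             xlab = "Romance.Action", ylab = "Rating")
dev.off()

b$names  # one label per Romance/Action combination
b$n      # number of observations in each box
```

The group sizes in `b$n` are worth checking before comparing boxes, since a combination with few films gives an unreliable box.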


statcourses/BristolVis documentation built on Jan. 31, 2021, 9:24 p.m.