We are going to investigate a cut-down version of the IMDB (movies) data set. Movies were selected for inclusion if they had a known length, had been rated by at least one IMDB user and had an MPAA rating. This gives $4847$ films, each with $24$ associated variables. The data can be loaded and viewed using:

data(bmov, package = "BristolVis")
head(bmov)

Scatter plots (20 minutes)

Let's start with some simple scatter plots using the bmov data:

  1. Plot rating against length.
plot(bmov$Length, bmov$Rating)
  1. Use the xlab and ylab arguments to specify suitable axis labels.
plot(bmov$Length, bmov$Rating, xlab="Length", ylab="Rating")
  1. Use the ylim argument to specify a y-axis range from 1 to 10.
plot(bmov$Length, bmov$Rating, xlab="Length", ylab="Rating", ylim = c(1,10))
  1. Use the col argument to change the colour of the points.
plot(bmov$Length, bmov$Rating, xlab="Length", ylab="Rating",
     ylim = c(1,10), col = 2)
  1. Use the main argument to give the plot a suitable title.
plot(bmov$Length, bmov$Rating, xlab="Length", ylab="Rating",
     ylim = c(1,10), col = 2, main = "Movie rating against length")
  1. Suppose we alter the default plot parameters and then generate the plot from the previous step again. What happens? Can you work out what mar, tck, las and cex.axis do?
# par() returns the previous settings, so keep them in op for a later reset
op <- par(mar=c(3,3,2,1), tck=-.01, las=1, cex.axis=0.4)
plot(bmov$Length, bmov$Rating, xlab="Length", ylab="Rating",
     ylim = c(1,10), col = 2, main = "Movie rating against length")
# Alternatively, pass tck, las and cex.axis straight to plot(); they then
# apply to this call only (mar can only be set through par())
plot(bmov$Length, bmov$Rating, xlab="Length", ylab="Rating",
     ylim = c(1,10), col = 2, main = "Movie rating against length",
     tck=-.01, las=1, cex.axis=0.4)

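mar: controls the plot margins, given in lines of text as c(bottom, left, top, right).
tck: controls the length of the tick marks as a fraction of the smaller of the width or height of the plotting region; negative values draw the ticks outside the plot.
las: controls the orientation of the axis labels; las=1 makes them always horizontal.
cex.axis: the magnification used for axis annotation relative to the current setting of cex.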

  1. Reset the graphical parameters using:
par(op)

and generate the last plot again. Have the default settings been restored?

plot(bmov$Length, bmov$Rating, xlab="Length", ylab="Rating",
     ylim = c(1,10), col = 2, main = "Movie rating against length")
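
The save-and-restore pattern used above can also be checked numerically rather than by eye. The following is a minimal sketch, assuming a null PDF device (`pdf(NULL)`) so that nothing is drawn or written to disk; the variable names `default_mar`, `changed_mar` and `restored_mar` are illustrative and not part of the original exercise.

```r
# Open a null PDF device so the sketch runs without a window or an output file
pdf(NULL)

default_mar <- par("mar")                # margins before any change
op <- par(mar = c(3, 3, 2, 1), las = 1)  # par() returns the OLD values invisibly

changed_mar <- par("mar")                # margins after the change
par(op)                                  # restore the saved settings
restored_mar <- par("mar")               # margins after the reset

print(changed_mar)
print(isTRUE(all.equal(restored_mar, default_mar)))

invisible(dev.off())
```

Because `op` holds only the parameters that were changed, `par(op)` resets exactly those and leaves everything else alone.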

Histograms (20 minutes)

We will now investigate the distribution of movie years using histograms.

  1. Use the hist function to plot a histogram of the movie years.
hist(bmov$Year)
  1. The default method for determining the number of bins isn't great. Use the breaks argument to request 15 bins. (A single number is only a suggestion; R rounds the break points to pretty values.)
hist(bmov$Year, breaks = 15)
  1. Use the xlab and ylab arguments to specify suitable axis labels.
hist(bmov$Year, breaks = 15, xlab = "Year", ylab = "Counts")
  1. Use the col argument to change the colour of the histogram.
hist(bmov$Year, breaks = 15, xlab = "Year", ylab = "Counts", col = 3)
  1. Use the main argument to give the plot a suitable title.
hist(bmov$Year, breaks = 15, xlab = "Year", ylab = "Counts", col = 3,
     main = "Histogram for years of the movies")
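
To see how the breaks argument behaves, it can help to inspect the object that hist returns instead of the picture. The sketch below uses simulated years (hypothetical data, not taken from bmov) and `plot = FALSE` so that nothing is drawn; `h_suggest` and `h_exact` are illustrative names.

```r
# Simulated release years (hypothetical data, not taken from bmov)
set.seed(1)
years <- sample(1930:2005, 500, replace = TRUE)

# A single number is only a suggestion: hist() chooses "pretty" break points
h_suggest <- hist(years, breaks = 15, plot = FALSE)

# A vector of break points is used exactly: 16 points from 1930 to 2005
# in steps of 5 give 15 bins
h_exact <- hist(years, breaks = seq(1930, 2005, by = 5), plot = FALSE)

cat("Bins from breaks = 15:    ", length(h_suggest$counts), "\n")
cat("Bins from a break vector: ", length(h_exact$counts), "\n")
```

If an exact number of bins matters, supplying the break points as a vector is the more predictable choice.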

Boxplots (10 minutes)

  1. Generate a boxplot for the ratings data.
boxplot(bmov$Rating)
  1. Separate the data by whether the movie is a romantic movie.
boxplot(bmov$Rating ~ bmov$Romance)
  1. Try generating a similar boxplot for other variables. What happens when you condition on more than one variable, e.g. Romance and Action?
boxplot(bmov$Rating ~ bmov$Romance + bmov$Action)

We can also change the axis labels by suppressing the default axes and adding our own:

# Draw the boxplot without the default axes (frame.plot=TRUE keeps the box)
boxplot(bmov$Rating ~ bmov$Romance + bmov$Action, axes=FALSE,
        frame.plot=TRUE, ylim=c(0,10))

# Y-axis: 0 to 10 in steps of 2.5
axis(2, at=seq(0,10,2.5))

# X-axis: ticks at x = 1, ..., 4 with readable labels
# (the groups are ordered with Romance varying fastest)
axis(1, at=1:4, labels=c("Non-R Non-A", "Romantic", "Action", "R and A"))
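
Hand-written axis labels only make sense if they match the order in which boxplot arranges the groups. One way to check that order is to ask boxplot for its summary instead of a picture. This is a minimal sketch on simulated data (the `films` data frame is hypothetical and merely borrows the column names Romance, Action and Rating), assuming both indicator variables are coded 0/1.

```r
# Simulated data (hypothetical, not taken from bmov): 25 films per combination
films <- expand.grid(Romance = 0:1, Action = 0:1)
films <- films[rep(1:4, each = 25), ]
set.seed(1)
films$Rating <- round(runif(nrow(films), min = 1, max = 10), 1)

# plot = FALSE returns the summary statistics instead of drawing
b <- boxplot(Rating ~ Romance + Action, data = films, plot = FALSE)

# Group names have the form "<Romance>.<Action>", in plotting order
print(b$names)

# Number of observations in each group
print(b$n)
```

The first variable in the formula varies fastest, which is why the second box corresponds to romantic, non-action films and the third to action, non-romantic films.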


statcourses/BristolVis documentation built on Jan. 31, 2021, 9:24 p.m.