We will continue to investigate the movies data from earlier. Make sure that you have the data loaded into the session as part of your new script (if you started one).
data(movies, package = "jrIntroductionRSS")
Let's start with a basic scatter plot of the movie ratings against their lengths, which should produce something like Figure 1.
plot(movies$Length, movies$Rating)
Whilst this can be informative at the data exploration stage, it isn't very aesthetically pleasing. First, the default axis labels (`movies$Length` and `movies$Rating`) are not very informative.
(1) Use the `xlab` and `ylab` arguments^[Arguments are the things we pass to a function inside the `()` to control the behavior of that function.] to the `plot()` function to change the axis labels to something more sensible.
```r
plot(movies$Length, movies$Rating,
     xlab = "Length", ylab = "Rating")
```
(1) The range of possible ratings is from 0 to 10. Use the `ylim` argument to set the y-axis to this range. The `ylim` argument expects a vector of length 2 (the lower and upper limits).^[Check back in the notes for how to create vectors if you need to.]
```r
plot(movies$Length, movies$Rating,
     xlab = "Length", ylab = "Rating",
     ylim = c(0,10))
```
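To see why `ylim` matters, compare it with the range `plot()` picks on its own: by default the y-axis spans roughly the range of the data (extended slightly), so the full 0 to 10 scale only appears when we ask for it. A minimal sketch using small made-up vectors (`ratings` and `lengths` are hypothetical stand-ins for the `movies` columns):

```r
# Hypothetical toy data standing in for movies$Rating and movies$Length
ratings = c(5.5, 7.2, 6.1, 4.8)
lengths = c(95, 120, 88, 101)

# Without ylim, the y-axis is based (roughly) on this range
default_range = range(ratings)
default_range

# ylim is a length-2 vector: c(lower, upper)
full_scale = c(0, 10)
plot(lengths, ratings, ylim = full_scale)
```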
(1) I personally don't like the default point type either; `pch = 19` (a solid circle) is a much nicer choice in my opinion. Change the points in your graph (feel free to experiment with different values of `pch` to find one you like).
```r
plot(movies$Length, movies$Rating,
     xlab = "Length", ylab = "Rating",
     ylim = c(0,10), pch = 19)
```
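When experimenting with `pch`, it can help to see the standard symbols side by side. The sketch below (an extra, not part of the original exercise) draws one point for each of the codes 0 to 25 and labels it with its number; symbols 21 to 25 can additionally be filled using the `bg` argument.

```r
# One point for each standard plotting symbol code
symbols = 0:25
plot(symbols, rep(1, length(symbols)), pch = symbols,
     ylim = c(0.5, 1.5), xlab = "pch value", ylab = "",
     yaxt = "n", cex = 2)
# Label each symbol with its pch number just above it
text(symbols, rep(1.2, length(symbols)), labels = symbols)
```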
(1) Try changing the colours of your points with the `col` argument. This can be a vector (of length 1 or more) of numbers or colour names; you can list the colour names R recognises with the `colors()` function. What happens if you give `col` more than one value?
```r
plot(movies$Length, movies$Rating,
     xlab = "Length", ylab = "Rating",
     ylim = c(0,10), pch = 19,
     col = 2) # 2 is red
# If you use something like col = 1:4 inside plot(),
# the points cycle through 4 colours. The col argument
# is a vector that is recycled along the data if it is
# shorter than the data.
```
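Two things from this step can be explored directly. `colors()` returns a long character vector, so `grep()` is a handy way to search it. The recycling of a short `col` vector can be mimicked with `rep_len()`, which shows which colour each point would get. A sketch on hypothetical toy data (`toy_x`, `toy_y`):

```r
# Search the built-in colour names for ones containing "blue"
blues = grep("blue", colors(), value = TRUE)
head(blues)

# Mimic recycling: col = 1:4 spread over 10 points
point_cols = rep_len(1:4, 10)
point_cols

# Hypothetical toy data with 10 points, drawn with col = 1:4
toy_x = 1:10
toy_y = toy_x^2
plot(toy_x, toy_y, pch = 19, col = 1:4)
```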
(1) We could make this even neater by colouring points according to a column in our data. Since `col` accepts a vector, we could use the `Comedy` column to draw comedy films in one colour and non-comedy films in another. If we look at our data we will see that the `Comedy` column consists of 0s and 1s. Colour 0 is the background colour, so passing this directly to `col` would make the non-comedy points effectively vanish, but if we add 1 to this vector we get the usable colours 1 and 2. Give it a try.
```r
plot(movies$Length, movies$Rating,
     xlab = "Length", ylab = "Rating",
     ylim = c(0,10), pch = 19,
     col = movies$Comedy + 1) # 1 (black) = not comedy, 2 (red) = comedy
```
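The same "add 1" trick can index a vector of colour names, which lets you pick the two colours explicitly instead of relying on the default palette. A sketch with a hypothetical 0/1 vector standing in for `movies$Comedy`; the colours `"grey40"` and `"red"` are arbitrary choices:

```r
# Hypothetical stand-in for movies$Comedy (1 = comedy, 0 = not)
comedy = c(0, 1, 1, 0, 1)

# Adding 1 turns 0/1 into the indices 1/2
comedy + 1

# Position 1 is used for non-comedies, position 2 for comedies
point_cols = c("grey40", "red")[comedy + 1]
point_cols
```

With the real data this would become `col = c("grey40", "red")[movies$Comedy + 1]`.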
(1) Finally, give the graph a title using the `main` argument.
```r
plot(movies$Length, movies$Rating,
     xlab = "Length", ylab = "Rating",
     ylim = c(0,10), pch = 19,
     col = movies$Comedy + 1, # 2 (red) marks comedies
     main = "Ratings against Lengths for Comedy films")
```
We should now have a plot that looks like Figure 2.
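The colours alone don't tell a reader which points are the comedies. One possible addition, not part of the original exercise, is a key drawn with `legend()`. A sketch using small hypothetical vectors in place of the `movies` columns:

```r
# Hypothetical toy data standing in for movies$Length, $Rating and $Comedy
toy_length = c(90, 105, 120, 85, 140)
toy_rating = c(6.5, 7.1, 5.9, 8.0, 6.8)
toy_comedy = c(1, 0, 0, 1, 0)

plot(toy_length, toy_rating, xlab = "Length", ylab = "Rating",
     ylim = c(0, 10), pch = 19, col = toy_comedy + 1,
     main = "Ratings against Lengths for Comedy films")

# Key matching the colour coding: colour 1 = not comedy, colour 2 = comedy
leg = legend("bottomright", legend = c("Not comedy", "Comedy"),
             col = 1:2, pch = 19)
```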
Run the following R code:
data(USnames,package = "jrIntroductionRSS")
The data frame `USnames` is a collection of names given to babies born in the US between 2011 and 2014.
(1) Use `head()` and `colnames()` to make sure you are comfortable with what the data looks like.
```r
head(USnames)
colnames(USnames)
```
(1) Create a subset called `y` containing those born in 2012, i.e. the rows where `Year == 2012`.
```r
y = USnames[USnames$Year == 2012, ]
```
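The subset works in two steps: the comparison produces one `TRUE`/`FALSE` per row, and that logical vector then selects rows, while the empty slot after the comma keeps every column. A sketch on a small hypothetical data frame (its `Name` column and values are made up for illustration):

```r
# Hypothetical toy version of USnames
toy_names = data.frame(
  Year   = c(2011, 2012, 2012, 2013),
  Name   = c("Ava", "Noah", "Ava", "Liam"),
  Gender = c("F", "M", "F", "M"),
  Count  = c(12, 30, 7, 25)
)

# Step 1: one TRUE/FALSE per row
keep = toy_names$Year == 2012
keep

# Step 2: keep the TRUE rows and all columns
toy_2012 = toy_names[keep, ]
toy_2012
```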
(1) What is the total number of children born in this year?
```r
sum(y$Count)
```
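Each row records a name together with a `Count`, so the number of children is the sum of `Count`, not the number of rows. A toy illustration with hypothetical values:

```r
# Hypothetical toy subset: three names, each with a count of babies
toy_2012 = data.frame(Name  = c("Noah", "Ava", "Mia"),
                      Count = c(30, 7, 15))

nrow(toy_2012)       # number of rows (names): 3
sum(toy_2012$Count)  # number of babies: 52
```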
(1) Were more male or female children born during the 4 years?
```r
females = USnames[USnames$Gender == "F", ]
sum(females$Count)
males = USnames[USnames$Gender == "M", ]
sum(males$Count)
## more males
```
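An alternative to building two subsets is `tapply()`, which sums `Count` within each level of `Gender` in one call. A sketch on hypothetical toy data; with the real data the call would be `tapply(USnames$Count, USnames$Gender, sum)`:

```r
# Hypothetical toy data standing in for USnames
toy_names = data.frame(Gender = c("F", "M", "F", "M"),
                       Count  = c(12, 30, 7, 25))

# Sum Count separately for each Gender
totals = tapply(toy_names$Count, toy_names$Gender, sum)
totals
```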
(1) How many names in 2011 were used fewer than 10 times?
```r
nrow(USnames[USnames$Year == 2011 & USnames$Count < 10, ])
```
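Because `TRUE` counts as 1 and `FALSE` as 0, summing the logical condition gives the same answer without building the subset first. A sketch on hypothetical toy data:

```r
# Hypothetical toy data standing in for USnames
toy_names = data.frame(Year  = c(2011, 2011, 2011, 2012),
                       Count = c(5, 12, 9, 3))

rare_2011 = toy_names$Year == 2011 & toy_names$Count < 10
rare_2011

# Count the TRUE values directly
sum(rare_2011)
```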