library(knitr)
# opts_knit$set(out.format = "latex")
knit_theme$set(knit_theme$get("greyscale0"))

# options(replace.assign=FALSE,width=50)

# opts_chunk$set(fig.path='figure/graphics-', 
#                cache.path='cache/graphics-', 
#                dev='pdf', fig.width=5, fig.height=5, 
#               cache=FALSE)
# knit_hooks$set(crop=hook_pdfcrop)

# figure referencing hack
fig <- local({
    i <- 0
    ref <- list()
    list(
        cap=function(refName, text) {
            i <<- i + 1
            ref[[refName]] <<- paste0("Figure ",i)
            paste("Figure ", i, ": ", text, sep="")
        },
        ref=function(refName) {
            ref[[refName]]
        })
})

# usage 
# chunk options fig.cap = fig$cap(<label>, <caption>)
# reference `r fig$ref(<label>)`

Base Graphics

We will continue to investigate the movies data from earlier. Make sure that you have the data loaded into the session as part of your new script (if you started one).

data(movies, package = "jrIntroductionRSS")

Scatter plots

Let's start with a basic scatter plot of the movie ratings against their lengths to generate something like figure 1.

plot(movies$Length, movies$Rating)

Whilst this can be informative at the data exploration stage, it isn't very aesthetically pleasing. First off the default axis labels are not very good.

(1) Use the xlab and ylab arguments^[Arguments are the things we pass to our function inside the () to control the behavior of that function.] to the plot function to change the axis labels to something more sensible

```r
plot(movies$Length, movies$Rating,
     xlab = "Length", ylab = "Rating")
```

(1) The range of possible ratings is between 0 and 10. Use the ylim argument to specify a new axis range. The ylim argument expects a vector of length 2 to be passed to it.^[Check back in the notes for how to create vectors if you need to.]

```r
plot(movies$Length, movies$Rating,
     xlab = "Length", ylab = "Rating",
     ylim = c(0,10))
```

(1) I personally don't like the default point type either, pch = 19 is a much nicer choice in my opinion. Change the points in your graph (feel free to experiment with different values of pch to find one you like.)

```r
plot(movies$Length, movies$Rating,
     xlab = "Length", ylab = "Rating",
     ylim = c(0,10), pch = 19)
```

(1) Try changing the colours of your points. The argument for this can be a vector (of length 1 or more) of numbers or colour names. You can find out what colours are allowed by name by using the colors() function. What happens if you specify a col argument that is more than 1 value.

```r
plot(movies$Length, movies$Rating,
     xlab = "Length", ylab = "Rating",
     ylim = c(0,10), pch = 19,
     col = 2) # 2 is red

# if you do something like col=1:4 inside plot()
# you end up with 4 coloured points. Essentially the
# col argument is a vector that recycles throughout
# the data if it is shorter than the data
```

(1) We could make this even neater by colouring points by a column in our data. Since the colour argument is a vector then we could use the Comedy column to colour the Comedy films in one colour and the non comedy films in another colour. If we look at our data we will see that the Comedy column consists of 0s and 1s, there is no colour 0 so we can't pass this directly to col, but if we add 1 to this vector we could. Give it a try.

```r
plot(movies$Length, movies$Rating,
     xlab = "Length", ylab = "Rating",
     ylim = c(0,10), pch = 19,
     col = movies$Comedy + 1) # 2 is red
```

(1) Finally give the graph a title

```r
plot(movies$Length, movies$Rating,
     xlab = "Length", ylab = "Rating",
     ylim = c(0,10), pch = 19,
     col = movies$Comedy + 1,#
     main = "Ratings against Lengths for Comedy films") # 2 is red
```

We should now have a plot that look like Figure 2.

op = par(mfrow = c(1,1), mar = c(3,3,3,1))
 plot(movies$Length, movies$Rating,
         xlab = "Length", ylab = "Rating",
         ylim = c(0,10), pch = 19,
         col = movies$Comedy + 1,#
         main = "Ratings against Lengths for Comedy films") # 2 is red
# barplot(tab, beside = TRUE, col = 1:2,
#             xlab = "MPAA Ratings", ylab = "Frequency",
#             main = "Comedy films in red")
par(op)

Question 2

Run the folowing R code:

data(USnames,package = "jrIntroductionRSS")

The data frame USnames is a collection of names given to babies born in the US between 2011 and 2014.

(1) Make sure you are comfortable with what the data looks like using head() and colnames().

```r
head(USnames)
colnames(USnames)
```

(1) Create a subset of those born in 2012, i.e Year == 2012 called y.

```r
y = USnames[USnames$Year == 2012,]
```

(1) What is the total number of children born in this year?

```r
sum(y$Count)
```

(1) Were more male or females children born during the 4 years?

```r
females = USnames[USnames$Gender == "F",]
sum(females$Count)
males = USnames[USnames$Gender == "M",]
sum(males$Count)
## more males
```

(1) How many names in 2011 were used less than 10 times?

```r
nrow(USnames[USnames$Year == 2011 & USnames$Count < 10,])
```


jr-packages/jrIntroductionRSS documentation built on Nov. 4, 2019, 3:23 p.m.