# load packages ----------------------------------------------------------------

library(learnr)
library(gradethis)
library(tidyverse)
library(dsbox)

# set options for exercises and checking ---------------------------------------

gradethis_setup()

# hide non-exercise code chunks ------------------------------------------------

knitr::opts_chunk$set(echo = FALSE)

Introduction

#might replace the image!
knitr::include_graphics("images/traffic.jpg")

In this tutorial, we will look at traffic accidents in Edinburgh. The data are made available online by the UK Government. They cover all recorded accidents in Edinburgh in 2018, and some of the variables were modified for the purposes of this tutorial.

Learning goals

Packages

We'll use the tidyverse package for the analysis and the dsbox package for the data. These packages are already installed for you, so you can load them as usual by running the following code:

library(tidyverse)
library(dsbox)
library(tidyverse)
library(dsbox)
grade_this_code("The tidyverse and dsbox packages are now loaded!")

Data

The data are in the dsbox package, in a data frame called accidents.

Below is the data dictionary. It is long (there are many variables in the data), but we will use only a limited set of them in our analysis.

| Header | Description |
|:----------------|:--------------------------------|
| id | Accident ID |
| easting | Easting of accident location |
| northing | Northing of accident location |
| longitude | Longitude of accident location |
| latitude | Latitude of accident location |
| police_force | Police force |
| severity | Accident severity: Fatal, Serious, Slight |
| vehicles | Number of vehicles involved in accident |
| casualties | Number of people injured in the accident |
| date | Date of the accident |
| day_of_week | Day of the week of the accident |
| time | Time of the accident on the 24h clock |
| district | Local authority district |
| highway | Local authority highway |
| first_road_class | Class of 1st road involved in accident: Motorway, A(M) road (A-road with motorway restrictions), A-road, B-road, C-road, Unclassified |
| first_road_number | ID of 1st road (0 if unclassified) |
| road_type | Type of road: Roundabout, One way street, Dual carriageway, Single carriageway, Slip road |
| speed_limit | Speed limit on the road in mph |
| junction_detail | Detail on junction where accident occurred: Crossroads, Mini-roundabout, More than 4 arms, Not within 20 metres of junction, Other junction, Private drive or entrance, Roundabout, Slip road, T or staggered junction |
| junction_control | How junction was controlled: Authorised person, Auto traffic signal, Give way or uncontrolled, Missing / Out of range, Stop sign |
| second_road_class | Class of 2nd road involved in accident: A-road, B-road, C-road, Missing / Out of range, Motorway, Unclassified |
| second_road_number | ID of 2nd road (0 if unclassified) |
| ped_cross_human | Level of human control at a pedestrian crossing: Control by other authorised person, Control by school crossing patrol, None within 50 metres |
| ped_cross_physical | Level of facilities controlling a pedestrian crossing: Central refuge, No physical crossing facilities within 50 metres, Non-junction crossing (pelican, puffin, toucan or similar light crossing), Pedestrian phase at traffic signal junction, Zebra crossing |
| light | Light condition at the time of accident: Daylight, Darkness - lights lit, Darkness - lights unlit, Darkness - no lighting, Darkness - lighting unknown |
| weather | Weather condition at the time of accident: Fine + no high winds, Raining + no high winds, Snowing + no high winds, Fine + high winds, Raining + high winds, Snowing + high winds, Fog or mist, Other, Unknown |
| road_surface | Road surface conditions at the time of the accident: Dry, Wet or damp, Snow, Frost or ice, Flood over 3cm deep |
| special_condition | Special condition at the site of the accident: None, Road sign or marking defective or obscured, Roadworks, Road surface defective |
| hazard | Carriageway hazards: None, Other object on road, Previous accident, Pedestrian in carriageway - not injured |
| urban_rural | Type of area the accident occurred in: 1 - urban, 2 - rural |
| police | Whether a police officer attended the scene of the accident: No, No + accident self reported (using a self completion form), Yes |

First look at the data

You can take a peek at the data using the glimpse() function in the box below.

glimpse(accidents)
question("What does each row in the dataset represent?",
    answer("The registration number of a car"),
    answer("The location of an accident"),
    answer("A recorded accident",
           correct = TRUE,
           message = "Each row in the dataset contains all information relating to an individual recorded accident in Edinburgh."),
    answer("An insurance claim"),
    allow_retry = TRUE
  )

How many accidents were recorded in Edinburgh in 2018? Use the following code chunk to submit your answer.


Each row represents one recorded accident!
Try using nrow()!
grade_this({
  if (identical(.result, 768) || identical(.result, 768L)) {
    pass("There are 768 rows; therefore, 768 accidents were recorded in Edinburgh in 2018.")
  }
  if (identical(.result, 31) || identical(.result, 31L)) {
    fail("Each observation is represented in one row. Did you calculate the number of columns instead of rows?")
  }
  fail("Not quite. Each observation is represented in one row. Try looking at the hints for some help!")
})

How many variables are recorded for each accident? Use the code chunk below to find out!


Each variable is displayed as a column.
Try using ncol()!
grade_this({
  if (identical(.result, 31) || identical(.result, 31L)) {
    pass("Since there are 31 columns in the dataset, we know that 31 variables are recorded.")
  }
  if (identical(.result, 768) || identical(.result, 768L)) {
    fail("Each variable is recorded in a column. Did you use the number of rows instead?")
  }
  fail("Not quite. Each variable is represented in a column. Try looking at the hints for some help!")
})
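A minimal sketch of the difference between the two counts, using a small made-up tibble (the name toy_accidents and its values are hypothetical, not taken from the accidents data): nrow() counts observations (rows), ncol() counts variables (columns), and dim() returns both at once.

```r
library(tibble)

# Hypothetical toy data: 3 accidents (rows) described by 2 variables (columns)
toy_accidents <- tibble(
  id       = c(1, 2, 3),
  vehicles = c(1, 2, 3)
)

nrow(toy_accidents)  # number of observations (rows): 3
ncol(toy_accidents)  # number of variables (columns): 2
dim(toy_accidents)   # both at once, rows first: 3 2
```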

Multi-vehicle accidents

How many accidents with 2 or more vehicles occurred in an urban area? Use the code chunk below to find out!

___ |>
  ___(___, ___) |>
  nrow()

Use filter() to find the rows that match the criteria.
Review the data dictionary, specifically the variables urban_rural and vehicles.

accidents |>
  filter(vehicles >= ___, urban_rural == ___) |>
  nrow()
grade_this({
  if (identical(.result, 407) || identical(.result, 407L)) {
    pass("There are 407 rows, so 407 accidents involving 2 or more vehicles occurred in an urban area.")
  }
  if (identical(.result, 72) || identical(.result, 72L)) {
    fail("Check which level of urban_rural corresponds to which type of area.")
  }
  fail("Not quite. Take a peek at the hints!")
})
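In filter(), conditions separated by commas are combined with a logical AND, so a row is kept only if it satisfies every condition. A minimal sketch on made-up data (the toy tibble and its values are hypothetical), assuming urban_rural follows the coding in the data dictionary (1 = urban, 2 = rural):

```r
library(dplyr)

# Hypothetical toy data; urban_rural coded as in the data dictionary (1 = urban, 2 = rural)
toy_accidents <- tibble(
  vehicles    = c(1, 2, 3, 2, 4),
  urban_rural = c(1, 1, 2, 2, 1)
)

# Comma-separated conditions are AND-ed: 2 or more vehicles AND urban
multi_urban <- toy_accidents |>
  filter(vehicles >= 2, urban_rural == 1)

# Writing the same condition with an explicit & keeps exactly the same rows
multi_urban_and <- toy_accidents |>
  filter(vehicles >= 2 & urban_rural == 1)

nrow(multi_urban)  # 2
```

Piping the filtered data into nrow() then counts the matching accidents, which is the pattern the exercise scaffold uses.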

Speed limits

Create a frequency table of the speed limits at which accidents happen (speed_limit). Look at the hints for help!

___ |>
  ___(___)
See the help for the `count()` function, specifically the 
`sort` argument for reporting the frequency table in descending order of counts, 
i.e. highest on top.
accidents |>
  ___(___, sort = TRUE)
accidents |>
  count(speed_limit, sort = TRUE)
grade_this({
  if (identical(.result$n[1], 379L)) {
    pass("You have created the correct frequency table!")
  }
  fail("Not quite. See the hints for help!")
})
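A minimal sketch of what count(sort = TRUE) produces, on hypothetical speed limits for six made-up accidents: count() tallies each distinct value into a column named n, and sort = TRUE orders the rows by n, largest first. This is why the first row of a sorted frequency table answers a "most common value" question.

```r
library(dplyr)

# Hypothetical speed limits (mph) for six made-up accidents
toy_accidents <- tibble(speed_limit = c(30, 20, 20, 40, 20, 30))

# One row per distinct speed_limit, with its frequency in n,
# ordered from most to least frequent
speed_table <- toy_accidents |>
  count(speed_limit, sort = TRUE)

speed_table$speed_limit[1]  # most common value: 20
speed_table$n               # frequencies: 3 2 1
```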
question("What is the most common speed limit in the dataset?",
    answer("20", correct = TRUE),
    answer("30"),
    answer("40"),
    answer("50"),
    answer("60"),
    answer("70"),
    allow_retry = TRUE
  )

Accident severity

Visualising

Recreate the following plot. To match the colours, you can use scale_fill_viridis_d().

ggplot(data = accidents, aes(x = severity, fill = light)) +
  geom_bar(position = "fill") +
  coord_flip() +
  labs(y = "Proportion", x = "Accident severity",
       fill = "Light condition", 
       title = "Light condition and accident severity") +
  scale_fill_viridis_d()
ggplot(data = ___, aes(x = ___, ___ = ___)) +
  geom____(___) +
  ___() +
  ___(y = ___, x = ___,
       ___ = ___, 
       title = ___)
ggplot(data = ___, aes(x = ___, ___ = ___)) +
  geom____(___) +
  ___() +
  ___(y = ___, x = ___,
       ___ = ___, 
       title = ___) +
  scale_fill_viridis_d()
ggplot(data = ___, aes(x = ___, fill = ___)) +
  geom_bar(___) +
  coord_flip() +
  labs(y = ___, x = ___,
       fill = ___, 
       title = ___) +
  scale_fill_viridis_d()
ggplot(data = ___, aes(x = ___, fill = ___)) +
  geom_bar(position = ___) +
  coord_flip() +
  labs(y = ___, x = ___,
       fill = "Light condition", 
       title = ___) +
  scale_fill_viridis_d()

ggplot(data = accidents, aes(x = severity, fill = light)) +
  geom_bar(position = "fill") +
  coord_flip() +
  labs(y = "Proportion", x = "Accident severity",
       fill = "Light condition",
       title = "Light condition and accident severity") +
  scale_fill_viridis_d()
grade_this_code("Well done!")
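geom_bar(position = "fill") stacks the counts within each bar and rescales every bar to a height of 1, so each coloured segment shows a proportion within one severity level. When it is hard to read exact values off the plot, the same proportions can be computed numerically. A minimal sketch with dplyr on hypothetical toy data (the six accidents below are made up):

```r
library(dplyr)

# Hypothetical toy data: severity and light condition for six made-up accidents
toy_accidents <- tibble(
  severity = c("Slight", "Slight", "Slight", "Slight", "Serious", "Serious"),
  light    = c("Daylight", "Daylight", "Daylight", "Darkness - lights lit",
               "Daylight", "Darkness - lights lit")
)

# Count each severity/light combination, then divide by the total for that
# severity: this is the quantity each segment of a "fill" bar represents
light_props <- toy_accidents |>
  count(severity, light) |>
  group_by(severity) |>
  mutate(prop = n / sum(n)) |>
  ungroup()

light_props
```

Within each severity level the prop values add up to 1, matching the full-length bars in the plot.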
question("Which of the following are true? Check all that apply.", 
         answer("Most accidents occur in daylight",
                correct = TRUE),
         answer("Roughly 20 percent of serious accidents occurred in the darkness without lighting",
                message = "Look closely at the legend and the colours of the bars!"),
         answer("Crashes in the darkness tend to be more severe",
                correct = TRUE),
         answer("Fatal crashes have the highest proportion of crashes in the darkness where the lights are lit",
                message = "Compare the sizes of the segments for 'Darkness - lights lit' across the bars."),
         answer("Most slight accidents in the darkness happen without lighting."),
        allow_retry = TRUE
         )

Customising labels

Recreate the same figure, but this time change the labels of the light condition variable so that the dashes in the labels don't show up. There are many ways to do this, but in this tutorial, we'll focus on changing how the data are represented in the light variable using mutate(). Note that the colours in the figure might change, but that's OK.

accidents <- ___ |>
  ___(___)

# Now copy the code from the previous exercise here!

You could try using case_when().
accidents <- accidents |>
  mutate(___ = case_when(___ == ___ ~ ___,
                            ...))
accidents <- accidents |>
  mutate(light = case_when(
    light == "Daylight" ~ "Daylight",
    light == "Darkness - lights lit" ~ "Darkness, lights lit",
    ___
    ))
accidents <- accidents |>
  mutate(light = case_when(
    light == "Daylight" ~ "Daylight",
    light == "Darkness - lights lit"       ~ "Darkness, lights lit",
    light == "Darkness - lights unlit"     ~ "Darkness, lights unlit", 
    light == "Darkness - no lighting"      ~ "Darkness, no lighting", 
    light == "Darkness - lighting unknown" ~ "Darkness, lighting unknown"
    ))

ggplot(data = accidents, aes(x = severity, fill = light)) +
  geom_bar(position = "fill") +
  coord_flip() +
  labs(y = "Proportion", x = "Accident severity",
       fill = "Light condition", 
       title = "Light condition and accident severity") +
  scale_fill_viridis_d()
grade_this_code("Your solution is correct!")
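As the exercise notes, there are many ways to relabel the light variable. One alternative to case_when() is stringr's str_replace(), which rewrites the first " - " in each label in a single pattern-based step. A minimal sketch on hypothetical toy data (the three labels are taken from the data dictionary; the tibble itself is made up). Unlike case_when(), it does not require listing every level, but it would also change any other label that happens to contain " - ". Note that the exercise above is checked with grade_this_code(), which compares your code against the case_when() solution, so this variant would likely not be marked correct there.

```r
library(dplyr)
library(stringr)

# Hypothetical toy data using light labels from the data dictionary
toy_accidents <- tibble(
  light = c("Daylight", "Darkness - lights lit", "Darkness - no lighting")
)

# Replace the first " - " in each label with ", ";
# labels without a dash (e.g. "Daylight") are returned unchanged
toy_accidents <- toy_accidents |>
  mutate(light = str_replace(light, " - ", ", "))

toy_accidents$light
```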

Wrap up

You have finished tutorial two. Good job! We hope you enjoyed this lesson on data visualisation.



rstudio-education/dsbox documentation built on Oct. 22, 2023, 12:20 a.m.