In jr-packages/jrTidyverse: Jumping Rivers: Getting to Grips with the Tidyverse

library("dplyr")
library("tidyr")
library("ggplot2")
data(okcupid, package = "jrTidyverse")

tidyr: Getting started with `separate()`

In its original state, the okcupid data has numerous messy columns. Let's tidy some of them up. First, the location variable currently stores both the area and the state. We can split it into two new variables using separate():

okcupid = okcupid %>%
  separate(location, c("area", "state"), sep = ", ")

Notice the warning: R is telling us that one of the rows contained two commas, and so three pieces of information. In this case, the row included country information on top of the area and state. Next up, ethnicity. In some cases, people have listed three ethnicities. We are only interested in the first one (for an actual analysis this is not recommended, as it grossly under-represents today's multi-ethnic society). Again, we can use separate() to get rid of everything after the first ethnicity:

okcupid = okcupid %>%
  separate(ethnicity, "ethnicity", sep = ", ")
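Both calls above depend on what separate() does with pieces beyond the named columns. By default it warns and drops them. The sketch below makes that choice explicit with the extra argument; the toy tibble and the names dropped and merged are made up for illustration and are not part of the okcupid data.

```r
library("dplyr")
library("tidyr")

# Hypothetical toy data: the second row has three comma-separated pieces
toy = tibble(
  location = c("oakland, california", "london, england, united kingdom")
)

# extra = "drop" discards surplus pieces without a warning
dropped = toy %>%
  separate(location, c("area", "state"), sep = ", ", extra = "drop")

# extra = "merge" keeps the surplus in the last column instead
merged = toy %>%
  separate(location, c("area", "state"), sep = ", ", extra = "merge")

dropped
merged
```

With extra = "drop" the country is lost, whereas extra = "merge" leaves it attached to the state column, which makes the unusual rows easy to spot.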

How many people list poor English as their first language? (Hint: use separate() to split the speaks variable into 3 columns, then use count().)

okcupid = okcupid %>%
  separate(speaks, c("first_lan", "sec_lan", "third_lan"), sep = ", ")
okcupid %>%
  count(first_lan)
# 168

How many people put the programming language C++ as their second language? (Hint: apply another separate() operation to the column sec_lan.)

okcupid = okcupid %>%
  separate(sec_lan, "second_lan", sep = " ")

okcupid %>%
  count(second_lan) %>%
  filter(second_lan == "c++")
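To see the two-stage split at a glance, here is a minimal sketch on made-up data in the style of the speaks column. The toy values are assumptions, and fill = "right" and extra = "drop" are additions that only silence the warnings for rows with too few or too many pieces.

```r
library("dplyr")
library("tidyr")

# Hypothetical toy data in the same style as the speaks column
toy = tibble(
  speaks = c("english (fluently), c++ (okay)",
             "english (poorly)",
             "english, spanish (poorly), c++")
)

languages = toy %>%
  # fill = "right" pads rows that list fewer than three languages with NA
  separate(speaks, c("first_lan", "sec_lan", "third_lan"),
           sep = ", ", fill = "right") %>%
  # keep only the first word of sec_lan, dropping any fluency note
  separate(sec_lan, "second_lan", sep = " ", extra = "drop")

languages %>%
  count(second_lan) %>%
  filter(second_lan == "c++")
```

The first separate() splits on the comma between languages; the second splits on the space between a language and its fluency note, so "c++ (okay)" and "c++" are counted together.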

Use separate() to extract a person's religion, given that it is always stated as the first word in the religion column.

okcupid %>%
  separate(religion, c("religion"), sep = " ")

Tally up the religions in the data set and plot them.

okcupid %>%
  separate(religion, c("religion"), sep = " ") %>%
  count(religion) %>%
  ggplot(aes(x = religion, y = n)) +
  geom_col() +
  coord_flip()
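The same extract, tally and plot pipeline can be tried on a small made-up example before running it on the full data. The toy religion values below are hypothetical, and extra = "drop" is an addition that suppresses the warning about discarded words.

```r
library("dplyr")
library("tidyr")
library("ggplot2")

# Hypothetical toy data: the religion is always the first word
toy = tibble(
  religion = c("agnosticism and very serious about it",
               "atheism",
               "agnosticism",
               "catholicism but not too serious about it")
)

tally = toy %>%
  # keep the first word only, discarding the rest of the phrase
  separate(religion, "religion", sep = " ", extra = "drop") %>%
  count(religion)

# Horizontal bars keep long religion names readable
p = ggplot(tally, aes(x = religion, y = n)) +
  geom_col() +
  coord_flip()
p
```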

Using `pivot_wider()`

Long format data is great for us as R programmers, as it is a convenient format for many of the things we want to do with our data. However, it is not always the most useful way to share a table with others. Here is an example.

The code

(df = okcupid %>%
   group_by(sex) %>%
   count(orientation)
 )

creates a data structure that works well for ggplot but is more difficult to read in a report.

Use pivot_wider() to give a column for each orientation, filled with the counts. Does it look a bit nicer?

df %>%
  pivot_wider(names_from = orientation, values_from = n)
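When a combination is missing from the long table, pivot_wider() leaves an NA in the wide one. A minimal sketch of filling those gaps with zero, using made-up counts (the toy_counts values are hypothetical, not the okcupid results):

```r
library("dplyr")
library("tidyr")

# Hypothetical long-format counts; there is no row for sex "f" and "gay"
toy_counts = tibble(
  sex = c("f", "f", "m", "m", "m"),
  orientation = c("bisexual", "straight", "bisexual", "gay", "straight"),
  n = c(2L, 5L, 1L, 3L, 4L)
)

wide = toy_counts %>%
  pivot_wider(names_from = orientation, values_from = n,
              # fill missing combinations with 0 rather than NA
              values_fill = list(n = 0L))
wide
```

Each distinct value of orientation becomes a column, in order of first appearance, and each sex becomes a single row.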

jr-packages/jrTidyverse documentation built on Oct. 11, 2020, 9:03 p.m.
