Some of the questions in this practical might not be exactly what you would do in the real world; they are just intended to get you comfortable using some of the stringr functions you have seen so far.
We'll start by loading the necessary packages and data sets:
library("tidyverse")
data(names, package = "jrTidyverse")
Here we have a data set of 800 people, each with one of the names: "Abigail", "Alexander", "Ava", "Benjamin", "Charlotte", "Emily", "Emma", "Ethan", "Harper", "Isabella", "Jacob", "James", "Liam", "Mason", "Mia", "Michael", "Noah", "Olivia", "Sophia" and "William".
Using various functions from stringr and count() from dplyr, work out the frequency of each name. Which name occurs the most?
names %>%
  mutate(name = str_to_title(str_trim(name))) %>% # tidy whitespace, then capitalisation
  count(name) %>%
  arrange(desc(n)) # most frequent name first
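To see why both cleaning steps matter before counting, here is a minimal sketch on a hypothetical toy tibble (`toy`, not the jrTidyverse data) whose names differ only in spacing and case. It also uses dplyr's `count(sort = TRUE)`, which lists the most frequent names first without a separate `arrange()` step.

```r
library(dplyr)
library(stringr)

# Hypothetical messy names (not the jrTidyverse data): stray spaces and mixed case
toy <- tibble(name = c(" emma", "EMMA ", "Emma", "liam", " Liam "))

toy_counts <- toy %>%
  mutate(name = str_to_title(str_trim(name))) %>% # " emma" and "EMMA " both become "Emma"
  count(name, sort = TRUE)                        # sort = TRUE lists the most frequent first

toy_counts
```

Without the mutate() step, count() would treat " emma", "EMMA " and "Emma" as three different names, each with n = 1.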
Next, we'll load the movies data:
data(movies, package = "jrTidyverse")
# number of titles containing "The" (case-sensitive: "Theory" would count, "the" would not)
length(str_subset(movies$title, pattern = "The"))
# not for me!
str_subset(movies$title, pattern = "Theo")
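Because `pattern` is a case-sensitive regular expression, "The" also matches inside words such as "Theory" and misses a lowercase "the". A minimal sketch on hypothetical titles (not the movies data) comparing this with a whole-word, case-insensitive pattern built with stringr's `regex()`; `sum(str_detect())` is shown as an equivalent way to count matches.

```r
library(stringr)

# Hypothetical titles, not taken from the movies data
titles <- c("The Matrix", "Theory of Everything", "Gone with the Wind", "Alien")

# Plain pattern: substring match, case-sensitive
plain <- str_subset(titles, pattern = "The")

# Counting matches directly from a logical vector
n_plain <- sum(str_detect(titles, pattern = "The"))

# Whole word "the" in any case: \\b marks a word boundary
whole_word <- str_subset(titles, pattern = regex("\\bthe\\b", ignore_case = TRUE))

plain
n_plain
whole_word
```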
Add a column, title_length, containing the number of characters in each title:
movies <- movies %>% mutate(title_length = str_length(title))
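A quick check of what `title_length` counts, as a minimal sketch on hypothetical titles: `str_length()` counts every character, including spaces and digits, and for plain ASCII text like this `str_count()` with its default empty pattern gives the same numbers.

```r
library(stringr)

# Hypothetical titles, not taken from the movies data
titles <- c("Alien", "Star Wars", "Se7en")

lengths_len <- str_length(titles) # spaces and digits count as characters
lengths_cnt <- str_count(titles)  # default pattern "" counts characters too

lengths_len
lengths_cnt
```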
Using summarise() and max(), find the length of the longest title, then find the title itself:
movies %>% summarise(max(title_length))
movies %>% filter(title_length == max(title_length)) %>% select(title)
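`summarise()` collapses the data to a single row, while `filter()` keeps every row whose length equals the maximum, so tied titles are all returned. A minimal sketch on a hypothetical tibble with a tie, also showing dplyr's `slice_max()` (available from dplyr 1.0.0), which keeps ties by default.

```r
library(dplyr)

# Hypothetical data with two titles tied for the longest length
toy <- tibble(
  title = c("Alien", "Se7en", "Jaws"),
  title_length = c(5L, 5L, 4L)
)

# One-row summary: just the maximum length
longest_len <- toy %>% summarise(longest = max(title_length))

# Every title at the maximum length (both tied titles are kept)
by_filter <- toy %>% filter(title_length == max(title_length)) %>% pull(title)

# slice_max() keeps ties by default (with_ties = TRUE)
by_slice <- toy %>% slice_max(title_length, n = 1) %>% pull(title)

longest_len
by_filter
by_slice
```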
movies %>% ggplot(aes(x = title_length)) + geom_histogram()
movies %>% ggplot(aes(x = title_length, y = rating)) + geom_point()
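`geom_histogram()` defaults to 30 bins and prints a message suggesting a better `binwidth`; since title lengths are whole numbers, a binwidth of 1 gives one bar per length. A minimal sketch on made-up values (the `toy` data frame and its numbers are hypothetical), with `alpha` in the scatter plot so overlapping points stay visible.

```r
library(ggplot2)

# Hypothetical title lengths and ratings, not taken from the movies data
toy <- data.frame(
  title_length = c(5L, 9L, 5L, 12L, 9L, 20L),
  rating = c(8.5, 8.6, 8.6, 7.1, 6.9, 5.4)
)

# binwidth = 1: one bar per whole-number title length
p_hist <- ggplot(toy, aes(x = title_length)) +
  geom_histogram(binwidth = 1)

# alpha = 0.5 makes overlapping points easier to spot
p_scatter <- ggplot(toy, aes(x = title_length, y = rating)) +
  geom_point(alpha = 0.5)
```

Print either object (or type its name at the console) to draw the plot.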