Load the `world_bank` dataset:

```r
library("dplyr")
data(world_bank, package = "jrBig")
```
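Convert the data frame into a dplyr data frame:

```r
# tbl_df() is deprecated in dplyr >= 1.0.0; as_tibble(world_bank) is the current equivalent
wb = tbl_df(world_bank)
```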
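Print the `world_bank` and `wb` data frames to screen. What's different?

What does the `glimpse` function do? Hint: just try it on a data frame.

Using `wb`, complete the following tasks.

* `filter`: keep the rows for the country `AFG`;
* `select`: remove the `Year.Code` column;
* `select`: keep the `Year` and `gini` columns;
* `mutate`: create a new column `Year2010` that is `yes` for rows where `Year > 2010`;
* `arrange`: sort by `Year` in descending order and then by `gini`;
* `summarise`: compute a summary (for example, the mean) of the `gdp_percap` column.

What does `slice` do? Try `slice(wb, 1:3)`, `slice(wb, 5:10)`, `slice(wb, n())`. What does `n()` do?

Look at the `sample_n` function. Can you sample $100$ rows from the data set?

Calculate the mean `gini` and `gdp_percap` for each country; set `na.rm=TRUE` in the mean function. Hint: group by Country.

```r
# Group by country, as the hint suggests
gb = group_by(wb, Country.Code)
summarise(gb, mean(gini, na.rm = TRUE), mean(gdp_percap, na.rm = TRUE))
```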
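The single-table verbs from the tasks above can also be tried on a small data frame where the results are easy to check by eye. The following is a minimal sketch, not the practical's official solution: the `toy` data frame and its values are made up, and only the column names are borrowed from `world_bank`. The `"no"` label in the `mutate` step is likewise an assumption, since the task only specifies `yes`.

```r
library("dplyr")

# Hypothetical toy data; values are invented for illustration
toy = tibble(
  Country.Code = c("AFG", "AFG", "AFG", "BRA", "BRA"),
  Year         = c(2009, 2011, 2012, 2010, 2012),
  gini         = c(NA, 30, 28, 54, 53),
  gdp_percap   = c(1200, 1500, 1600, 14000, 15000)
)

# filter keeps the rows that satisfy a condition
afg = filter(toy, Country.Code == "AFG")

# select keeps the named columns (a minus sign would drop them instead)
year_gini = select(afg, Year, gini)

# mutate adds a column; ifelse maps the logical test to "yes"/"no"
flagged = mutate(year_gini, Year2010 = ifelse(Year > 2010, "yes", "no"))

# arrange sorts rows: Year descending, ties broken by gini ascending
sorted = arrange(flagged, desc(Year), gini)

# summarise collapses the data to a single row
mean_gdp = summarise(toy, mean_gdp = mean(gdp_percap, na.rm = TRUE))

# slice picks rows by position; inside slice, n() is the number of rows
last_row = slice(toy, n())

# sample_n draws rows at random (here 3 of the 5 rows, without replacement)
sampled = sample_n(toy, 3)
```

Each verb takes a data frame as its first argument and returns a new data frame, which is what makes it possible to chain them.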
Calculate the median `gini` and `gdp_percap` for each country per year.

```r
gb = group_by(wb, Year, Country.Code)
summarise(gb, median(gini, na.rm = TRUE), median(gdp_percap, na.rm = TRUE))
```
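Using the pipe operator, link the following operations together (for the `wb` data set):

* filter the rows for `AFG`;
* remove the `Year.Code` column;
* arrange by `Year` in descending order and then by `gini`;
* select `Year` and `gini`;
* create a column `Year2010` that is `yes` for rows where `Year > 2010`.

```r
wb = tbl_df(world_bank)
wb %>%
  filter(Country.Code == "AFG") %>%  # keep Afghanistan only
  select(-Year.Code) %>%             # drop the Year.Code column by name
  arrange(desc(Year), gini) %>%      # Year descending, then gini
  select(Year, gini) %>%             # keep the two columns of interest
  mutate(Year2010 = Year > 2010)     # logical flag: TRUE where Year > 2010
```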
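To see what the pipe is doing, it can help to write the same steps once as nested calls and once as a chain. The sketch below uses a made-up `toy` data frame (not part of the practical) and assumes only that `dplyr` is loaded.

```r
library("dplyr")

# Hypothetical toy data; values are invented for illustration
toy = tibble(
  Country.Code = c("AFG", "AFG", "BRA"),
  Year         = c(2009, 2012, 2012),
  gini         = c(NA, 28, 53)
)

# Nested calls are read from the inside out
nested = arrange(select(filter(toy, Year > 2010), Year, gini), gini)

# The pipe passes the left-hand side as the first argument of the next call,
# so the same steps are read from top to bottom
piped = toy %>%
  filter(Year > 2010) %>%
  select(Year, gini) %>%
  arrange(gini)

# Both versions should contain the same rows and columns
all.equal(nested, piped)
```

The piped form avoids intermediate variables and keeps each step on its own line, which tends to be easier to edit when a step is added or removed.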
Compare

```r
wb %>%
  group_by(Year, Country.Code) %>%
  summarise(gini = median(gini, na.rm = TRUE)) %>%
  summarise(max(gini, na.rm = TRUE))
```

and

```r
wb %>%
  group_by(Country.Code, Year) %>%
  summarise(gini = median(gini, na.rm = TRUE)) %>%
  summarise(max(gini, na.rm = TRUE))
```

* Why are the answers different? What's happening?
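One way to investigate is to inspect which grouping variables remain after the first `summarise`. The sketch below uses a made-up `toy` data frame (the values are invented, only the column names match `world_bank`) and relies on `summarise` dropping the last grouping variable each time it is applied, which is dplyr's default behaviour when every group collapses to one row.

```r
library("dplyr")

# Hypothetical toy data; values are invented for illustration
toy = tibble(
  Country.Code = c("AFG", "AFG", "BRA", "BRA"),
  Year         = c(2011, 2012, 2011, 2012),
  gini         = c(30, 28, 54, 53)
)

by_year_first = toy %>%
  group_by(Year, Country.Code) %>%
  summarise(gini = median(gini, na.rm = TRUE))
group_vars(by_year_first)     # Country.Code was peeled off; Year remains

by_country_first = toy %>%
  group_by(Country.Code, Year) %>%
  summarise(gini = median(gini, na.rm = TRUE))
group_vars(by_country_first)  # Year was peeled off; Country.Code remains

# The second summarise therefore works per Year in one case
# and per Country.Code in the other
max_per_year    = summarise(by_year_first, max_gini = max(gini, na.rm = TRUE))
max_per_country = summarise(by_country_first, max_gini = max(gini, na.rm = TRUE))
```

Here `max_per_year` has one row per year and `max_per_country` one row per country, so the order of the variables in `group_by` decides what the final maximum is taken over.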
```r
db = src_sqlite(path = tempfile(), create = TRUE)
wb_sqlite = copy_to(db, world_bank, temporary = FALSE)
wb_sqlite = tbl(db, "world_bank")
src_desc(db) ## Gives you some details
src_tbls(db) ## Lists the tables in the DB
```

Use `collect` to pull the data from the database into R.

Tip: Check out dplyr's CRAN page.
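The role of `collect` is easier to see with a query that is built first and only then executed. The sketch below is an illustration under assumptions, not the practical's code: it connects through `DBI::dbConnect()` with an in-memory SQLite database rather than `src_sqlite()`, it assumes the `DBI`, `RSQLite` and `dbplyr` packages are installed, and the `toy` table is made up.

```r
library("dplyr")

# Assumes DBI, RSQLite and dbplyr are installed; in-memory database for illustration
con = DBI::dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical toy data; values are invented for illustration
toy = data.frame(
  Country.Code = c("AFG", "AFG", "BRA"),
  Year         = c(2011, 2012, 2012),
  gini         = c(30, 28, 53)
)
toy_db = copy_to(con, toy, name = "toy")

# Building the query does not pull any rows into R yet
query = toy_db %>%
  filter(Year > 2011) %>%
  select(Country.Code, gini)

# collect runs the query in the database and returns a local tibble
result = collect(query)

DBI::dbDisconnect(con)
```

Until `collect` is called, `query` is only a description of the work to be done, so the filtering and column selection happen inside the database rather than in R.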