This vignette follows dplyr's introductory vignette (https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.html) and tries to reproduce all of its results with tidydt. First, load the required packages:
library(tidydt)
library(nycflights13)
# Preview the flights data as a data.table
data.table(flights)
filter_dt()
filter_dt(flights, month == 1, day == 1)
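For comparison, here is the same kind of filter written in plain data.table. This is a minimal sketch on a small toy table invented for illustration (not the real flights data), and it does not claim to show how filter_dt() is implemented. It assumes, as in dplyr, that comma-separated conditions are combined with a logical AND.

```r
library(data.table)

# Toy data invented for illustration, not nycflights13::flights
dt <- data.table(
  month = c(1L, 1L, 2L, 1L),
  day   = c(1L, 2L, 1L, 1L),
  dest  = c("IAH", "MIA", "BQN", "ATL")
)

# Keep rows where both conditions hold (AND in data.table's i argument)
jan1 <- dt[month == 1 & day == 1]
jan1
```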
arrange_dt()
arrange_dt(flights, year, month, day)
Use - (minus sign) to order a column in descending order:
arrange_dt(flights, -arr_delay)
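In plain data.table, the same descending sort can be written with order() inside [ ], or done in place with setorder(). A minimal sketch on an invented toy table; missing values are deliberately left out, since NA placement is not covered here.

```r
library(data.table)

# Toy data invented for illustration
dt <- data.table(flight = c("A", "B", "C"), arr_delay = c(5, 30, -10))

# Returns a new, reordered table; "-" sorts descending
by_delay_desc <- dt[order(-arr_delay)]

# setorder() reorders dt itself by reference (no copy); "-" again means descending
setorder(dt, -arr_delay)
```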
select_dt()
select_dt(flights, year, month, day)
Selection by range, such as select_dt(flights, year:day), and negative selection, such as select_dt(flights, -(year:day)), are not supported. However, select_dt() can select columns with a regular expression, so you can write:
select_dt(flights, "^dep")
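A comparable regular-expression selection in plain data.table matches the column names with grep() and then selects them. This is a sketch on an invented toy table, not a description of select_dt()'s internals.

```r
library(data.table)

# Toy data invented for illustration
dt <- data.table(dep_time = c(517L, 533L), dep_delay = c(2, 4),
                 arr_time = c(830L, 850L))

# Names of the columns that match the regular expression
cols <- grep("^dep", names(dt), value = TRUE)

# The ".." prefix tells data.table to look up `cols` outside the table
dep_only <- dt[, ..cols]
```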
Renaming works almost the same way as in dplyr:
select_dt(flights, tail_num = tailnum)
rename_dt(flights, tail_num = tailnum)
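In plain data.table, renaming is usually done with setnames(), which modifies its argument by reference, so the sketch below works on a copy. The tail numbers in the toy table are invented placeholders.

```r
library(data.table)

# Toy data invented for illustration
dt <- data.table(tailnum = c("N101", "N202"), dest = c("IAH", "IAH"))

# setnames() renames in place, so copy first to leave dt unchanged
renamed <- copy(dt)
setnames(renamed, old = "tailnum", new = "tail_num")

# Select and rename in one step, keeping only the renamed column
selected <- dt[, .(tail_num = tailnum)]
```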
mutate_dt()
mutate_dt(flights, gain = arr_delay - dep_delay, speed = distance / air_time * 60 )
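The plain data.table counterpart adds columns with the := operator, which also works by reference. A minimal sketch on an invented toy table; copy() is used so the original table is not modified.

```r
library(data.table)

# Toy data invented for illustration
dt <- data.table(arr_delay = c(11, 20), dep_delay = c(2, 4),
                 distance = c(1400, 1416), air_time = c(227, 227))

# Add two columns at once on a copy; := returns the modified table
out <- copy(dt)[, `:=`(gain  = arr_delay - dep_delay,
                       speed = distance / air_time * 60)]
```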
However, a column created in a mutate_dt() call cannot be used within that same call, so split such calculations into separate calls. The following code does not work:
mutate_dt(flights, gain = arr_delay - dep_delay, gain_per_hour = gain / (air_time / 60) )
Instead, use:
mutate_dt(flights, gain = arr_delay - dep_delay) %>%
  mutate_dt(gain_per_hour = gain / (air_time / 60))
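Plain data.table has the same limitation: all right-hand sides of a single := call are evaluated before any new column exists. Chaining [ ][ ] lets the second step see the column created by the first. A sketch on an invented toy table.

```r
library(data.table)

# Toy data invented for illustration
dt <- data.table(arr_delay = c(11, 20), dep_delay = c(2, 4),
                 air_time = c(227, 150))

# First create gain, then use it in a chained second step
out <- copy(dt)[, gain := arr_delay - dep_delay][
  , gain_per_hour := gain / (air_time / 60)]
```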
If you only want to keep the new variables, use transmute_dt():
transmute_dt(flights, gain = arr_delay - dep_delay )
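In plain data.table, returning a list with .() in j keeps only the computed columns, which gives the same shape of result. A sketch on an invented toy table.

```r
library(data.table)

# Toy data invented for illustration
dt <- data.table(arr_delay = c(11, 20), dep_delay = c(2, 4))

# Only the columns named inside .() are returned
gain_only <- dt[, .(gain = arr_delay - dep_delay)]
```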
summarise_dt()
summarise_dt(flights, delay = mean(dep_delay, na.rm = TRUE) )
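The plain data.table version aggregates in j without any grouping, which collapses the table to a single row. A sketch on an invented toy table that includes one missing value.

```r
library(data.table)

# Toy data invented for illustration
dt <- data.table(dep_delay = c(2, NA, 4))

# One-row summary; na.rm = TRUE drops the missing value
delay <- dt[, .(delay = mean(dep_delay, na.rm = TRUE))]
```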
sample_n_dt() and sample_frac_dt()
sample_n_dt(flights, 10)
sample_frac_dt(flights, 0.01)
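One way to draw similar random samples in plain data.table is to index rows with sample() and .N (the row count). This is a sketch on an invented toy table; rounding the fractional sample size with round() is a choice made here and may differ from what sample_frac_dt() does.

```r
library(data.table)

# Toy data invented for illustration
dt <- data.table(id = 1:200)

set.seed(1)  # make the random draw reproducible

# A fixed number of rows, sampled without replacement
s_n <- dt[sample(.N, 10)]

# A fraction of the rows (5% of 200 gives 10 rows here)
s_frac <- dt[sample(.N, round(.N * 0.05))]
```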
Consider the following dplyr code:
by_tailnum <- group_by(flights, tailnum)
delay <- summarise(by_tailnum,
  count = n(),
  dist = mean(distance, na.rm = TRUE),
  delay = mean(arr_delay, na.rm = TRUE))
delay <- filter(delay, count > 20, dist < 2000)
We can get the same result with:
flights %>%
  summarise_dt(
    count = .N,
    dist = mean(distance, na.rm = TRUE),
    delay = mean(arr_delay, na.rm = TRUE),
    by = tailnum
  ) %>%
  filter_dt(count > 20, dist < 2000)
summarise_dt() (or summarize_dt()) has a by parameter that specifies the grouping variables.
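For comparison, plain data.table groups with the by argument inside [ ] and can filter the aggregated rows in a chained second [ ]. A sketch on an invented toy table; the count threshold is lowered from 20 to 2 only so that the toy data produces a non-empty result.

```r
library(data.table)

# Toy data invented for illustration
dt <- data.table(
  tailnum   = c("N1", "N1", "N1", "N2", "N2"),
  distance  = c(1000, 1200, 1100, 2500, 2600),
  arr_delay = c(5, NA, 15, -3, 7)
)

# Aggregate per tailnum, then keep groups passing both thresholds
delay <- dt[, .(count = .N,
                dist  = mean(distance, na.rm = TRUE),
                delay = mean(arr_delay, na.rm = TRUE)),
            by = tailnum][count > 2 & dist < 2000]
```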
We can find the number of planes and the number of flights that go to each possible destination:
# the dplyr syntax:
# destinations <- group_by(flights, dest)
# summarise(destinations,
#   planes = n_distinct(tailnum),
#   flights = n()
# )
summarise_dt(flights, planes = uniqueN(tailnum), flights = .N, by = dest) %>%
  arrange_dt(dest)
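In plain data.table, keyby groups like by and also sorts the result by the grouping column, which can stand in for the separate sorting step. A sketch on an invented toy table.

```r
library(data.table)

# Toy data invented for illustration
dt <- data.table(
  dest    = c("MIA", "ATL", "MIA", "ATL", "ATL"),
  tailnum = c("N1", "N2", "N1", "N3", "N2")
)

# Distinct planes and total flights per destination, sorted by dest
destinations <- dt[, .(planes = uniqueN(tailnum), flights = .N), keyby = dest]
```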
To group by several variables, wrap them in .() and pass them to by:
# the dplyr syntax:
# daily <- group_by(flights, year, month, day)
# (per_day <- summarise(daily, flights = n()))
flights %>% summarise_dt(by = .(year, month, day), flights = .N)

# (per_month <- summarise(per_day, flights = sum(flights)))
flights %>%
  summarise_dt(by = .(year, month, day), flights = .N) %>%
  summarise_dt(by = .(year, month), flights = sum(flights))

# (per_year <- summarise(per_month, flights = sum(flights)))
flights %>%
  summarise_dt(by = .(year, month, day), flights = .N) %>%
  summarise_dt(by = .(year, month), flights = sum(flights)) %>%
  summarise_dt(by = .(year), flights = sum(flights))
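The same roll-up in plain data.table re-aggregates each level from the one before it, dropping one grouping variable at a time. A sketch on an invented toy table.

```r
library(data.table)

# Toy data invented for illustration; year is recycled to every row
dt <- data.table(
  year  = 2013L,
  month = c(1L, 1L, 1L, 2L),
  day   = c(1L, 1L, 2L, 1L)
)

# Count flights per day, then sum those counts up to month and year
per_day   <- dt[, .(flights = .N), by = .(year, month, day)]
per_month <- per_day[, .(flights = sum(flights)), by = .(year, month)]
per_year  <- per_month[, .(flights = sum(flights)), by = year]
```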