knitr::opts_chunk$set(eval = FALSE)
library(chunked)
whisker, validate, errorlocate, docopt, daff, tableplot, ffbase, ...
chunked?
Short answer:
\begin{center} \includegraphics[width=0.2\textwidth]{img/dplyr_logo} \Huge{for data in text files} \end{center}
\hfill\includegraphics[width=0.1\textwidth]{img/txtfile}
\vspace{-1.6cm}
readr::read_csv[^1]
data.table::fread
data.frame does not!
[^1]: chunked has inspired readr::read_csv_chunked, also a nice option!
[^2]: Maybe ALTREP in R 3.5 changes the game...
sed, awk, grep
It is nice to stay in the R-universe (one data-processing tool)
sed, awk and grep voodoo.
\begin{center} \includegraphics[height=0.8\textheight]{img/keep-calm-and-chop-chop-3} \end{center}
dplyr verbs
LaF.
dplyr verbs on chunkwise objects are recorded and replayed when writing.
read_chunkwise("my_data.csv", chunk_size = 5000) %>%
  select(col1, col2) %>%
  filter(col1 > 1) %>%
  mutate(col3 = col1 + 1) %>%
  write_chunkwise("output.csv")
This code:
db <- src_sqlite('test.db', create=TRUE)
tbl <- read_chunkwise("./large_file_in.csv") %>%
  select(col1, col2, col5) %>%
  filter(col1 > 10) %>%
  mutate(col6 = col1 + col2) %>%
  write_chunkwise(db, 'my_large_table')

tbl <- ( src_sqlite("test.db") %>% tbl("my_table") ) %>%
  read_chunkwise(chunk_size=5000) %>%
  select(col1, col2, col5) %>%
  filter(col1 > 10) %>%
  mutate(col6 = col1 + col2) %>%
  write_chunkwise('my_large_table.csv')
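The read, process, write loop behind pipelines like those above can be sketched in base R. This is only an illustration of the idea, not how chunked itself is implemented; the temporary files, column names and the tiny chunk size are made up for the example.

```r
# Base-R sketch of chunk-wise CSV processing (not chunked's implementation).
# File names, column names and the chunk size are made up for illustration.
infile  <- tempfile(fileext = ".csv")
outfile <- tempfile(fileext = ".csv")
write.csv(data.frame(col1 = 1:10, col2 = 10:1), infile, row.names = FALSE)

chunk_size <- 4
con <- file(infile, open = "r")
header <- readLines(con, n = 1)               # read the header line once
first  <- TRUE
repeat {
  lines <- readLines(con, n = chunk_size)     # next block of raw lines
  if (length(lines) == 0) break               # end of file
  chunk <- read.csv(text = c(header, lines))  # parse only this chunk
  chunk <- chunk[chunk$col1 > 1, c("col1", "col2")]  # filter + select
  chunk$col3 <- chunk$col1 + 1                        # mutate
  # write the header only with the first chunk, append afterwards
  write.table(chunk, outfile, sep = ",", row.names = FALSE,
              col.names = first, append = !first)
  first <- FALSE
}
close(con)

result <- read.csv(outfile)
```

Only one chunk is held in memory at a time, which is the point of the approach: the full input never has to fit in a data.frame.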
filter, select, rename, mutate, mutate_each, transmute, do, tbl_vars, inner_join, left_join, semi_join, anti_join all work, also with name completion!
summarize and group_by work chunkwise (and not for all data!)
arrange, right_join, full_join are not implemented.
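Why "chunkwise (and not for all data!)" matters for summarize can be seen with plain data frames. The sketch below does not use chunked; it assumes a toy data set split into two unequal chunks and shows that per-chunk means differ from the overall mean unless partial sums and counts are combined.

```r
# Illustration with plain data frames: per-chunk summaries are not the
# summary of the whole data set unless they are combined explicitly.
x <- data.frame(value = c(1, 2, 3, 10))
chunks <- split(x, c(1, 1, 1, 2))          # chunk sizes 3 and 1

# Summarizing each chunk separately gives one mean per chunk
chunk_means <- sapply(chunks, function(ch) mean(ch$value))

# A correct overall mean needs per-chunk sums and counts
partial <- lapply(chunks, function(ch) c(sum = sum(ch$value), n = nrow(ch)))
totals  <- Reduce(`+`, partial)
overall_mean <- totals[["sum"]] / totals[["n"]]
```

Averaging the two chunk means would give a different number than `overall_mean`, so a chunkwise summary usually needs a second aggregation step.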
A chunkwise object is created.
chunkwise contains:
record and play
head to quickly scan the input file
dplyr::tbl_vars: powers the RStudio column-name completion.
dplyr verbs (as mentioned above)
dplyr commands are recorded and replayed.
lazyeval / rlang
filter
filter.chunkwise <- function(.data, ..., .dots){
  .dots <- lazyeval::all_dots(.dots, ...)
  cmd <- lazyeval::lazy(filter_(.data, .dots=.dots))
  record(.data, cmd) # internal `chunked` function
                     # that stores dplyr expressions
}
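The record-and-replay idea can be sketched in a few lines of base R. This is a simplified stand-in, not chunked's internal code: `new_recorder()`, `record_step()` and `replay()` are hypothetical names, and the steps are stored as plain functions instead of lazily captured dplyr expressions.

```r
# Minimal record-and-replay sketch in base R (illustrative only;
# new_recorder(), record_step() and replay() are hypothetical names).
new_recorder <- function() structure(list(steps = list()), class = "recorder")

# Store a processing step (a function of one data frame) without running it.
record_step <- function(rec, step) {
  rec$steps <- c(rec$steps, list(step))
  rec
}

# Apply all recorded steps, in order, to one chunk.
replay <- function(rec, chunk) {
  Reduce(function(data, step) step(data), rec$steps, chunk)
}

rec <- new_recorder()
rec <- record_step(rec, function(d) d[d$col1 > 1, , drop = FALSE])
rec <- record_step(rec, function(d) transform(d, col3 = col1 + 1))

# Nothing has run yet; the steps are replayed once per chunk.
chunks <- split(data.frame(col1 = 1:6, col2 = 6:1), rep(1:2, each = 3))
processed <- do.call(rbind, lapply(chunks, replay, rec = rec))
```

Deferring execution this way means the same recorded steps can be applied to every chunk as it is read, which is what makes writing the natural trigger for evaluation.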
chunkwise.sample_n or chunkwise.sample_frac
Create a chunk generic: chunkwise processing is useful for several formats:
ffbase, feather, arrow, fst, ldat
I don't know the stats, but...
\includegraphics[width=0.5\textwidth]{img/tweet_ben.png} \includegraphics[width=0.5\textwidth]{img/tweethuzzay.png}\ \includegraphics[width=0.5\textwidth]{img/tweet_2.png}
\Large{Interested?}
install.packages("chunked")
\Large{Ideas and suggestions?}
http://github.com/edwindj/chunked