knitr::opts_chunk$set(eval = FALSE)
library(chunked)

Who am I?

What is chunked?

Short answer: \begin{center} \includegraphics[width=0.2\textwidth]{img/dplyr_logo} \Huge{for data in text files} \end{center}

Process Data in GB-sized Text Files:

(pre)Process text files to:

\hfill\includegraphics[width=0.1\textwidth]{img/txtfile}

\vspace{-1.6cm}

Save result into:

Option 1: Read data with R {.plain}

Use:

However...

[^1]: chunked has inspired readr::read_csv_chunked, also a nice option!

[^2]: Maybe ALTREP in R 3.5 changes the game...

Option 2: Use unix tools

Good choice!

However...

It is nice to stay in R-universe (one data-processing tool)

Option 3: Import data in DB

Import data into DB

However...

Process in chunks?

\begin{center} \includegraphics[height=0.8\textheight]{img/keep-calm-and-chop-chop-3} \end{center}
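To make the chunking idea concrete, here is a minimal base R sketch of processing a CSV file a fixed number of lines at a time. It is not how chunked is implemented. The function `process_in_chunks` and its arguments are hypothetical names introduced for illustration, and the sketch assumes a plain comma-separated file with a header line and no embedded newlines.

```r
# Hypothetical helper (not part of chunked): apply `f` to a CSV file
# chunk by chunk, so only `chunk_size` rows are in memory at a time.
process_in_chunks <- function(infile, outfile, chunk_size = 5000, f = identity) {
  con <- file(infile, open = "r")
  on.exit(close(con))

  # Read the header once; scan() also strips quotes around column names.
  header <- readLines(con, n = 1)
  col_names <- scan(text = header, what = "", sep = ",", quiet = TRUE)

  first <- TRUE
  repeat {
    lines <- readLines(con, n = chunk_size)
    if (length(lines) == 0) break            # end of file reached

    chunk  <- read.csv(text = lines, header = FALSE, col.names = col_names)
    result <- f(chunk)                       # user-supplied processing step

    # Write the header with the first chunk only, then append.
    write.table(result, outfile, sep = ",", row.names = FALSE,
                col.names = first, append = !first, qmethod = "double")
    first <- FALSE
  }
  invisible(outfile)
}
```

Writing this by hand for every file gets tedious, and column types are guessed separately for each chunk, which is the kind of bookkeeping a dedicated package can take over.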

Option 4: Use chunked!

Idea:

Scenario 1: TXT -> TXT

Preprocess a text file with data

read_chunkwise("my_data.csv", chunk_size = 5000) %>% 
  select(col1, col2) %>% 
  filter(col1 > 1) %>% 
  mutate(col3 = col1 + 1) %>% 
  write_chunkwise("output.csv")

This code:

Scenario 2: TXT -> DB

Insert processed text data in DB

db <- src_sqlite('test.db', create=TRUE)

tbl <- 
  read_chunkwise("./large_file_in.csv") %>% 
  select(col1, col2, col5) %>%
  filter(col1 > 10) %>% 
  mutate(col6 = col1 + col2) %>% 
  write_chunkwise(db, 'my_large_table')

Scenario 3: DB -> TXT

Extract a large table from a DB to a text file

tbl <- 
  ( src_sqlite("test.db") %>% 
    tbl("my_table")
  ) %>% 
  read_chunkwise(chunk_size=5000) %>% 
  select(col1, col2, col5) %>%
  filter(col1 > 10) %>% 
  mutate(col6 = col1 + col2) %>% 
  write_chunkwise('my_large_table.csv')

Caveat

Working:

However:

Implementation

chunkwise contains:

methods implemented

e.g. filter

filter.chunkwise <- function(.data, ..., .dots){
  .dots <- lazyeval::all_dots(.dots, ...)
  cmd <- lazyeval::lazy(filter_(.data, .dots=.dots))
  record(.data, cmd)   # internal `chunked` function
                       # that stores dplyr expressions
}
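The record-and-replay pattern hinted at by `record()` can be illustrated with a small base R sketch. This is an assumption about the general technique, not chunked's actual internals: `new_recorder`, `record_step` and `replay` are hypothetical names, and plain functions stand in for the stored dplyr expressions.

```r
# Hypothetical sketch (not chunked's real code): store processing steps
# first, then replay them on every chunk as it is read.
new_recorder <- function() {
  structure(list(cmds = list()), class = "recorder")
}

# Append one step (a function taking and returning a data frame).
record_step <- function(rec, fun) {
  rec$cmds <- c(rec$cmds, list(fun))
  rec
}

# Apply all recorded steps, in order, to a single chunk.
replay <- function(rec, chunk) {
  for (cmd in rec$cmds) {
    chunk <- cmd(chunk)
  }
  chunk
}

# Usage: nothing is computed until replay() is called on a chunk.
rec <- new_recorder()
rec <- record_step(rec, function(d) d[d$col1 > 1, , drop = FALSE])
rec <- record_step(rec, function(d) transform(d, col3 = col1 + 1))
replay(rec, data.frame(col1 = 1:3))
```

Deferring execution this way means the same pipeline can be applied to each chunk without the whole file ever being loaded.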

Ideas

Usage?

I don't know the stats, but...

\includegraphics[width=0.5\textwidth]{img/tweet_ben.png} \includegraphics[width=0.5\textwidth]{img/tweethuzzay.png}\ \includegraphics[width=0.5\textwidth]{img/tweet_2.png}

Thank you!

\Large{Interested?}

install.packages("chunked")

\Large{Ideas and suggestions?}

http://github.com/edwindj/chunked



edwindj/chunked documentation built on March 25, 2022, 8:03 a.m.