library(chunked)

What is chunked?

Short answer: dplyr for data in text files

Process Data in GB-sized Text Files

(pre)Process text files to:

- select columns,
- filter rows, and
- derive new variables.

Save the result into:

- a text file, or
- a database.

Option 1: Read data with R

Use:

- read.csv, or
- readr::read_csv, or
- data.table::fread

However...

- all of these read the entire file into memory, which fails (or grinds) on GB-sized files.
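For contrast, a minimal sketch of this full-load approach (my_data.csv and the readers shown are just examples):

# Conventional approach: the whole file is parsed into one
# in-memory data.frame before any processing can start.
df <- read.csv("my_data.csv")

# Faster parsers, but still all-in-memory:
# df <- readr::read_csv("my_data.csv")
# df <- data.table::fread("my_data.csv")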

Option 2: Use unix tools

Use command-line tools such as sed, awk, cut and grep.

Good choice! They are fast and stream through the file line by line instead of loading it.

However...

- they are not equally available (or familiar) on every platform, and
- it is nice to stay in the R universe (one data-processing tool).

Option 3: Import data in DB

Import the data into a database (e.g. SQLite or PostgreSQL) and process it there, for example with dplyr.

However...

- the import/export round trip is an extra step, and a database may be overkill for one-off preprocessing of a text file.
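A minimal sketch of this route, assuming the DBI, RSQLite and dbplyr packages are available (file, table and column names are examples borrowed from the scenarios below):

library(DBI)
library(dplyr)

# Open (or create) a SQLite database file:
con <- dbConnect(RSQLite::SQLite(), "test.db")

# RSQLite can import a csv file directly when `value` is a file
# name (an RSQLite-specific convenience, not generic DBI):
dbWriteTable(con, "my_table", "large_file_in.csv")

# ...then process inside the database with dplyr:
tbl(con, "my_table") %>%
  filter(col1 > 10) %>%
  select(col1, col2)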

Process in chunks?

(image: "Keep calm and chop chop")

Option 4: Use chunked!

Idea:

- read a chunk of data,
- process it with dplyr verbs,
- write the processed chunk to the destination,
- and repeat until all data has been processed.

Scenario 1: TXT -> TXT

Preprocess a text file with data

# Read my_data.csv in chunks of 5000 rows, transform each chunk,
# and write the result to output.csv:
read_chunkwise("my_data.csv", chunk_size = 5000) %>%
  select(col1, col2) %>%
  filter(col1 > 1) %>%
  mutate(col3 = col1 + 1) %>%
  write_chunkwise("output.csv")

This code:

- reads my_data.csv 5000 rows at a time,
- applies the select, filter and mutate steps to each chunk, and
- appends each processed chunk to output.csv,

so the whole file is never held in memory at once.

Scenario 2: TXT -> DB

Insert processed text data into a DB

# Open (or create) a SQLite database:
db <- src_sqlite('test.db', create = TRUE)

# Process the csv chunk by chunk; each processed chunk is
# appended to the table 'my_large_table' in the database:
tbl <-
  read_chunkwise("./large_file_in.csv") %>%
  select(col1, col2, col5) %>%
  filter(col1 > 10) %>%
  mutate(col6 = col1 + col2) %>%
  write_chunkwise(db, 'my_large_table')
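Afterwards the new table can be queried like any dplyr database table; a small usage sketch (the count is just an example):

# Query the freshly written table inside the database:
tbl(db, 'my_large_table') %>%
  summarise(n = n())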

Scenario 3: DB -> TXT

Extract a large table from a DB to a text file

# Open the table in the database and read it chunk by chunk;
# each processed chunk is appended to my_large_table.csv:
tbl <-
  ( src_sqlite("test.db") %>%
    tbl("my_table")
  ) %>%
  read_chunkwise(chunk_size = 5000) %>%
  select(col1, col2, col5) %>%
  filter(col1 > 10) %>%
  mutate(col6 = col1 + col2) %>%
  write_chunkwise('my_large_table.csv')

Caveat

Working:

- chunk-wise (row-wise) verbs such as select, rename, filter, mutate and transmute.

However:

- group_by and summarize are not chunk-wise operations: chunks are processed independently, so aggregations over the whole file cannot be computed this way.
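One possible workaround, sketched under the assumption that the aggregation can be delegated to a database (table and column names below are examples):

# Chunk-wise part: filter the large file into a database table:
db <- src_sqlite("test.db", create = TRUE)

read_chunkwise("./large_file_in.csv") %>%
  filter(col1 > 10) %>%
  write_chunkwise(db, "filtered")

# Whole-table part: aggregate inside the database, where
# group_by/summarise are available:
tbl(db, "filtered") %>%
  group_by(col2) %>%
  summarise(total = sum(col1))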

Thank you!

Interested?

install.packages("chunked")

Or visit http://github.com/edwindj/chunked


