Read chunkwise data from text files

Description

read_csv_chunkwise opens a connection to a text file. Subsequent dplyr verbs and commands are recorded until collect or write_csv_chunkwise is called, at which point the recorded commands are executed chunk by chunk.
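
The deferred execution described above can be sketched as follows. This is a minimal illustration, assuming the chunked and dplyr packages are installed; the file name women.csv and the chunk size of 5 are arbitrary choices for the demo.

```r
library(chunked)
library(dplyr)

# Write a small demo file (the built-in `women` data set has 15 rows).
csv_path <- file.path(tempdir(), "women.csv")
write.csv(women, csv_path, row.names = FALSE, quote = FALSE)

# Only a connection is opened here; the filter is recorded, not executed.
tall <- read_csv_chunkwise(csv_path, chunk_size = 5L) %>%
  filter(height > 65)

# collect() runs the recorded filter chunk by chunk and returns a data frame.
result <- collect(tall)
```

Because nothing is read until collect is called, the pipeline itself stays cheap to build even for files that do not fit in memory.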

Usage

read_csv_chunkwise(file, chunk_size = 10000L, header = TRUE, sep = ",",
  dec = ".", ...)

read_csv2_chunkwise(file, chunk_size = 10000L, header = TRUE, sep = ";",
  dec = ",", ...)

read_table_chunkwise(file, chunk_size = 10000L, header = TRUE,
  sep = "\t", dec = ".", ...)

read_laf_chunkwise(laf, chunk_size = 10000L)
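
The variants differ only in their default sep and dec values. As a hedged sketch (assuming chunked and dplyr are installed; the prices data frame and file name are made up for illustration), a file written with write.csv2 can be read back with read_csv2_chunkwise:

```r
library(chunked)
library(dplyr)

# Hypothetical demo data with non-integer values.
prices <- data.frame(item = c("a", "b", "c"), price = c(1.5, 2.25, 3.75))

# write.csv2 uses ";" between fields and "," as the decimal mark.
csv2_path <- file.path(tempdir(), "prices.csv")
write.csv2(prices, csv2_path, row.names = FALSE, quote = FALSE)

# read_csv2_chunkwise defaults to sep = ";" and dec = ",".
prices_back <- read_csv2_chunkwise(csv2_path, chunk_size = 2L) %>%
  collect()
```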

Arguments

file

path of the text file

chunk_size

size of the chunks to be read (number of rows per chunk)

header

Does the csv file have a header with column names?

sep

field separator to be used

dec

decimal separator to be used

...

not used

read_laf_chunkwise reads chunkwise from a LaF object created with laf_open. It offers more control over data specification.

laf

laf object created using LaF
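
A minimal sketch of the LaF route, assuming the chunked, dplyr and LaF packages are installed. It uses laf_open_csv, a convenience constructor from LaF, rather than laf_open with a full data model; the file name and column types are choices made for this demo.

```r
library(chunked)
library(dplyr)
library(LaF)

csv_path <- file.path(tempdir(), "women_laf.csv")
write.csv(women, csv_path, row.names = FALSE, quote = FALSE)

# Declare column types and names explicitly instead of having them detected;
# skip = 1 skips the header line.
laf <- laf_open_csv(
  csv_path,
  column_types = c("double", "double"),
  column_names = c("height", "weight"),
  skip = 1
)

# The LaF object is then processed chunk by chunk like any other source.
women_laf <- read_laf_chunkwise(laf, chunk_size = 5L) %>%
  mutate(ratio = weight / height) %>%
  collect()
```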

Details

read_csv_chunkwise is best combined with write_csv_chunkwise or insert_chunkwise_into (see example).

Examples

library(chunked)
library(dplyr)  # for %>%, mutate, filter, select, ...

# create csv file for demo purposes
in_file <- file.path(tempdir(), "in.csv")
write.csv(women, in_file, row.names = FALSE, quote = FALSE)

# record dplyr verbs on a chunkwise connection (nothing is read yet)
women_chunked <-
  read_chunkwise(in_file) %>%  #open chunkwise connection
  mutate(ratio = weight/height) %>%
  filter(ratio > 2) %>%
  select(height, ratio) %>%
  inner_join(data.frame(height=63:66)) # you can join with data.frames!

# no processing is done until the result is written (or collected)
out_file <- file.path(tempdir(), "processed.csv")
women_chunked %>%
  write_chunkwise(file=out_file)

head(women_chunked) # works (without processing all data...)

iris_file <- file.path(tempdir(), "iris.csv")
write.csv(iris, iris_file, row.names = FALSE, quote= FALSE)

iris_chunked <-
  read_chunkwise(iris_file, chunk_size = 49) %>% # 49 for demo purpose
  group_by(Species) %>%
  summarise(sepal_length = mean(Sepal.Length), n=n()) # note that mean is per chunk
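
Because the summary is computed per chunk, a species that spans two chunks yields two partial rows. The sketch below uses base R only (it does not call chunked) to emulate chunks of 49 rows and to show one way of combining partial results into an overall mean: carry sums and counts per chunk, then divide at the end. The names chunk_id, per_chunk and totals are illustrative.

```r
# Emulate chunks of 49 rows: ids 0, 0, ..., 1, ..., 3.
chunk_id <- (seq_len(nrow(iris)) - 1) %/% 49
chunks <- split(iris, chunk_id)

# Per-chunk summary: one row per (chunk, Species) with a sum and a count.
per_chunk <- do.call(rbind, lapply(chunks, function(chunk) {
  chunk$Species <- droplevels(chunk$Species)
  data.frame(
    Species = levels(chunk$Species),
    total   = as.numeric(tapply(chunk$Sepal.Length, chunk$Species, sum)),
    n       = as.numeric(tapply(chunk$Sepal.Length, chunk$Species, length))
  )
}))

# Combine the partial results: add up sums and counts, then divide.
totals <- aggregate(cbind(total, n) ~ Species, data = per_chunk, FUN = sum)
totals$sepal_length <- totals$total / totals$n
```

Averaging the per-chunk means directly would weight a chunk holding a single row of a species the same as a chunk holding 49, which is why the sketch keeps sums and counts instead.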