knitr::opts_chunk$set(echo = TRUE, fig.align = "center", eval = FALSE)

How

You can easily use the {bigreadr} package to convert a CSV file to an SQLite database without loading the whole CSV in memory.

You can use the following function:

csv2sqlite <- function(csv,
                       every_nlines,
                       table_name,
                       dbname = sub("\\.csv$", ".sqlite", csv),
                       ...) {

  # Prepare the connection to the (possibly new) SQLite database
  con <- RSQLite::dbConnect(RSQLite::SQLite(), dbname)
  init <- TRUE
  fill_sqlite <- function(df) {

    if (init) {
      # Create the table from the first part only
      RSQLite::dbCreateTable(con, table_name, df)
      init <<- FALSE
    }

    # Append each part to the table; return NULL so nothing is kept in memory
    RSQLite::dbAppendTable(con, table_name, df)
    NULL
  }

  # Read and fill by parts
  bigreadr::big_fread1(csv, every_nlines,
                       .transform = fill_sqlite,
                       .combine = unlist,
                       ... = ...)

  # Return the connection to the filled database
  con
}

Function bigreadr::big_fread1() first splits the CSV into smaller CSV files, then reads these files as data frames and transforms them, and finally combines the results.

Here, the transformation simply appends each data frame to the SQLite database (and creates the table the first time). Since nothing needs to be kept or combined, each part returns NULL.
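To see these split / transform / combine mechanics in isolation, here is a minimal sketch (not from the original post) that uses bigreadr::big_fread1() with a toy transformation, simply counting the rows of each part; "some_big_file.csv" is a placeholder path.

# Count the number of rows read in each part of the CSV
nrow_per_part <- bigreadr::big_fread1(
  "some_big_file.csv",           # placeholder path (assumption)
  every_nlines = 1e5,
  .transform = function(df) nrow(df),  # applied to each part (a data frame)
  .combine   = unlist                  # how to combine the per-part results
)
nrow_per_part  # numeric vector: number of rows in each part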

Use case

For example, with this function, I was able to convert a 9 GB CSV file in 40 minutes, using less than 2 GB of memory.

# 'csv' is the path to the CSV file to convert
con <- csv2sqlite(csv, every_nlines = 1e6, table_name = "sirene",
                  encoding = "Latin-1")
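
After the conversion, you can query the resulting database through the returned connection without loading the table in memory. A hypothetical follow-up (not in the original post), using standard DBI calls:

# Count the imported rows and peek at a few of them via SQL
DBI::dbGetQuery(con, "SELECT COUNT(*) AS n FROM sirene")
DBI::dbGetQuery(con, "SELECT * FROM sirene LIMIT 5")

# Close the connection when done
DBI::dbDisconnect(con)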
