ddb_data: Read CSV using DuckDB

View source: R/ddb_ops.R

ddb_data    R Documentation

Read CSV using DuckDB

Description

Lazy-read a CSV file with DuckDB, optionally filter rows, remove duplicate rows, clean column names, convert character columns to factors, and collect the result.
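The steps above can be approximated directly with DBI and duckdb. The following is a minimal sketch, not the rtemis implementation: the temporary file, the column names, and the query text are illustrative assumptions. It shows how duplicate removal and row filtering can be pushed into the DuckDB query before the data are collected into R.

```r
# Sketch only: approximates the described steps with DBI + duckdb.
# The file, column names, and query below are illustrative assumptions.

# Write a small CSV with one duplicated row to a temporary file
csv_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(ID = c(1, 2, 2, 3), group = c("a", "b", "b", "c")),
  csv_path,
  row.names = FALSE
)
csv_path <- normalizePath(csv_path, winslash = "/")

# DISTINCT drops duplicate rows; WHERE ... IN keeps only the requested IDs
query <- sprintf(
  paste(
    "SELECT DISTINCT * FROM",
    "read_csv_auto('%s', delim = ',', header = true, ignore_errors = true)",
    "WHERE ID IN (2, 3)"
  ),
  csv_path
)

# Run the query only if both packages are installed
if (requireNamespace("DBI", quietly = TRUE) &&
    requireNamespace("duckdb", quietly = TRUE)) {
  con <- DBI::dbConnect(duckdb::duckdb()) # in-memory database
  dat <- DBI::dbGetQuery(con, query)      # collect into a data.frame
  DBI::dbDisconnect(con, shutdown = TRUE)
  print(dat)
}
```

Filtering inside the query means only the matching rows are materialized in R, which is the main reason to use this approach on files too large to read in full.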

Usage

ddb_data(
  filename,
  datadir = NULL,
  sep = ",",
  header = TRUE,
  quotechar = "",
  ignore_errors = TRUE,
  make_unique = TRUE,
  select_columns = NULL,
  filter_column = NULL,
  filter_vals = NULL,
  character2factor = FALSE,
  collect = TRUE,
  progress = TRUE,
  returnobj = c("data.table", "data.frame"),
  data.table.key = NULL,
  clean_colnames = TRUE,
  verbose = TRUE
)

Arguments

filename

Character: File name. Either the full path, or just the file name if datadir is also provided

datadir

Character: Optional path to the directory containing filename, used if filename is not a full path

sep

Character: Field delimiter/separator

header

Logical: If TRUE, first line will be read as column names

quotechar

Character: Quote character

ignore_errors

Logical: If TRUE, ignore parsing errors. For some files, this is the only way to read any data at all

make_unique

Logical: If TRUE, keep only unique rows

select_columns

Character vector: Column names to select

filter_column

Character: Name of column to filter on, e.g. "ID"

filter_vals

Numeric or Character vector: Values in filter_column to keep.

character2factor

Logical: If TRUE, convert character columns to factors

collect

Logical: If TRUE, collect data and return an object of the class defined by returnobj

progress

Logical: If TRUE, print progress (this may have no visible effect)

returnobj

Character: "data.frame" or "data.table" object class to return. If "data.table", the data.frame returned by DBI::dbGetQuery is passed to data.table::setDT. This adds to execution time for very large data, but that is also when a data.table is most useful

data.table.key

Character: If set, this corresponds to a column name in the dataset. This column will be set as the key of the data.table output

clean_colnames

Logical: If TRUE, clean column names using clean_colnames

verbose

Logical: If TRUE, print messages to console
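The post-collection options (character2factor, returnobj = "data.table", and data.table.key) correspond to a few standard operations on the collected data.frame. The following is a minimal sketch of those operations on a made-up data.frame; it is an assumption about how they could be done, not the code ddb_data runs internally.

```r
# Sketch only: mimics what character2factor, returnobj = "data.table",
# and data.table.key describe. Not the rtemis implementation.
dat <- data.frame(
  ID = c(3L, 1L, 2L),
  group = c("a", "b", "a"),
  stringsAsFactors = FALSE
)

# Convert character columns to factors, leaving other columns untouched
is_chr <- vapply(dat, is.character, logical(1))
dat[is_chr] <- lapply(dat[is_chr], factor)

# Convert to data.table and set a key, if data.table is installed
if (requireNamespace("data.table", quietly = TRUE)) {
  data.table::setDT(dat)         # converts to data.table by reference
  data.table::setkeyv(dat, "ID") # sorts by ID and marks it as the key
}
```

Note that setting a key reorders the rows by the key column, so the row order of the output may differ from the order in the CSV file.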

Author(s)

E.D. Gennatas

Examples

## Not run: 
ir <- ddb_data("/Data/massive_dataset.csv",
  filter_column = "ID",
  filter_vals = 8001:9999
)

## End(Not run)

egenn/rtemis documentation built on Dec. 17, 2024, 6:16 p.m.