ddb_data: Read CSV using DuckDB
In egenn/rtemis: Machine Learning and Visualization

ddb_data

R Documentation

Read CSV using DuckDB

Description

Lazy-read a CSV file, optionally select columns, filter rows, remove duplicate rows, clean column names, convert character columns to factors, and collect the result.

Usage

ddb_data(
  filename,
  datadir = NULL,
  sep = ",",
  header = TRUE,
  quotechar = "",
  ignore_errors = TRUE,
  make_unique = TRUE,
  select_columns = NULL,
  filter_column = NULL,
  filter_vals = NULL,
  character2factor = FALSE,
  collect = TRUE,
  progress = TRUE,
  returnobj = c("data.table", "data.frame"),
  data.table.key = NULL,
  clean_colnames = TRUE,
  verbosity = 1L
)

Arguments

`filename`	Character: file name; either full path or just the file name, if `datadir` is also provided.
`datadir`	Character: Optional path if `filename` is not full path.
`sep`	Character: Field delimiter/separator.
`header`	Logical: If TRUE, first line will be read as column names.
`quotechar`	Character: Quote character.
`ignore_errors`	Logical: If TRUE, ignore parsing errors. For some malformed files, this is the only way to read any data at all.
`make_unique`	Logical: If TRUE, keep only unique rows.
`select_columns`	Character vector: Column names to select.
`filter_column`	Character: Name of column to filter on, e.g. "ID".
`filter_vals`	Numeric or Character vector: Values in `filter_column` to keep.
`character2factor`	Logical: If TRUE, convert character columns to factors.
`collect`	Logical: If TRUE, collect data and return an object of the class defined by `returnobj`.
`progress`	Logical: If TRUE, print progress (this setting may have no visible effect).
`returnobj`	Character: Class of object to return, "data.table" or "data.frame". If "data.table", the data.frame returned by `DBI::dbGetQuery` is passed to `data.table::setDT`. This adds to execution time when the data are very large, but that is also when a data.table is most useful.
`data.table.key`	Character: If set, this corresponds to a column name in the dataset. This column will be set as key in the data.table output.
`clean_colnames`	Logical: If TRUE, clean column names using `clean_colnames`.
`verbosity`	Integer: Verbosity level.

Value

data.frame or data.table, depending on `returnobj`.

Author(s)

EDG

Examples

## Not run: 
ir <- ddb_data("/Data/massive_dataset.csv",
  filter_column = "ID",
  filter_vals = 8001:9999
)

## End(Not run)
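
The steps listed in the Description can be approximated directly with DBI and duckdb, which may help when deciding which arguments to set. The following is a minimal sketch, not the function's actual implementation: the temporary file, the column names `ID` and `group`, and the mapping of arguments to SQL (`sep` to `delim`, `make_unique` to `SELECT DISTINCT`, `filter_column`/`filter_vals` to `WHERE ... IN`) are assumptions made for illustration.

```r
library(DBI)

# Small throwaway CSV so the sketch is self-contained (hypothetical data)
csv_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(
    ID = c(1L, 2L, 2L, 3L),
    group = c("a", "b", "b", "c")
  ),
  csv_path,
  row.names = FALSE, quote = FALSE
)

# In-memory DuckDB connection
con <- dbConnect(duckdb::duckdb())

# Assumed mapping of ddb_data arguments to DuckDB SQL:
#   sep, header, ignore_errors   -> read_csv(delim, header, ignore_errors)
#   make_unique                  -> SELECT DISTINCT
#   filter_column / filter_vals  -> WHERE ID IN (...)
filter_vals <- c(2L, 3L)
sql <- sprintf(
  paste(
    "SELECT DISTINCT *",
    "FROM read_csv('%s', delim = ',', header = true, ignore_errors = true)",
    "WHERE ID IN (%s)",
    "ORDER BY ID"
  ),
  csv_path,
  paste(filter_vals, collapse = ", ")
)

# Collect: the query runs inside DuckDB and only the filtered,
# de-duplicated rows come back to R as a data.frame
out <- dbGetQuery(con, sql)
dbDisconnect(con, shutdown = TRUE)

# Rough equivalent of character2factor = TRUE
chr_cols <- names(out)[vapply(out, is.character, logical(1))]
out[chr_cols] <- lapply(out[chr_cols], factor)

# Rough equivalent of returnobj = "data.table" with data.table.key = "ID"
data.table::setDT(out)
data.table::setkeyv(out, "ID")
```

Filtering and de-duplicating inside the query means only the rows of interest are materialized in R, which is the main reason to prefer this route over reading the whole file first.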

egenn/rtemis documentation built on June 14, 2025, 11:54 p.m.
