df_from_file: Read Parquet, CSV, and other files using DuckDB
In duckdblabs/duckplyr: A 'DuckDB'-Backed Version of 'dplyr'

df_from_file

R Documentation

Read Parquet, CSV, and other files using DuckDB

Description

df_from_file() uses arbitrary table functions to read data. See https://duckdb.org/docs/data/overview for a documentation of the available functions and their options. To read multiple files with the same schema, pass a wildcard or a character vector to the path argument,

duckplyr_df_from_file() is a thin wrapper around df_from_file() that calls as_duckplyr_df() on the output.

These functions ingest data from a file using a table function. The results are transparently converted to a data frame, but the data is only read when the resulting data frame is actually accessed.

df_from_csv() reads a CSV file using the read_csv_auto() table function.

duckplyr_df_from_csv() is a thin wrapper around df_from_csv() that calls as_duckplyr_df() on the output.

df_from_parquet() reads a Parquet file using the read_parquet() table function.

duckplyr_df_from_parquet() is a thin wrapper around df_from_parquet() that calls as_duckplyr_df() on the output.

df_to_parquet() writes a data frame to a Parquet file via DuckDB. If the data frame is a duckplyr_df, the materialization occurs outside of R. An existing file will be overwritten. This function requires duckdb >= 0.10.0.

Usage

df_from_file(path, table_function, ..., options = list(), class = NULL)

duckplyr_df_from_file(
  path,
  table_function,
  ...,
  options = list(),
  class = NULL
)

df_from_csv(path, ..., options = list(), class = NULL)

duckplyr_df_from_csv(path, ..., options = list(), class = NULL)

df_from_parquet(path, ..., options = list(), class = NULL)

duckplyr_df_from_parquet(path, ..., options = list(), class = NULL)

df_to_parquet(data, path)

Arguments

`path`	Path to files, glob patterns `*` and `⁠?⁠` are supported.
`table_function`	The name of a table-valued DuckDB function such as `"read_parquet"`, `"read_csv"`, `"read_csv_auto"` or `"read_json"`.
`...`	These dots are for future extensions and must be empty.
`options`	Arguments to the DuckDB function indicated by `table_function`.
`class`	The class of the output. By default, a tibble is created. The returned object will always be a data frame. Use `class = "data.frame"` or `class = character()` to create a plain data frame.
`data`	A data frame to be written to disk.

Value

A data frame for df_from_file(), or a duckplyr_df for duckplyr_df_from_file(), extended by the provided class.

Examples

# Create simple CSV file
path <- tempfile("duckplyr_test_", fileext = ".csv")
write.csv(data.frame(a = 1:3, b = letters[4:6]), path, row.names = FALSE)

# Reading is immediate
df <- df_from_csv(path)

# Materialization only upon access
names(df)
df$a

# Return as tibble, specify column types:
df_from_file(
  path,
  "read_csv",
  options = list(delim = ",", types = list(c("DOUBLE", "VARCHAR"))),
  class = class(tibble())
)

# Read multiple file at once
path2 <- tempfile("duckplyr_test_", fileext = ".csv")
write.csv(data.frame(a = 4:6, b = letters[7:9]), path2, row.names = FALSE)

duckplyr_df_from_csv(file.path(tempdir(), "duckplyr_test_*.csv"))

unlink(c(path, path2))

# Write a Parquet file:
path_parquet <- tempfile(fileext = ".parquet")
df_to_parquet(df, path_parquet)

# With a duckplyr_df, the materialization occurs outside of R:
df %>%
  as_duckplyr_df() %>%
  mutate(b = a + 1) %>%
  df_to_parquet(path_parquet)

duckplyr_df_from_parquet(path_parquet)

unlink(path_parquet)

duckdblabs/duckplyr documentation built on June 13, 2025, 1:59 p.m.