read_file_delim: Efficiently Read Delimited Files

read_file_delim    R Documentation

Efficiently Read Delimited Files

Description

read_file_delim() reads delimited files using vroom(). This allows the use of ALTREP columns, which don't load data into memory until they are needed.

Usage

read_file_delim(
  file,
  col_select = NULL,
  col_types = vroom::cols(.default = vroom::col_character()),
  na = c("", ".", "NA", "na", "Na", "N/A", "n/a", "N/a", "NULL", "null", "Null"),
  guess_max = .Machine$integer.max %/% 100L,
  delim = NULL,
  ...
)

Arguments

file

path to a local file.

col_select

One or more selection expressions, like in dplyr::select(). Use c() or list() to use more than one expression. See ?dplyr::select for details on available selection options.
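As a minimal sketch of column selection, the example below calls vroom::vroom() directly, since the Description states that read_file_delim() reads through vroom(). The file contents and column names are made up for illustration, and the commented read_file_delim() call assumes the argument is forwarded unchanged.

```r
library(vroom)

# Write a small CSV to a temporary file so the example is self-contained
path <- tempfile(fileext = ".csv")
writeLines(c("id,name,score", "1,ann,3.5", "2,bob,4.0"), path)

# Assumed equivalent call through the wrapper (not run here):
# read_file_delim(path, col_select = c(id, score))

# Direct vroom() call using the all-character default shown in Usage
selected <- vroom(
  path,
  delim = ",",
  col_select = c(id, score),
  col_types = cols(.default = col_character())
)

# Only the selected columns are returned
names(selected)
```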

col_types

One of NULL, a cols() specification, or a string. See vignette("readr") for more details.

If NULL, all column types will be imputed from up to guess_max rows of the input. This is convenient (and fast), but not robust. If the imputation fails, you'll need to increase guess_max or supply the correct types yourself.

Column specifications created by list() or cols() must contain one column specification for each column. If you only want to read a subset of the columns, use cols_only().

Alternatively, you can use a compact string representation where each character represents one column:

  • c = character

  • i = integer

  • n = number

  • d = double

  • l = logical

  • f = factor

  • D = date

  • T = date time

  • t = time

  • ? = guess

  • _ or - = skip

By default, reading a file without a column specification prints a message showing the column types readr guessed. To remove this message, set show_col_types = FALSE or set options(readr.show_col_types = FALSE).
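The compact string form can be compared with a full cols() specification. This is a sketch against vroom::vroom() directly, with an illustrative three-column file; whether read_file_delim() passes the string through unchanged is an assumption based on the Usage signature.

```r
library(vroom)

# Illustrative three-column file
path <- tempfile(fileext = ".csv")
writeLines(c("id,name,score", "1,ann,3.5", "2,bob,4.0"), path)

# One character per column: i = integer, c = character, _ = skip
compact <- vroom(path, delim = ",", col_types = "ic_")

# The same specification written out with cols()
explicit <- vroom(
  path,
  delim = ",",
  col_types = cols(
    id = col_integer(),
    name = col_character(),
    score = col_skip()
  )
)

# Both reads drop the skipped column and keep the requested types
names(compact)
vapply(compact, class, character(1))
```

The string form is shorter but positional, so it has to be updated whenever the column order of the file changes; the cols() form names each column explicitly.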

na

Character vector of strings to interpret as missing values. Set this option to character() to indicate no missing values.
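A short sketch of how the default na strings from Usage behave, again calling vroom::vroom() directly on an illustrative file. The column names and values are invented for the example.

```r
library(vroom)

# Illustrative file containing several of the default missing-value markers
path <- tempfile(fileext = ".csv")
writeLines(c("id,status", "1,ok", "2,N/A", "3,null", "4,."), path)

# Default na strings copied from the Usage section
na_strings <- c("", ".", "NA", "na", "Na", "N/A", "n/a", "N/a",
                "NULL", "null", "Null")

with_na <- vroom(
  path,
  delim = ",",
  na = na_strings,
  col_types = cols(.default = col_character())
)

# Rows 2 to 4 match one of the na strings and are read as NA
is.na(with_na$status)
```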

guess_max

Maximum number of lines to use for guessing column types.

delim

One or more characters used to delimit fields within a file. If NULL the delimiter is guessed from the set of c(",", "\t", " ", "|", ":", ";").
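The sketch below contrasts a guessed delimiter with an explicit one, using vroom::vroom() directly. The pipe-delimited file is illustrative; guessing is expected to work here because "|" is in the candidate set listed above.

```r
library(vroom)

# Illustrative pipe-delimited file
path <- tempfile(fileext = ".txt")
writeLines(c("id|name", "1|ann", "2|bob"), path)

# delim = NULL asks vroom() to guess the delimiter
guessed <- vroom(
  path,
  delim = NULL,
  col_types = cols(.default = col_character())
)

# Passing the delimiter explicitly avoids relying on the guess
explicit <- vroom(
  path,
  delim = "|",
  col_types = cols(.default = col_character())
)

names(guessed)
names(explicit)
```

Supplying delim explicitly is the safer choice when the delimiter is known, since guessing depends on the contents of the file.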

...

Additional arguments to pass to vroom()

Details

By default, read_file_delim() does not attempt to guess column types and reads all columns as character. This can be changed by setting col_types = vroom::cols(.default = vroom::col_guess()). If column types are guessed, the default guess_max (.Machine$integer.max %/% 100L, roughly 21 million rows) means that in practice all rows are used; set guess_max to a smaller value to guess from fewer rows.

Reading without type guessing saves a significant amount of time and space when loading data with many rarely used columns.

read_file_delim() will eventually be paired with read_file_excel() to replace the internals of read_file().
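To illustrate the difference between the all-character default and guessed column types, here is a minimal sketch using vroom::vroom() directly with the two col_types settings named in Details. The file is illustrative, and the commented read_file_delim() call assumes the wrapper forwards col_types as documented.

```r
library(vroom)

# Illustrative file with numeric-looking columns
path <- tempfile(fileext = ".csv")
writeLines(c("id,score", "1,3.5", "2,4.0"), path)

# Default behaviour described above: every column is read as character
as_character <- vroom(
  path,
  delim = ",",
  col_types = cols(.default = col_character())
)

# Opting in to guessing, with the guess_max default from Usage
guessed <- vroom(
  path,
  delim = ",",
  col_types = cols(.default = col_guess()),
  guess_max = .Machine$integer.max %/% 100L
)

# Assumed equivalent call through the wrapper (not run here):
# read_file_delim(path, col_types = vroom::cols(.default = vroom::col_guess()))

vapply(as_character, class, character(1))
vapply(guessed, class, character(1))
```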

Value

A tibble if reading one file; a list of tibbles if reading multiple files.


jesse-smith/coviData documentation built on Jan. 14, 2023, 11:08 a.m.