Tokenizers: Tokenizers.
In meltr: Read Non-Rectangular Text Data

Tokenizers

R Documentation

Tokenizers.

Description

Explicitly create tokenizer objects. Usually you will not call these function, but will instead use one of the use friendly wrappers like readr::read_csv().

Usage

tokenizer_delim(
  delim,
  quote = "\"",
  na = "NA",
  quoted_na = TRUE,
  comment = "",
  trim_ws = TRUE,
  escape_double = TRUE,
  escape_backslash = FALSE,
  skip_empty_rows = TRUE
)

tokenizer_csv(
  na = "NA",
  quoted_na = TRUE,
  quote = "\"",
  comment = "",
  trim_ws = TRUE,
  skip_empty_rows = TRUE
)

tokenizer_tsv(
  na = "NA",
  quoted_na = TRUE,
  quote = "\"",
  comment = "",
  trim_ws = TRUE,
  skip_empty_rows = TRUE
)

tokenizer_line(na = character(), skip_empty_rows = TRUE)

tokenizer_log(trim_ws)

tokenizer_fwf(
  begin,
  end,
  na = "NA",
  comment = "",
  trim_ws = TRUE,
  skip_empty_rows = TRUE
)

tokenizer_ws(na = "NA", comment = "", skip_empty_rows = TRUE)

Arguments

`delim`	Single character used to separate fields within a record.
`quote`	Single character used to quote strings.
`na`	Character vector of strings to interpret as missing values. Set this option to `character()` to indicate no missing values.
`quoted_na`	Should missing values inside quotes be treated as missing values (the default) or strings.
`comment`	A string used to identify comments. Any text after the comment characters will be silently ignored.
`trim_ws`	Should leading and trailing whitespace (ASCII spaces and tabs) be trimmed from each field before parsing it?
`escape_double`	Does the file escape quotes by doubling them? i.e. If this option is `TRUE`, the value `⁠""""⁠` represents a single quote, `⁠\"⁠`.
`escape_backslash`	Does the file use backslashes to escape special characters? This is more general than `escape_double` as backslashes can be used to escape the delimiter character, the quote character, or to add special characters like `⁠\\n⁠`.
`skip_empty_rows`	Should blank rows be ignored altogether? i.e. If this option is `TRUE` then blank rows will not be represented at all. If it is `FALSE` then they will be represented by `NA` values in all the columns.
`begin`, `end`	Begin and end offsets for each file. These are C++ offsets so the first column is column zero, and the ranges are [begin, end) (i.e inclusive-exclusive).