Tokenizers: Tokenizers.

Tokenizers R Documentation

Tokenizers.

Description

Explicitly create tokenizer objects. Usually you will not call these functions directly, but will instead use one of the user-friendly wrappers like readr::read_csv().

Usage

tokenizer_delim(
  delim,
  quote = "\"",
  na = "NA",
  quoted_na = TRUE,
  comment = "",
  trim_ws = TRUE,
  escape_double = TRUE,
  escape_backslash = FALSE,
  skip_empty_rows = TRUE
)

tokenizer_csv(
  na = "NA",
  quoted_na = TRUE,
  quote = "\"",
  comment = "",
  trim_ws = TRUE,
  skip_empty_rows = TRUE
)

tokenizer_tsv(
  na = "NA",
  quoted_na = TRUE,
  quote = "\"",
  comment = "",
  trim_ws = TRUE,
  skip_empty_rows = TRUE
)

tokenizer_line(na = character(), skip_empty_rows = TRUE)

tokenizer_log(trim_ws)

tokenizer_fwf(
  begin,
  end,
  na = "NA",
  comment = "",
  trim_ws = TRUE,
  skip_empty_rows = TRUE
)

tokenizer_ws(na = "NA", comment = "", skip_empty_rows = TRUE)

Arguments

delim

Single character used to separate fields within a record.

quote

Single character used to quote strings.

na

Character vector of strings to interpret as missing values. Set this option to character() to indicate no missing values.

quoted_na

Should missing values inside quotes be treated as missing values (the default) or as strings?

comment

A string used to identify comments. Any text after the comment characters will be silently ignored.

trim_ws

Should leading and trailing whitespace (ASCII spaces and tabs) be trimmed from each field before parsing it?
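As a rough illustration of what trimming means here, the base-R sketch below strips only leading and trailing ASCII spaces and tabs and leaves interior whitespace alone. The helper trim_field() is hypothetical and is not part of the package; it only mirrors the behaviour described above.

```r
# Hypothetical helper: remove leading and trailing ASCII spaces and tabs.
# Interior whitespace is left untouched.
trim_field <- function(x) {
  gsub("^[ \t]+|[ \t]+$", "", x)
}

trimmed <- trim_field(c("  a b\t", "\tvalue ", "none"))
print(trimmed)
```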

escape_double

Does the file escape quotes by doubling them? That is, if this option is TRUE, the value """" represents a single quote character, ".
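The doubling convention can be sketched in base R as follows. The helper unescape_doubled() is hypothetical and only illustrates how a quoted field would be decoded under this convention; it is not the package's implementation.

```r
# Hypothetical helper: decode one quoted field in which a literal quote
# is written as two consecutive quote characters.
unescape_doubled <- function(field, quote = "\"") {
  # Drop the opening and closing quote characters.
  inner <- substr(field, 2, nchar(field) - 1)
  # Collapse each doubled quote into a single literal quote.
  gsub(paste0(quote, quote), quote, inner, fixed = TRUE)
}

# The four-character field """" decodes to one quote character.
decoded <- unescape_doubled("\"\"\"\"")
print(decoded)
```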

escape_backslash

Does the file use backslashes to escape special characters? This is more general than escape_double, as backslashes can be used to escape the delimiter character, the quote character, or to add special characters like \n.

skip_empty_rows

Should blank rows be ignored altogether? That is, if this option is TRUE then blank rows will not be represented at all. If it is FALSE then they will be represented by NA values in all the columns.

begin, end

Begin and end offsets for each field. These are C++ offsets, so the first column is column zero, and the ranges are [begin, end) (i.e., inclusive-exclusive).
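To make the offset convention concrete, the base-R sketch below slices one fixed-width line using zero-based, half-open ranges. The helper slice_fields() and the sample line are hypothetical; the code only illustrates the indexing and is not how tokenizer_fwf() works internally.

```r
# Hypothetical helper: extract fields from one line using zero-based,
# half-open [begin, end) offsets.
slice_fields <- function(line, begin, end) {
  # substring() is one-based and inclusive, so the start shifts by one;
  # the exclusive zero-based end equals the inclusive one-based end.
  substring(line, begin + 1, end)
}

line <- "abc123xyz"
fields <- slice_fields(line, begin = c(0, 3, 6), end = c(3, 6, 9))
print(fields)
```

Because each range excludes its end, adjacent fields can share a boundary value (the end of one field is the begin of the next) without overlapping.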

Value

A tokenizer object.

Examples

tokenizer_csv()
tokenizer_delim(",")

meltr documentation built on Sept. 11, 2022, 1:07 a.m.