influxr_fread_recover: File fast read and recovery
In influxr/influxr: An R interface to influxDB time series databases.

View source: R/fread-recover.R

Description

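Reads a delimited text file using `fread` from package data.table, with optional recovery from irregular or corrupted files controlled by `read_method`.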

Usage

influxr_fread_recover(
  file,
  sep = "auto",
  nrows = -1L,
  header = "auto",
  skip_lines = 0L,
  select = NULL,
  read_method = "fast",
  text_preprocess_FUN = NULL,
  fill = TRUE,
  strip_extra_whitespace = FALSE,
  verbose = FALSE,
  ...
)

Arguments

`file`	File name in working directory, path to file (passed through `path.expand` for convenience), or a URL starting http://, file://, etc. Compressed files with extension ‘.gz’ and ‘.bz2’ are supported if the `R.utils` package is installed.
`sep`	The separator between columns. Defaults to the character in the set `[,\t \|;:]` that separates the sample of rows into the most number of lines with the same number of fields. Use `NULL` or `""` to specify no separator; i.e. each line a single character column like `base::readLines` does.
`nrows`	The maximum number of rows to read. Unlike `read.table`, you do not need to set this to an estimate of the number of rows in the file for better speed because that is already automatically determined by `fread` almost instantly using the large sample of lines. `nrows=0` returns the column names and typed empty columns determined by the large sample; useful for a dry run of a large file or to quickly check format consistency of a set of files before starting to read any of them.
`header`	Does the first data line contain column names? Defaults according to whether every non-empty field on the first data line is type character. If so, or TRUE is supplied, any empty column names are given a default name.
`skip_lines`	Number of lines to skip.
`select`	A vector of column names or numbers to keep, drop the rest. `select` may specify types too in the same way as `colClasses`; i.e., a vector of `colname=type` pairs, or a `list` of `type=col(s)` pairs. In all forms of `select`, the order that the columns are specified determines the order of the columns in the result.
`read_method`	Either 'fast', 'R', or a function. With 'fast', the file is read with `fread` from package data.table; this is fast but does not handle corrupted files well. With 'R', the file is read with base R `readLines`, irregularities are stripped where possible, and the cleaned content is then passed to `fread` for parsing. If a function is provided, it is used to read the file content: it should accept the file name as a `file` argument (any additional arguments are passed on to it) and return the file content as text, which is then parsed with `fread`. This is useful when files need treatment before reading, e.g. unzipping or format conversion.
`fill`	logical (default is `TRUE`). If `TRUE` then in case the rows have unequal length, blank fields are implicitly filled.
`verbose`	Be chatty and report timings?
`...`	Additional arguments passed to `fread`.
`strip_extra_whitespace`	logical (default is `FALSE`). When `TRUE`, duplicated spaces as well as leading and trailing white space are removed before columns are processed. Useful when the separator is white space, to avoid confusion about the number of columns.
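
Examples

The `read_method` argument accepts a function, so a custom reader may help when files need treatment before parsing. The following is a minimal sketch, not taken from the package: `read_gz_text` is a hypothetical helper that takes the file name as `file` and returns the content as a character vector of lines (one plausible reading of "file content as text"). The final call assumes the influxr package is installed and exports `influxr_fread_recover`; it is skipped otherwise.

```r
# Hypothetical custom reader for `read_method`: accepts the file name as
# `file` (plus any extra arguments) and returns the file content as text.
read_gz_text <- function(file, ...) {
  con <- gzfile(file, open = "rt")
  on.exit(close(con))
  lines <- readLines(con, warn = FALSE)
  # Example pre-treatment: drop blank lines and comment lines starting with "#"
  lines[nzchar(trimws(lines)) & !startsWith(lines, "#")]
}

# Build a small gzip-compressed sample file to try the reader on
tmp <- tempfile(fileext = ".csv.gz")
con <- gzfile(tmp, open = "wt")
writeLines(c("# logger export", "time,value", "1,10.5", "", "2,11.0"), con)
close(con)

txt <- read_gz_text(tmp)
print(txt)

# Assumption: influxr is installed and exports influxr_fread_recover
if (requireNamespace("influxr", quietly = TRUE)) {
  dat <- influxr::influxr_fread_recover(
    file = tmp,
    sep = ",",
    read_method = read_gz_text
  )
  print(dat)
}
```

Keeping the pre-treatment inside the reader leaves the column parsing itself to `fread`, so arguments such as `sep`, `select` and `fill` should still apply to the cleaned text.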