influxr_fread_recover: Fast file reading and recovery


View source: R/fread-recover.R

Description

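influxr_fread_recover reads a delimited text file using fread from the data.table package, with optional handling for recovering content from corrupted or irregular files (see read_method).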

Usage

influxr_fread_recover(
  file,
  sep = "auto",
  nrows = -1L,
  header = "auto",
  skip_lines = 0L,
  select = NULL,
  read_method = "fast",
  text_preprocess_FUN = NULL,
  fill = TRUE,
  strip_extra_whitespace = FALSE,
  verbose = FALSE,
  ...
)

Arguments

file

File name in the working directory, path to a file (passed through path.expand for convenience), or a URL starting with http://, file://, etc. Compressed files with extension ‘.gz’ and ‘.bz2’ are supported if the R.utils package is installed.

sep

The separator between columns. Defaults to the character in the set [,\t |;:] that splits the largest number of sampled lines into the same number of fields. Use NULL or "" to specify no separator; i.e., each line is read as a single character column, as base::readLines does.
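The detection rule for sep = "auto" can be sketched in base R as follows. guess_sep() is a hypothetical helper, not part of influxr or data.table, and it implements only a simplified version of the rule described above; fread's actual detection is more involved.

```r
# Hypothetical helper: a simplified illustration of the sep = "auto" rule,
# not fread's real algorithm.
guess_sep <- function(lines, candidates = c(",", "\t", " ", "|", ";", ":")) {
  scores <- vapply(candidates, function(s) {
    # Number of fields on each line when split on this candidate
    n_fields <- lengths(strsplit(lines, s, fixed = TRUE))
    # A candidate that never splits any line cannot be the separator
    if (all(n_fields == 1L)) return(0L)
    # Number of lines that share the most common field count
    max(tabulate(n_fields))
  }, integer(1))
  candidates[which.max(scores)]
}

guess_sep(c("id,name", "1,John Smith", "2,Jane Doe"))
```

Here the comma wins because it gives all three lines two fields, while the space gives consistent field counts on only two of them.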

nrows

The maximum number of rows to read. Unlike read.table, you do not need to set this to an estimate of the number of rows in the file for better speed, because fread already determines that almost instantly from a large sample of lines. nrows=0 returns the column names and typed empty columns determined by the large sample; this is useful for a dry run of a large file, or to quickly check the format consistency of a set of files before reading any of them.

header

Does the first data line contain column names? Defaults according to whether every non-empty field on the first data line is type character. If so, or TRUE is supplied, any empty column names are given a default name.
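The header rule can be illustrated with a small base R sketch. guess_header() is a hypothetical helper that only distinguishes numeric from non-numeric text, which simplifies the "type character" test described above; fread's own type detection recognises more types.

```r
# Hypothetical helper: TRUE when every non-empty field on the first line
# cannot be parsed as a number (a simplified stand-in for "is character").
guess_header <- function(first_line, sep = ",") {
  fields <- trimws(strsplit(first_line, sep, fixed = TRUE)[[1]])
  fields <- fields[nzchar(fields)]  # ignore empty fields
  # as.numeric() gives NA (with a warning) for non-numeric text
  all(is.na(suppressWarnings(as.numeric(fields))))
}

guess_header("time,temp,rh")           # all text: treated as a header
guess_header("2020-08-05,21.5,40")     # contains numbers: treated as data
```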

skip_lines

Number of lines to skip.

select

A vector of column names or numbers to keep, drop the rest. select may specify types too in the same way as colClasses; i.e., a vector of colname=type pairs, or a list of type=col(s) pairs. In all forms of select, the order that the columns are specified determines the order of the columns in the result.

read_method

Either 'fast', 'R' or a function. With 'fast', files are read by the fread function from the data.table package; this is fast but does not handle corrupted files well. With 'R', files are read with base R's readLines function, irregularities are stripped, and the cleaned content is then passed to fread for parsing. If a function is provided, the file content is read with that function: it must accept the file name as an argument named file, any additional arguments are passed on to it, and it must return the file content as text, which is then parsed by fread. This is useful when files need some treatment before reading, e.g. decompression or format conversion.
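A custom reader for read_method might look like the sketch below. read_gz_text() and the file name "logger.dat.gz" are hypothetical. The sketch assumes that returning the content as a single newline-separated string satisfies the "text" requirement; a character vector of lines may also be accepted, but that is not confirmed here. (.gz files can already be read directly when R.utils is installed, so this only illustrates the contract.)

```r
# Hypothetical reader for read_method: decompresses a gzip file and returns
# its content as one string. The argument is named `file` as required above;
# `...` absorbs any additional arguments passed on by the caller.
read_gz_text <- function(file, ...) {
  con <- gzfile(file, open = "rt")
  on.exit(close(con))
  paste(readLines(con), collapse = "\n")
}

# Possible usage, assuming influxr is installed and the file exists:
# dat <- influxr_fread_recover("logger.dat.gz", read_method = read_gz_text)
```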

fill

logical (default TRUE). If TRUE and the rows have unequal length, the missing fields are implicitly filled with blanks.
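The padding behaviour of fill = TRUE can be seen with base R's read.csv, which has a fill argument with similar semantics. This example uses utils::read.csv rather than influxr or fread, so it only illustrates the idea.

```r
# Base R illustration of fill = TRUE: the short last row is padded,
# and the blank numeric field becomes NA.
txt <- "a,b,c\n1,2,3\n4,5"
df <- read.csv(text = txt, fill = TRUE)
df
```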

verbose

Be chatty and report timings?

...

Additional arguments passed to fread.

strip_extra_whitespace

logical. When TRUE, duplicated spaces as well as leading and trailing white space are removed before columns are processed. This is useful when the separator is white space, to avoid confusion about the number of columns.
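The whitespace cleanup described for strip_extra_whitespace can be sketched in base R. strip_ws() is a hypothetical helper; it only handles repeated space characters, and the exact cleanup influxr performs may differ.

```r
# Hypothetical helper: collapse runs of spaces to one space, then trim
# leading and trailing white space from each line.
strip_ws <- function(lines) {
  trimws(gsub(" {2,}", " ", lines))
}

lines <- c("  12.5   3.1  7 ", "13.0 2.9   8")
cleaned <- strip_ws(lines)
# Splitting on a single space now yields the same field count on every line
lengths(strsplit(cleaned, " ", fixed = TRUE))
```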

Details

With fill = FALSE, any rows that do not have the same number of columns as the rest of the file are removed.
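Dropping ragged rows can be sketched in base R as shown below. drop_ragged() is a hypothetical helper. Treating the most common field count as the "expected" column count is an assumption made for this sketch, because this page does not say how influxr decides which rows to drop.

```r
# Hypothetical helper: keep only lines whose field count equals the most
# common field count (an assumption, not influxr's documented rule).
drop_ragged <- function(lines, sep = ",") {
  n_fields <- lengths(strsplit(lines, sep, fixed = TRUE))
  expected <- as.integer(names(which.max(table(n_fields))))
  lines[n_fields == expected]
}

lines <- c("time,temp,rh", "00:00,21.5,40", "00:10,21.7", "00:20,21.6,41")
kept <- drop_ragged(lines)          # the short "00:10" row is removed
df2 <- read.csv(text = kept)        # parse the remaining lines
```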


influxr/influxr documentation built on Aug. 5, 2020, 9:03 p.m.