laf_open_csv: Create a connection to a comma separated value (CSV) file.
In LaF: Fast Access to Large ASCII Files

Description

A connection to the file `filename` is created. Column types have to be specified explicitly: unlike, for example, read.csv, laf_open_csv does not determine them automatically. This is done to increase speed.
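Because the types are not guessed, they have to be known before the file is opened. When they are not, one option is to let `read.csv` inspect a small sample of the file once and translate the classes it finds into LaF type names. The helper `guess_laf_types` below is hypothetical (it is not part of LaF), and its mapping from R classes to LaF types is an assumption. A small sample can also guess wrongly, for example when a column that looks like integers in the first lines contains decimals further down.

```r
# Hypothetical helper (not part of LaF): guess LaF column types from the
# first `n` lines of a headerless CSV file using read.csv.
guess_laf_types <- function(filename, n = 100, sep = ",", dec = ".") {
  # Let read.csv infer classes from (at most) the first n lines only
  sample_rows <- utils::read.csv(filename, header = FALSE, nrows = n,
                                 sep = sep, dec = dec,
                                 stringsAsFactors = FALSE)
  # Assumed mapping: integer -> "integer", double -> "double",
  # everything else (character, logical, ...) -> "string"
  types <- vapply(sample_rows, function(col) {
    if (is.integer(col)) "integer"
    else if (is.double(col)) "double"
    else "string"
  }, character(1))
  unname(types)
}

# Usage on a small headerless file
tmp <- tempfile(fileext = ".csv")
writeLines(c("1,2.5,jan", "2,3.75,pier"), tmp)
types <- guess_laf_types(tmp)
print(types)
```

The resulting vector can then be passed as `column_types` to laf_open_csv, so the (slower) type detection only touches the first `n` lines rather than the whole file.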

Usage

laf_open_csv(
  filename,
  column_types,
  column_names = paste("V", seq_len(length(column_types)), sep = ""),
  sep = ",",
  dec = ".",
  trim = FALSE,
  skip = 0,
  ignore_failed_conversion = FALSE
)

Arguments

`filename`	character containing the filename of the CSV-file
`column_types`	character vector containing the types of data in each of the columns. Valid types are: double, integer, categorical and string.
`column_names`	optional character vector containing the names of the columns.
`sep`	optional character specifying the field separator used in the file.
`dec`	optional character specifying the decimal mark.
`trim`	optional logical specifying whether or not white space at the end of factor levels or character strings should be trimmed.
`skip`	optional numeric specifying the number of lines at the beginning of the file that should be skipped.
`ignore_failed_conversion`	optional logical specifying whether fields that could not be converted should be ignored (set to `NA`).
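As an illustration of `sep`, `dec` and `ignore_failed_conversion` together, the sketch below opens a semicolon-separated file that uses a decimal comma and contains one field ("missing") that is not a number. The file contents and column names are made up for the example; going by the argument descriptions, the unconvertible field should come back as `NA`.

```r
library(LaF)

# Semicolon-separated file with decimal commas and one non-numeric value
tmp <- tempfile(fileext = ".csv")
writeLines(c("1;2,5", "2;missing", "3;4,25"), tmp)

laf <- laf_open_csv(tmp,
  column_types = c("integer", "double"),
  column_names = c("id", "value"),
  sep = ";", dec = ",",
  ignore_failed_conversion = TRUE)

# Read the whole (small) file as one block
begin(laf)
d <- next_block(laf, nrows = 10)
print(d)
```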

Details

After the connection is created, data can be extracted using indexing (as with a normal data.frame), or methods such as read_lines and next_block can be used to read it in blocks. For processing the whole file in blocks, the convenience function process_blocks can be used.
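A minimal sketch of process_blocks: the function passed to it receives each block together with the result of the previous call (`NULL` on the first call) and returns the updated result. Here it accumulates the sum of the second column; the file and the function `sum_b` are illustrative only. Using `block$V2` (the default column name) keeps the function safe if it is also called with an empty block at the end of the file.

```r
library(LaF)

# Ten rows: a = 1..10, b = 0.5, 1.0, ..., 5.0
tmp <- tempfile(fileext = ".csv")
write.table(data.frame(a = 1:10, b = (1:10) / 2), tmp,
            row.names = FALSE, col.names = FALSE, sep = ",")
laf <- laf_open_csv(tmp, column_types = c("integer", "double"))

# Add the sum of column V2 in this block to the running total
sum_b <- function(block, result) {
  if (is.null(result)) result <- 0
  result + sum(block$V2)
}

# Read the file three rows at a time
total <- process_blocks(laf, sum_b, nrows = 3)
print(total)
```

Only one block is held in memory at a time, which is the point of processing in blocks for files that do not fit in memory.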

The CSV-file should not contain a header line. Use the `skip` option to skip any header lines.
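A file that does have a header line can still be read by skipping it and passing the names through `column_names`. The sketch below reads the header itself with readLines and strsplit; this parsing is our own addition (not a LaF feature) and assumes a single header line with no commas inside the quoted names.

```r
library(LaF)

# write.csv writes a quoted header line: "x","y"
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(x = 1:3, y = c(0.5, 1.5, 2.5)), tmp, row.names = FALSE)

# Take the names from the first line and strip the quotes
header <- gsub('"', "", strsplit(readLines(tmp, n = 1), ",")[[1]])

# Skip the header line when reading the data
laf <- laf_open_csv(tmp, column_types = c("integer", "double"),
                    column_names = header, skip = 1)
begin(laf)
d <- next_block(laf, nrows = 10)
print(d)
```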

In case of an incomplete line (a line with fewer columns than it should have): when the line is completely empty, the reader stops at that point and treats it as the end of the file. In all other cases a warning is issued and the remaining columns are considered empty. For character columns this results in an empty string; for numeric columns, in `NA`.
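The sketch below writes a small file with an incomplete second line and an empty third line, then reads it back. Going by the description above, the reader is expected to return only the two lines before the empty line, with `NA` in the numeric column of the incomplete line. The warning is suppressed here only to keep the output short; in real use it is worth keeping, since it signals damaged input.

```r
library(LaF)

# Line 2 lacks its second field; line 3 is empty, so line 4 is never reached
tmp <- tempfile(fileext = ".csv")
writeLines(c("a,1.5", "b", "", "c,2.5"), tmp)

laf <- laf_open_csv(tmp, column_types = c("string", "double"))
begin(laf)
d <- suppressWarnings(next_block(laf, nrows = 10))
print(d)
```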

Value

An object of type laf. Values can be extracted from this object using indexing and methods such as read_lines and next_block.

Examples

# Create temporary filename
tmpcsv <- tempfile(fileext=".csv")

# Generate test data
ntest <- 10
column_types <- c("integer", "integer", "double", "string")
testdata <- data.frame(
    a = 1:ntest,
    b = sample(1:2, ntest, replace=TRUE),
    c = round(runif(ntest), 13),
    d = sample(c("jan", "pier", "tjores", "corneel"), ntest, replace=TRUE)
    )
# Write test data to csv file
write.table(testdata, file=tmpcsv, row.names=FALSE, col.names=FALSE, sep=',')

# Create LaF-object
laf <- laf_open_csv(tmpcsv, column_types=column_types)

# Read from file using indexing
first_column <- laf[ , 1]
first_row    <- laf[1, ]

# Read from file using blockwise operators
begin(laf)
first_block <- next_block(laf, nrows=2)
second_block <- next_block(laf, nrows=2)

# Cleanup
file.remove(tmpcsv)
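The blockwise operators can also be used to walk through an entire file without process_blocks. This sketch assumes, as process_blocks does, that next_block returns a data.frame with zero rows once the end of the file has been reached; the file and the variable `block_sizes` are illustrative only.

```r
library(LaF)

# Seven rows, read three at a time
tmp <- tempfile(fileext = ".csv")
write.table(data.frame(a = 1:7, b = 7:1), tmp,
            row.names = FALSE, col.names = FALSE, sep = ",")
laf <- laf_open_csv(tmp, column_types = c("integer", "integer"))

# Record the size of each block until an empty block signals end of file
begin(laf)
block_sizes <- integer(0)
repeat {
  block <- next_block(laf, nrows = 3)
  if (nrow(block) == 0) break
  block_sizes <- c(block_sizes, nrow(block))
}
print(block_sizes)

file.remove(tmp)
```

Calling begin again rewinds the connection, so the same loop can be repeated for a second pass over the data.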

LaF documentation built on April 4, 2025, 5:47 a.m.