csvread: Fast Specialized CSV File Loader.


View source: R/csvread.R

Package csvread contains a fast specialized CSV and other delimited file loader, and a basic 64-bit integer class to aid in reading 64-bit integer values.

Given a list of the column types, function csvread parses the CSV file and returns a data frame.

map.coltypes guesses the column types in the CSV file by reading the first nrows lines. The result can be passed to csvread as the coltypes argument.
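To illustrate the idea of guessing a column type from a sample of values, here is a minimal base R sketch. The helper `guess_coltype` is hypothetical and is not part of csvread; the rules it applies (regular expressions, and falling back to "long" for whole numbers that do not fit in 32 bits) are assumptions, and this page does not say how map.coltypes actually decides.

```r
# Hypothetical helper, NOT part of csvread: guess one column's type
# from a sample of its raw string values.
guess_coltype <- function(values) {
  values <- values[!is.na(values) & nzchar(values)]
  if (length(values) == 0L) return("string")
  if (all(grepl("^[+-]?[0-9]+$", values))) {
    # Whole numbers: "integer" only if every value fits in 32 bits.
    # Returning "long" otherwise is an assumption of this sketch.
    nums <- as.numeric(values)
    if (all(abs(nums) <= .Machine$integer.max)) return("integer")
    return("long")
  }
  number_pattern <- "^[+-]?([0-9]+\\.?[0-9]*|\\.[0-9]+)([eE][+-]?[0-9]+)?$"
  if (all(grepl(number_pattern, values))) return("double")
  # Anything else (dates, hex identifiers, free text) stays a string.
  "string"
}

# One value per column, taken from the sample rows shown in the examples.
sample_row <- c("11fb89c1558c792", "2011-05-06", "0.150001", "4970", "4977")
guessed <- vapply(sample_row, guess_coltype, character(1), USE.NAMES = FALSE)
guessed
```

A vector produced this way has the same shape as the `coltypes` argument of csvread: one type name per column.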

csvread(file, coltypes, header, colnames = NULL, nrows = NULL,
  verbose = FALSE, delimiter = ",")

map.coltypes(file, header, nrows = 100, delimiter = ",")

`file`	Path to the CSV file.
`coltypes`	A vector of column types, e.g., `c("integer", "string")`. The accepted types are "integer", "double", "string", "long", "longhex" and "integer64".
	`integer` - the column is parsed into an R integer type (32 bit).
	`double` - the column is parsed into an R double type.
	`string` - the column is loaded as character type.
	`long` - the column is interpreted as the decimal representation of a 64-bit integer, stored as a double and assigned the `int64` class.
	`longhex` - the column is interpreted as the hex representation of a 64-bit integer, stored as a double and assigned the `int64` class with an additional attribute `base = 16L` that is used for printing.
	`integer64` - same as `long` but produces a column of class `integer64`, which should be compatible with package `bit64` (untested).
`header`	TRUE (default) or FALSE; indicates whether the file has a header and serves as the source of column names if `colnames` is not provided.
`colnames`	Optional column names for the resulting data frame. Overrides the header, if header is present. If NULL, then the column names are taken from the header, or, if there is no header, the column names are set to 'COL1', 'COL2', etc.
`nrows`	If NULL, the function first counts the lines in the file. This step can be avoided if the number of lines is known by providing a value to `nrows`. On the other hand, `nrows` can be used to read only the first lines of the CSV file.
`verbose`	If `TRUE` and `nrows` is `NULL`, the function prints the number of lines counted in the file.
`delimiter`	A single character delimiter; the default is `","`.

csvread provides functionality for loading large (10M+ lines) CSV and other delimited files, similar to read.csv, but typically faster and using less memory than the standard R loader. While not entirely general, it covers many common use cases when the types of columns in the CSV file are known in advance. In addition, the package provides a class 'int64', which represents 64-bit integers exactly when reading from a file. The latter is useful when working with 64-bit integer identifiers exported from databases. The CSV file loader supports common column types including integer, double, string, and int64, leaving further type transformations to the user.
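As a brief illustration of why exact 64-bit integers matter for identifiers, the base R sketch below shows the limits of the two standard numeric types. It uses only base R and does not call csvread; the variable names are illustrative.

```r
# R integers are 32-bit: the largest representable value is 2147483647.
max_int <- .Machine$integer.max

# An ordinary double has a 53-bit significand, so above 2^53
# consecutive whole numbers can no longer be told apart.
two53 <- 2^53
collides <- (two53 + 1 == two53)
collides

# The identifier from the examples (hex 11fb89c1558c792, decimal
# 80986298828507026) is larger than both limits.
id_exceeds_int <- 80986298828507026 > max_int
id_exceeds_2_53 <- 80986298828507026 > two53
```

An identifier in this range read as a plain `double` may therefore silently change value, which is the situation the `long` and `longhex` column types are meant to avoid.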

If the number of columns, which is inferred from the number of provided coltypes, is greater than the actual number of columns in the file, the extra columns are still created. If it is less than the actual number of columns in the file, the extra columns in the file are ignored. Commas enclosed in double quotes are treated as part of the field rather than as separators, but the double quotes are NOT stripped. A runaway (unclosed) double quote ends at the end of the line.
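The quoting rules above can be sketched in base R as follows. The function `split_delimited` is a hypothetical illustration, not csvread's parser; it processes one line at a time and only mirrors the behavior described in this section.

```r
# Hypothetical helper, NOT csvread's implementation: split one line
# into fields following the quoting rules described above.
split_delimited <- function(line, delimiter = ",") {
  chars <- strsplit(line, "", fixed = TRUE)[[1]]
  fields <- character(0)
  current <- ""
  in_quotes <- FALSE
  for (ch in chars) {
    if (ch == "\"") {
      # Toggle the quoted state; the quote itself is kept in the field.
      in_quotes <- !in_quotes
      current <- paste0(current, ch)
    } else if (ch == delimiter && !in_quotes) {
      # An unquoted delimiter closes the current field.
      fields <- c(fields, current)
      current <- ""
    } else {
      current <- paste0(current, ch)
    }
  }
  # The quoted state is local to this call, so a runaway quote
  # cannot extend past the end of the line.
  c(fields, current)
}

quoted  <- split_delimited("a,\"b,c\",d")
runaway <- split_delimited("a,\"b,c")
```

Here `quoted` has three fields, the middle one being `"b,c"` with its quotes still attached, and `runaway` has two fields because the unclosed quote absorbs the rest of the line.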

See also int64 for information about dealing with 64-bit integers when loading data from CSV files.

A data frame containing the data from the CSV file.

Sergei Izrailev

Copyright (C) Collective, Inc.

Apache License, Version 2.0, available at http://www.apache.org/licenses/LICENSE-2.0

http://github.com/collectivemedia/csvread

devtools::install_github("collectivemedia/csvread")

Sergei Izrailev

int64

## Not run: 
frm <- csvread("inst/10rows.csv",
   coltypes = c("longhex", "string", "double", "integer", "long"),
   header = FALSE, nrows = 10)
frm
#               COL1       COL2     COL3 COL4 COL5
# 1  11fb89c1558c792 2011-05-06 0.150001 4970 4977
# 2  11fb89c1558c792 2011-05-06 0.150001 4970 4987
# 3  11fb89c1558c792 2011-05-06 0.150001 5200 5528
# 4  11fb89c1558c792 2011-05-06 0.150001 4970 5004
# 5  11fb89c1558c792 2011-05-06 0.150001 4970 4980
# 6  11fb89c1558c792 2011-05-06 0.150001 4970 5020
# 7  11fb89c1558c792 2011-05-06 0.150001 4970 5048
# 8  11fb89c1558c792 2011-05-06 0.150001 4970 5035
# 9  11fb89c1558c792 2011-05-06 0.150001 4970 4971
# 10 11fb89c1558c792 2011-05-06 0.150001 4970 4973

typeof(frm$COL1)
# [1] "double"
class(frm$COL1)
# [1] "int64"

typeof(frm$COL5)
# [1] "double"
class(frm$COL5)
# [1] "int64"

## End(Not run)
## Not run: 
coltypes <- map.coltypes("inst/10rows.csv", header = FALSE)
coltypes
#       V1        V2        V3        V4        V5
# "string"  "string"  "double" "integer" "integer"

frm <- csvread(file = "inst/10rows.csv", coltypes = coltypes, header = FALSE, verbose = TRUE)
# Counted 10 lines.

frm
#               COL1       COL2     COL3 COL4 COL5
# 1  11fb89c1558c792 2011-05-06 0.150001 4970 4977
# 2  11fb89c1558c792 2011-05-06 0.150001 4970 4987
# 3  11fb89c1558c792 2011-05-06 0.150001 5200 5528
# 4  11fb89c1558c792 2011-05-06 0.150001 4970 5004
# 5  11fb89c1558c792 2011-05-06 0.150001 4970 4980
# 6  11fb89c1558c792 2011-05-06 0.150001 4970 5020
# 7  11fb89c1558c792 2011-05-06 0.150001 4970 5048
# 8  11fb89c1558c792 2011-05-06 0.150001 4970 5035
# 9  11fb89c1558c792 2011-05-06 0.150001 4970 4971
# 10 11fb89c1558c792 2011-05-06 0.150001 4970 4973
typeof(frm$COL1)
# [1] "character"
class(frm$COL1)
# [1] "character"

typeof(frm$COL5)
# [1] "integer"
class(frm$COL5)
# [1] "integer"

frm$COL1 <- as.int64(frm$COL1, base = 16)
frm$COL1
# [1] "11fb89c1558c792" "11fb89c1558c792" "11fb89c1558c792" "11fb89c1558c792"
# [5] "11fb89c1558c792" "11fb89c1558c792" "11fb89c1558c792" "11fb89c1558c792"
# [9] "11fb89c1558c792" "11fb89c1558c792"
typeof(frm$COL1)
# [1] "double"
class(frm$COL1)
# [1] "int64"

as.character.int64(frm$COL1[1], base = 10)
# [1] "80986298828507026"

## End(Not run)

collectivemedia/csvread documentation built on May 13, 2019, 9:54 p.m.
