fwrite_ (R Documentation)
As write.csv, but much faster (e.g. 2 seconds versus 1 minute) and just as flexible. Modern machines almost surely have more than one CPU, so fwrite uses them on all operating systems, including Linux, Mac and Windows.
Usage:
fwrite_(x, file, sep = ",", na = "NA", col.names = TRUE, ...)
Arguments:
x | Any list of same-length vectors, e.g. a data.frame or data.table.
file | Output file name.
sep | The separator between columns. Default is ",".
na | The string to use for missing values in the data. Default is "NA" (data.table::fwrite itself defaults to a blank string).
col.names | Should the column names (header row) be written? Default is TRUE.
... | Extra arguments passed to data.table::fwrite.
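This page shows only the wrapper's signature, not its body. The sketch below assumes fwrite_ simply forwards its arguments, plus anything in `...`, to data.table::fwrite; that body is a hypothetical reconstruction, not the package's actual code. It needs the data.table package and illustrates the practical effect of the wrapper's na = "NA" default.

```r
# Assumed body for fwrite_: the real wrapper's implementation is not shown on
# this page, so this forwarding version is a hypothetical sketch.
fwrite_ <- function(x, file, sep = ",", na = "NA", col.names = TRUE, ...) {
  # Pass the wrapper's own defaults, plus any extra arguments, to data.table::fwrite
  data.table::fwrite(x, file = file, sep = sep, na = na,
                     col.names = col.names, ...)
}

# Usage: with na = "NA", missing values are written as the literal string NA
tmp <- tempfile(fileext = ".csv")
fwrite_(data.frame(a = c(1, NA), b = c("x", NA)), tmp)
print(readLines(tmp))
unlink(tmp)
```

Because the defaults are re-declared in the wrapper's signature, a caller can still override any of them (or pass other data.table::fwrite arguments such as quote or dec) without the wrapper having to list them.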
Details:
fwrite began as a community contribution with pull request #1613 by Otto Seiskari. This gave Matt Dowle the impetus to specialize the numeric formatting and to parallelize: https://h2o.ai/blog/fast-csv-writing-for-r/. The final items, tracked in issue #1664, included automatic quoting, bit64::integer64 support, decimal/scientific formatting that exactly matches write.csv between 2.225074e-308 and 1.797693e+308 to 15 significant figures, row.names, dates (between 0000-03-01 and 9999-12-31), times, and sep2 for list columns, where each cell can itself be a vector.
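Two of the features listed above, list-column cells joined with sep2 and automatic quoting, can be sketched in a few lines of base R. The helpers below are hypothetical illustrations, not data.table functions. The quoting rule they encode (quote a field containing sep, a double quote, a newline, or sep2[2] when list columns are present, doubling embedded quotes as with qmethod = "double") is our reading of the data.table documentation and should be treated as an assumption.

```r
# Hypothetical helpers (not part of data.table).

# Flatten one list-column cell: sep2 = c(open, inner separator, close).
# c("", "|", "") mirrors what we believe is data.table's default sep2.
format_list_cell <- function(v, sep2 = c("", "|", "")) {
  paste0(sep2[1], paste(v, collapse = sep2[2]), sep2[3])
}

# Assumed auto-quote rule for a single character field: quote it if it
# contains the column separator, a double quote, a newline, or (for list
# columns) the inner sep2 separator.
needs_quote <- function(field, sep = ",", sep2_inner = NULL) {
  specials <- c(sep, "\"", "\n", sep2_inner)
  any(vapply(specials, function(s) grepl(s, field, fixed = TRUE), logical(1)))
}

# Double any embedded quotes (as with qmethod = "double") and wrap in quotes
quote_field <- function(field) {
  paste0("\"", gsub("\"", "\"\"", field, fixed = TRUE), "\"")
}

format_list_cell(c(3.14, 6.28, 9.42), c("{", ",", "}"))  # "{3.14,6.28,9.42}"
needs_quote("A,Name")                                     # TRUE
needs_quote("foo")                                        # FALSE
```

The usage lines mirror the DF and DT examples further down this page, where "A,Name" is the value that gets quoted.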
To save space, fwrite prefers to write wide numeric values in scientific notation; e.g. 10000000000 takes up much more space than 1e+10. Most file readers (e.g. fread) understand scientific notation, so there is no fidelity loss. As in base R, users can control this with the scipen argument, which follows the same rules as options('scipen'). fwrite compares how wide a value is in scientific versus decimal notation, and writes it in scientific notation only if the decimal form is more than scipen characters wider. For 10000000000 (11 characters) versus 1e+10 (5 characters), the difference is 6, so 1e+10 is written whenever scipen < 6.
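The width comparison described above can be mirrored in base R. The helper below is hypothetical and uses format() with its default 7 significant digits, whereas fwrite formats to 15 significant figures, so it illustrates the rule only for short values like the 1e+10 example rather than reproducing fwrite's output in general.

```r
# Hypothetical helper (not part of data.table) mirroring the documented rule:
# use scientific notation only when the decimal form is more than `scipen`
# characters wider than the scientific form.
choose_notation <- function(value, scipen = 0L) {
  dec <- format(value, scientific = FALSE)  # e.g. "10000000000"
  sci <- format(value, scientific = TRUE)   # e.g. "1e+10"
  if (nchar(dec) - nchar(sci) > scipen) sci else dec
}

choose_notation(1e10, scipen = 5)  # "1e+10"        (11 - 5 = 6 > 5)
choose_notation(1e10, scipen = 6)  # "10000000000"  (6 is not > 6)
choose_notation(123)               # "123"          (decimal is narrower)
```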
CSVY Support:
When yaml = TRUE is passed (through ...) to data.table::fwrite, the following fields are written to the header of the file, surrounded by --- on top and bottom:
source - the R version and data.table version used to write the file
creation_time_utc - the current timestamp in UTC, taken just before the header is written
schema - an element fields giving name-type (class) pairs for the table; multi-class objects (e.g. c('POSIXct', 'POSIXt')) have only their first class written
header - same as col.names (which is header on input)
sep
sep2
eol
na.strings - same as na
dec
qmethod
logical01
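Because the header is delimited by --- lines, it can be separated from the CSV body with plain line handling. The base R sketch below does only that; the sample lines are illustrative and are not byte-for-byte fwrite output. With data.table itself, such a header is written by fwrite(..., yaml = TRUE) and read back by fread(..., yaml = TRUE).

```r
# Sketch: split a CSVY file's '---'-delimited YAML header from its CSV body.
split_csvy <- function(lines) {
  fences <- which(lines == "---")
  # A header exists only if the file starts with '---' and has a closing '---'
  if (length(fences) < 2L || fences[1] != 1L) {
    return(list(header = character(0), body = lines))
  }
  # Lines strictly between the two fences (empty if the fences are adjacent)
  header <- lines[seq_len(fences[2] - 2L) + 1L]
  # Everything after the closing fence is ordinary CSV
  body <- lines[-seq_len(fences[2])]
  list(header = header, body = body)
}

# Illustrative input only; field names follow the list above
example <- c("---", "header: true", "sep: ','", "---", "A,B", "1,foo")
parts <- split_csvy(example)
parts$header  # "header: true" "sep: ','"
parts$body    # "A,B" "1,foo"
```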
References:
https://howardhinnant.github.io/date_algorithms.html
https://en.wikipedia.org/wiki/Decimal_mark
See Also:
data.table::fwrite()
Examples:
library(data.table)
DF = data.frame(A=1:3, B=c("foo","A,Name","baz"))
fwrite(DF)
write.csv(DF, row.names=FALSE, quote=FALSE) # same
fwrite(DF, row.names=TRUE, quote=TRUE)
write.csv(DF) # same
DF = data.frame(A=c(2.1,-1.234e-307,pi), B=c("foo","A,Name","bar"))
fwrite(DF, quote='auto') # Just DF[2,2] is auto quoted
write.csv(DF, row.names=FALSE) # same numeric formatting
DT = data.table(A=c(2,5.6,-3),B=list(1:3,c("foo","A,Name","bar"),round(pi*1:3,2)))
fwrite(DT)
fwrite(DT, sep="|", sep2=c("{",",","}"))
## Not run:
set.seed(1)
DT = as.data.table( lapply(1:10, sample,
x=as.numeric(1:5e7), size=5e6)) # 382MB
system.time(fwrite(DT, "/dev/shm/tmp1.csv")) # 0.8s
system.time(write.csv(DT, "/dev/shm/tmp2.csv", # 60.6s
quote=FALSE, row.names=FALSE))
system("diff /dev/shm/tmp1.csv /dev/shm/tmp2.csv") # identical
set.seed(1)
N = 1e7
DT = data.table(
str1=sample(sprintf("%010d",sample(N,1e5,replace=TRUE)), N, replace=TRUE),
str2=sample(sprintf("%09d",sample(N,1e5,replace=TRUE)), N, replace=TRUE),
str3=sample(sapply(sample(2:30, 100, TRUE), function(n)
paste0(sample(LETTERS, n, TRUE), collapse="")), N, TRUE),
str4=sprintf("%05d",sample(sample(1e5,50),N,TRUE)),
num1=sample(round(rnorm(1e6,mean=6.5,sd=15),2), N, replace=TRUE),
num2=sample(round(rnorm(1e6,mean=6.5,sd=15),10), N, replace=TRUE),
str5=sample(c("Y","N"),N,TRUE),
str6=sample(c("M","F"),N,TRUE),
int1=sample(ceiling(rexp(1e6)), N, replace=TRUE),
int2=sample(N,N,replace=TRUE)-N/2
) # 774MB
system.time(fwrite(DT,"/dev/shm/tmp1.csv")) # 1.1s
system.time(write.csv(DT,"/dev/shm/tmp2.csv", # 63.2s
row.names=FALSE, quote=FALSE))
system("diff /dev/shm/tmp1.csv /dev/shm/tmp2.csv") # identical
unlink("/dev/shm/tmp1.csv")
unlink("/dev/shm/tmp2.csv")
## End(Not run)