writeutf8: Save data frames as UTF-8 encoded plain text

Description Usage Arguments Motivation Output format See Also Examples

View source: R/writeutf8.R

Description

writeutf8 saves a data frame as a UTF-8 encoded plain text file and readutf8 reads it back into R.

Usage

1
2
3
writeutf8(df, filename, sep = "\t", na = "NA", eol = "\r\n")

readutf8(filename, stringsAsFactors = FALSE, ...)

Arguments

df

a data frame

filename

the name of the output file

sep

the field separator string

na

the string to use for missing values in the data

eol

the character(s) to print at the end of each row

stringsAsFactors

Convert character vectors to factors?

...

further arguments passed to read.table

Motivation

Under Linux or macOS, write.table can be used to save data frames as UTF-8 encoded plain text. Under Microsoft Windows, this approach no longer works because Windows does not support UTF-8 locales. writeutf8 solves this problem.

Output format

If the only purpose of writing data to disk using writeutf8 is to read it back into R later, then the exact output format is irrelevant; just use readutf8 to read the data frame back into R. In this case, no customization of the output format is necessary and the code for writing and reading will be very concise: save with writeutf(df, filename) and read with readutf8(filename).

The only time when the output format matters is when the written out data has to be read by another software that expects a certain input format.

The data frame written by writeutf8 contains a header row with column names. All columns, including the column names, are quoted with double quotes. This allows character columns and column names to have arbitrary content. Double quotes embedded in character columns and column names are doubled. Line endings default to the Windows convention of "\r\n" because most users of writeutf8 will be working under Windows. Row names are not included in the output.

See Also

write.table, writeLines, read.table, scan, Encoding

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
## Not run: 
df <- data.frame(    # ascii  latin1  UTF-8
    w = c(NA,   "",    "abc", "\xd8", "\u9B3C"), # character
    x = c(1L,   NA,    3L,    4L,     5L),       # integer
    y = c(1.5,  2.5,   NA,    4.5,    5.5),      # double
    z = c(TRUE, FALSE, TRUE,  NA,     TRUE),     # logical
    t = as.POSIXct("2021-01-01 15:30:45"),       # POSIXct
    d = as.Date("2021-01-01"))                   # Date

Encoding(df$w) <- c(rep("unknown", 3), "latin1", "UTF-8")

writeutf8(df, "data.tsv")

# Then later in another R script:
df <- readutf8("data.tsv")

## End(Not run)

cbaumbach/writeutf8 documentation built on Jan. 9, 2021, 3:19 p.m.