| read_fwf | R Documentation |
Fixed-width files store tabular data with each field occupying a specific range of character positions in every line. Once the fields are identified, converting them to the appropriate R types works just like for delimited files. The unique challenge with fixed-width files is describing where each field begins and ends. readr tries to ease this pain by offering a few different ways to specify the field structure:
fwf_empty() - Guesses based on the positions of empty columns. This is
the default. (Note that fwf_empty() returns 0-based positions, for
internal use.)
fwf_widths() - Supply the widths of the columns.
fwf_positions() - Supply paired vectors of start and end positions. These
are interpreted as 1-based positions, so are off-by-one compared to the
output of fwf_empty().
fwf_cols() - Supply named arguments of paired start and end positions or
column widths.
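As a rough illustration of how widths relate to start and end positions, the base R sketch below converts a vector of widths into 1-based, inclusive positions. It does not call readr; the widths (20, 10, 12) are assumed from the fwf-sample.txt layout, and the variable names are only illustrative.

```r
# Field widths assumed from the fwf-sample.txt layout (20, 10 and 12 characters).
widths <- c(name = 20, state = 10, ssn = 12)

# 1-based, inclusive positions: each field ends at the running total of the
# widths, and starts one character after the previous field ends.
end <- cumsum(widths)
start <- end - widths + 1

positions <- data.frame(
  col = names(widths),
  start = unname(start),
  end = unname(end)
)
print(positions)
```

The resulting start and end vectors are the kind of values one would pass to fwf_positions() in place of the widths given to fwf_widths().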
Note: fwf_empty() cannot work with a connection or with any of the input
types that involve a connection internally, which includes remote and
compressed files. The reason is that this would necessitate reading from the
connection twice. In these cases, you'll have to either provide the field
structure explicitly with another fwf_*() function or download (and
decompress, if relevant) the file first.
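A minimal sketch of the first workaround from the note above: supplying the field structure explicitly when the input is a compressed file. The sample rows and the temporary file are made up for illustration, and the widths are assumed to match the data.

```r
library(readr)

# Hypothetical sample rows, padded to widths 20, 10 and 12.
rows <- sprintf(
  "%-20s%-10s%-12s",
  c("John Smith", "Mary Hartford"),
  c("WA", "CA"),
  c("418-Y11-4111", "319-Z19-4341")
)

# Write a gzip-compressed copy to a temporary file.
gz_path <- tempfile(fileext = ".txt.gz")
con <- gzfile(gz_path, "w")
writeLines(rows, con)
close(con)

# fwf_empty() cannot guess the layout of compressed input,
# so describe the fields explicitly instead.
layout <- fwf_widths(c(20, 10, 12), c("name", "state", "ssn"))
compressed <- read_fwf(gz_path, col_positions = layout, show_col_types = FALSE)
print(compressed)
```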
read_fwf(
file,
col_positions = fwf_empty(file, skip, n = guess_max),
col_types = NULL,
col_select = NULL,
id = NULL,
locale = default_locale(),
na = c("", "NA"),
comment = "",
trim_ws = TRUE,
skip = 0,
n_max = Inf,
guess_max = min(n_max, 1000),
progress = show_progress(),
name_repair = "unique",
num_threads = readr_threads(),
show_col_types = should_show_types(),
lazy = should_read_lazy(),
skip_empty_rows = TRUE
)
fwf_empty(
file,
skip = 0,
skip_empty_rows = deprecated(),
col_names = NULL,
comment = "",
n = 100L
)
fwf_widths(widths, col_names = NULL)
fwf_positions(start, end = NULL, col_names = NULL)
fwf_cols(...)
file |
Either a path to a file, a connection, or literal data (either a single string or a raw vector). Files ending in Literal data is most useful for examples and tests. To be recognised as
literal data, wrap the input with Using a value of |
col_positions |
Column positions, as created by |
col_types |
One of If Column specifications created by Alternatively, you can use a compact string representation where each character represents one column:
By default, reading a file without a column specification will print a
message showing what |
col_select |
Columns to include in the results. You can use the same
mini-language as |
id |
The name of a column in which to store the file path. This is
useful when reading multiple input files and there is data in the file
paths, such as the data collection date. If |
locale |
The locale controls defaults that vary from place to place.
The default locale is US-centric (like R), but you can use
|
na |
Character vector of strings to interpret as missing values. Set this
option to |
comment |
A string used to identify comments. Any text after the comment characters will be silently ignored. |
trim_ws |
Should leading and trailing whitespace (ASCII spaces and tabs) be trimmed from each field before parsing it? |
skip |
Number of lines to skip before reading data. |
n_max |
Maximum number of lines to read. |
guess_max |
Maximum number of lines to use for guessing column types.
Will never use more than the number of lines read.
See |
progress |
Display a progress bar? By default it will only display
in an interactive session and not while knitting a document. The automatic
progress bar can be disabled by setting option |
name_repair |
Handling of column names. The default behaviour is to
ensure column names are
This argument is passed on as |
num_threads |
The number of processing threads to use for initial
parsing and lazy reading of data. If your data contains newlines within
fields the parser should automatically detect this and fall back to using
one thread only. However if you know your file has newlines within quoted
fields it is safest to set |
show_col_types |
If |
lazy |
Read values lazily? By default, this is Learn more in |
skip_empty_rows |
Should blank rows be ignored altogether? i.e. If this
option is |
col_names |
Either NULL, or a character vector of column names. |
n |
Number of lines the tokenizer will read to determine file structure. By default it is set to 100. |
widths |
Width of each field. Use |
start, end |
Starting and ending (inclusive) positions of each field.
Positions are 1-based: the first character in a line is at position 1.
Use |
... |
If the first element is a data frame,
then it must have all numeric columns and either one or two rows.
The column names are the variable names. The column values are the
variable widths if a length one vector, and if length two, variable start and end
positions. The elements of |
Here's an enhanced example using the contents of the file accessed via
readr_example("fwf-sample.txt").
         1         2         3         4
123456789012345678901234567890123456789012
[     name 20      ][state 10][  ssn 12  ]
John Smith          WA        418-Y11-4111
Mary Hartford       CA        319-Z19-4341
Evan Nolan          IL        219-532-c301
Here are some valid field specifications for the above (they aren't all equivalent! but they are all valid):
fwf_widths(c(20, 10, 12), c("name", "state", "ssn"))
fwf_positions(c(1, 30), c(20, 42), c("name", "ssn"))
fwf_cols(state = c(21, 30), last = c(6, 20), first = c(1, 4), ssn = c(31, 42))
fwf_cols(name = c(1, 20), ssn = c(30, 42))
fwf_cols(name = 20, state = 10, ssn = 12)
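To see what 1-based, inclusive positions select from a line, here is a base R sketch using substring(). It only mimics the slicing and whitespace trimming and is not how readr parses the file; the line is rebuilt with sprintf() so that the padding is exact.

```r
# Rebuild the first data line of the sample with exact padding (20 + 10 + 12).
line <- sprintf("%-20s%-10s%-12s", "John Smith", "WA", "418-Y11-4111")

# 1-based, inclusive start and end positions for each field.
start <- c(name = 1, state = 21, ssn = 31)
end <- c(name = 20, state = 30, ssn = 42)

# Slice each field, then trim the padding (similar in spirit to trim_ws = TRUE).
raw_fields <- substring(line, start, end)
fields <- setNames(trimws(raw_fields), names(start))
print(fields)
```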
Comments are now only ignored if they appear at the start of a line. Comments elsewhere in a line are no longer treated specially.
See also read_table(), which reads fixed-width files where each
column is separated by whitespace.
fwf_sample <- readr_example("fwf-sample.txt")
writeLines(read_lines(fwf_sample))
# You can specify column positions in several ways:
# 1. Guess based on position of empty columns
read_fwf(fwf_sample, fwf_empty(fwf_sample, col_names = c("first", "last", "state", "ssn")))
# 2. A vector of field widths
read_fwf(fwf_sample, fwf_widths(c(20, 10, 12), c("name", "state", "ssn")))
# 3. Paired vectors of start and end positions
read_fwf(fwf_sample, fwf_positions(c(1, 30), c(20, 42), c("name", "ssn")))
# 4. Named arguments with start and end positions
read_fwf(fwf_sample, fwf_cols(name = c(1, 20), ssn = c(30, 42)))
# 5. Named arguments with column widths
read_fwf(fwf_sample, fwf_cols(name = 20, state = 10, ssn = 12))
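One more sketch, assuming a current readr where literal data is wrapped in I(): reading fixed-width text held in a string, with the column types given as a compact string rather than guessed. The rows are made up to mirror the sample file.

```r
library(readr)

# Hypothetical rows padded to widths 20, 10 and 12, joined into one string.
rows <- sprintf(
  "%-20s%-10s%-12s",
  c("John Smith", "Mary Hartford"),
  c("WA", "CA"),
  c("418-Y11-4111", "319-Z19-4341")
)
literal <- paste0(paste(rows, collapse = "\n"), "\n")

# I() marks the string as literal data; "ccc" asks for three character columns.
inline <- read_fwf(
  I(literal),
  col_positions = fwf_cols(name = 20, state = 10, ssn = 12),
  col_types = "ccc"
)
print(inline)
```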