Guesses the formatting of tabular/flat files by testing different options for the delimiter, decimal mark, grouping mark and column header. It chooses the combination that creates the data frame that makes the most sense, based on a set of assumptions.

The following delimiters are possible: tab (\t), comma (,), semicolon (;) and whitespace/fixed width. The decimal marks tested are comma (,) and dot (.). The big-number grouping marks tested are comma (,), dot (.) and space ( ). The existence of column headers is also tested; altogether, 22 possible formatting combinations are tested.
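To make the idea of testing candidates concrete, here is a minimal base R sketch that guesses only the delimiter. It is an illustration under stated assumptions, not the package's actual method: the helper name `guess_delim_sketch`, the candidate list and the scoring rule (a delimiter is plausible when it splits every line into the same number of fields, and more than one) are all hypothetical.

```r
# Hypothetical helper, not part of the package: guess the delimiter by
# checking which candidate splits every line into the same number of fields.
guess_delim_sketch <- function(lines, candidates = c("\t", ",", ";")) {
  lines <- lines[nzchar(lines)]  # ignore empty lines
  field_counts <- vapply(candidates, function(delim) {
    n <- lengths(strsplit(lines, delim, fixed = TRUE))
    # A plausible delimiter yields a constant field count greater than one.
    if (length(unique(n)) == 1 && n[1] > 1) n[1] else 0L
  }, integer(1))
  if (all(field_counts == 0)) return(NA_character_)
  # On ties, the first candidate with the highest count wins.
  candidates[which.max(field_counts)]
}

sample_lines <- c("a;b;c", "1.5;x;TRUE", "2.25;y;FALSE")
guess_delim_sketch(sample_lines)
#> [1] ";"
```

A delimiter-only check like this becomes ambiguous as soon as a file uses a comma as the decimal mark, which is one reason to test the delimiter together with the decimal and grouping marks, as described above.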

This Shiny app demonstrates how this package (together with …) can be used to create a tabular/flat file reader that you can throw almost any file at. In most cases it will guess the right formatting for you.

Quick start


Dev version from GitHub.



# Print all formatting combinations that are currently tested
#> # A tibble: 22 × 5
#>     name delim decimal_mark grouping_mark col_names
#>    <chr> <chr>        <chr>         <chr>     <lgl>
#> 1    csv     ,            .                    TRUE
#> 2   csv2     ;            ,             .      TRUE
#> 3   csv3     ;            ,                    TRUE
#> 4   csv4     ;            .             ,      TRUE
#> 5   csv5     ;            .                    TRUE
#> 6    tsv    \t            .             ,      TRUE
#> 7   tsv2    \t            .                    TRUE
#> 8   tsv3    \t            ,             .      TRUE
#> 9   tsv4    \t            ,                    TRUE
#> 10   wsp                  .             ,      TRUE
#> # ... with 12 more rows

# Create a data frame and format it into a semicolon-delimited string
test_str <- readr::format_delim(
  data.frame(a = runif(1000, -100, 100), b = "a", c = 1, d = TRUE),
  delim = ";"
)

# Read the string and guess the right formatting
#> Delimiter: ';', decimal mark: '.', grouping mark: ',', column headers: TRUE
#> # A tibble: 1,000 × 4
#>              a     b     c     d
#>          <dbl> <chr> <dbl> <lgl>
#> 1    0.5710977     a     1  TRUE
#> 2  -68.7795925     a     1  TRUE
#> 3  -51.5204403     a     1  TRUE
#> 4   38.9085193     a     1  TRUE
#> 5   82.8925392     a     1  TRUE
#> 6   68.2881441     a     1  TRUE
#> 7   69.1850611     a     1  TRUE
#> 8  -71.8340796     a     1  TRUE
#> 9  -42.3778389     a     1  TRUE
#> 10  92.0550250     a     1  TRUE
#> # ... with 990 more rows

