cols_regex    R Documentation

Create readr column specification using regular expression matching

Description

Creates a readr column specification from regular expressions: each regular expression is paired with a column specification object, which is applied to every column whose name matches that expression.

Usage

cols_regex(..., .col_names, .default = readr::col_character())

Arguments

...

Named arguments where the names are (Perl-compatible) regular expressions and the values are column objects created by col_*(), or their abbreviated character names (as described in the col_types parameter of readr::read_delim()). Dynamic dots are supported.

.col_names

Column names to match against the regular expressions in ....

.default

Any named columns not matched by any of the regular expressions in ... will be read with this column type.
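
The matching described by these arguments can be illustrated with a minimal base R sketch. It is not the implementation used by pal: match_col_types() is a hypothetical helper that returns type abbreviations rather than a column specification, and it assumes that the first matching regular expression wins when several match the same column name, which this page does not state.

```r
# Hypothetical helper (not part of pal): resolve one readr type abbreviation
# per column name from a named vector of regular expressions.
match_col_types <- function(col_names, patterns, default = "c") {
  # every column starts out with the default type
  types <- rep(default, length(col_names))
  names(types) <- col_names
  assigned <- rep(FALSE, length(col_names))

  for (i in seq_along(patterns)) {
    # Perl-compatible matching, as described for `...`;
    # ASSUMPTION: columns already matched by an earlier pattern are left alone
    hits <- grepl(names(patterns)[i], col_names, perl = TRUE) & !assigned
    types[hits] <- patterns[[i]]
    assigned[hits] <- TRUE
  }

  types
}

types <- match_col_types(
  col_names = c("VAR1_Text", "VAR1_Code_R1", "HAS_R1_Lag", "GARBAGEX67"),
  patterns  = c("_Text(_|$)" = "c",
                "_Code(_|$)" = "i",
                "^GARBAGE"   = "_"),  # "_" is readr's abbreviation for a skipped column
  default   = "l"
)

print(types)
```

Here "HAS_R1_Lag" matches none of the patterns and therefore falls back to the default type, mirroring the role of .default.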

Value

A column specification.

Examples

library(magrittr)

# for some hypothetical CSV data column names like these...
col_names <- c("VAR1_Text",
               "VAR2_Text",
               "VAR3_Text_Other",
               "VAR1_Code_R1",
               "VAR2_Code_R2",
               "HAS_R1_Lag",
               "HAS_R2_Lag",
               "GARBAGEX67",
               "GARBAGEY09")

# ...a column spec could be created concisely as follows:
col_regex <- list("_Text(_|$)" = "c",
                  "_Code(_|$)" = "i",
                  "^GARBAGE"   = readr::col_skip())

pal::cols_regex(.col_names = col_names,
                !!!col_regex,
                .default = "l")

# we can parse some real data:
url <- "https://salim_b.gitlab.io/misc/Kantonsratswahl_Zuerich_2019_Ergebnisse_Gemeinden.csv"

raw_data <-
  httr2::request(url) |>
  httr2::req_perform() |>
  httr2::resp_body_string()

col_spec <- pal::cols_regex("^(Gemeindenamen|Liste|Wahlkreis)$" = "c",
                            "(?i)anteil" = "d",
                            .default = "i",
                            .col_names = pal::dsv_colnames(raw_data))

print(col_spec)

readr::read_csv(file = raw_data,
                col_types = col_spec)

# to read the data directly from the URL instead, parse all columns as character first and then convert them with `readr::type_convert()`:
readr::read_csv(file = url,
                col_types = list(.default = "c")) %>%
  readr::type_convert(col_types = col_spec)

salim-b/pal documentation built on Feb. 28, 2025, 6:51 p.m.