Utility Functions

cols_regex

Create readr column specification using regular expression matching

Description

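Allows you to define one regular expression per desired column specification object. Each regular expression is matched against the column names, and matching columns are assigned the corresponding column specification.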

Usage

cols_regex(..., .col_names, .default = readr::col_character())

Arguments

`...`	Named arguments where the names are (Perl-compatible) regular expressions and the values are column objects created by `col_*()`, or their abbreviated character names (as described in the `col_types` parameter of `readr::read_delim()`). Dynamic dots are supported.
`.col_names`	Column names which should be matched by `...`.
`.default`	Any named columns not matched by any of the regular expressions in `...` will be read with this column type.

Details

The main limitation of `cols_regex()` is that it needs to know the input dataset's full set of `.col_names` in advance; `dsv_colnames()` can help to determine them. See the examples for further details.
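
To illustrate why the column names must be known up front, the following is a minimal sketch of regex-based type resolution in base R. It is not pal's actual implementation: `match_col_types()` is a hypothetical helper, and the rule that earlier patterns take precedence when several match the same column is an assumption made for this sketch only.

```r
# Hypothetical helper (not part of pal): resolve one readr type abbreviation
# per column name by matching Perl-compatible regular expressions.
match_col_types <- function(col_names, patterns, default = "c") {
  # start with the default type for every column
  types <- rep(default, length(col_names))
  names(types) <- col_names

  # iterate in reverse so that earlier patterns overwrite later ones
  # (assumed precedence rule, chosen for this sketch)
  for (i in rev(seq_along(patterns))) {
    hits <- grepl(names(patterns)[i], col_names, perl = TRUE)
    types[hits] <- patterns[[i]]
  }

  types
}

col_names <- c("VAR1_Text", "VAR1_Code_R1", "HAS_R1_Lag", "GARBAGEX67")

types <- match_col_types(col_names,
                         patterns = c("_Text(_|$)" = "c",
                                      "_Code(_|$)" = "i",
                                      "^GARBAGE"   = "_"),
                         default = "l")
print(types)

# the resolved abbreviations can be handed to readr as a regular column spec
if (requireNamespace("readr", quietly = TRUE)) {
  col_spec <- do.call(readr::cols, as.list(types))
  print(col_spec)
}
```

Because the patterns can only be resolved against concrete names, the header of the input data has to be available before the column specification is built.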

Value

A column specification.

Examples

library(magrittr)

# for some hypothetical CSV data column names like these...
col_names <- c("VAR1_Text",
               "VAR2_Text",
               "VAR3_Text_Other",
               "VAR1_Code_R1",
               "VAR2_Code_R2",
               "HAS_R1_Lag",
               "HAS_R2_Lag",
               "GARBAGEX67",
               "GARBAGEY09")

# ...a column spec could be created concisely as follows:
col_regex <- list("_Text(_|$)" = "c",
                  "_Code(_|$)" = "i",
                  "^GARBAGE"   = readr::col_skip())

pal::cols_regex(.col_names = col_names,
                !!!col_regex,
                .default = "l")

# we can parse some real data:
url <- "https://salim_b.gitlab.io/misc/Kantonsratswahl_Zuerich_2019_Ergebnisse_Gemeinden.csv"

raw_data <-
  httr2::request(url) |>
  httr2::req_perform() |>
  httr2::resp_body_string()

col_spec <- pal::cols_regex("^(Gemeindenamen|Liste|Wahlkreis)$" = "c",
                            "(?i)anteil" = "d",
                            .default = "i",
                            .col_names = pal::dsv_colnames(raw_data))
print(col_spec)

readr::read_csv(file = raw_data,
                col_types = col_spec)

# the same result can be achieved more concisely without `pal::dsv_colnames()` by reading
# all columns as character first and converting them afterwards:
readr::read_csv(file = url,
                col_types = list(.default = "c")) %>%
  readr::type_convert(col_types = pal::cols_regex("^(Gemeindenamen|Liste|Wahlkreis)$" = "c",
                                                  "(?i)anteil" = "d",
                                                  .default = "i",
                                                  .col_names = colnames(.)))

salim-b/pal documentation built on June 9, 2025, 12:39 a.m.