e_read_df_header_span_rows: Combine multiple header rows into a column name for a text...

View source: R/e_read_df_header_span_rows.R

e_read_df_header_span_rows    R Documentation

Combine multiple header rows into a column name for a text data frame

Description

Some older text files store column labels in header rows that span multiple lines. This function preserves those labels by combining each column's header rows into a single column name.
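As a rough illustration of the idea (a sketch, not the package's implementation), the labels in each column can be joined top to bottom with paste(). The helper name combine_header_rows() is hypothetical, and dropping empty cells, so that a label found in only one header row is not padded with separators, is an assumption of this sketch.

```r
# Hypothetical helper (not part of erikmisc): join the header rows of each
# column into one name. `header` is a character matrix, one row per header row.
combine_header_rows <- function(header, collapse = "_") {
  apply(header, 2, function(labels) {
    labels <- labels[!is.na(labels) & labels != ""]  # assumption: drop empty cells
    paste(labels, collapse = collapse)
  })
}

header <- rbind(
  c("a1", "b1", "c1"),
  c("a2", "b2", ""),
  c("a3", "",   "")
)
combine_header_rows(header)
# returns c("a1_a2_a3", "b1_b2", "c1")
```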

Usage

e_read_df_header_span_rows(
  dat_this = NULL,
  skip = 0,
  row_header_span = 1,
  row_header_span_collapse = "_"
)

Arguments

dat_this

data.frame in which all columns are text (character)

skip

number of rows at the top to skip before the header rows begin

row_header_span

number of rows that make up the header column names; use 0 when there is no header row (the first row is data)

row_header_span_collapse

character string placed between the labels from each header row when they are combined into a single column name
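Judging from the examples below (skip = 0 with row_header_span = 1 returns the data unchanged), the existing column names of dat_this appear to count as the first row, with skip and row_header_span then applied from the top. The following is a hypothetical sketch of that row bookkeeping under this assumption; split_header_rows() is an illustrative name, not a function in erikmisc.

```r
# Hypothetical sketch, not the erikmisc implementation.
# Assumption (inferred from the examples): the current column names of `dat`
# are treated as row 1, ahead of the data rows.
split_header_rows <- function(dat, skip = 0, row_header_span = 1) {
  all_rows <- rbind(names(dat), as.matrix(dat))  # names become row 1
  dimnames(all_rows) <- NULL
  if (skip > 0) {
    all_rows <- all_rows[-seq_len(skip), , drop = FALSE]  # drop skipped rows
  }
  header_idx <- seq_len(row_header_span)  # integer(0) when span is 0
  data_idx   <- setdiff(seq_len(nrow(all_rows)), header_idx)
  list(
    header = all_rows[header_idx, , drop = FALSE],
    data   = all_rows[data_idx, , drop = FALSE]
  )
}

dat <- read.csv(text = "X,X,Z\na1,b1,c1\na2,b2,\na3,,\n1,2,3\n",
                stringsAsFactors = FALSE)
parts <- split_header_rows(dat, skip = 1, row_header_span = 3)
parts$header  # the a1/b1/c1, a2/b2, and a3 rows
parts$data    # the 1/2/3 row
```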

Details

  • When reading data from a text file, keep all values as text (character):

    • utils::read.table(..., stringsAsFactors = FALSE)

  • When reading data from Excel, keep all values as text and do not repair duplicate column names:

    • readxl::read_xlsx(..., col_types = "text", .name_repair = "minimal")
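The ".1" suffixes mentioned in the examples come from utils::read.table() and read.csv(), whose default check.names = TRUE adjusts the names with make.names() so that they are valid and not duplicated. A small base R comparison follows. It only illustrates that default; using check.names = FALSE before calling e_read_df_header_span_rows() is an option this page does not itself recommend.

```r
txt <- "X,X,Z\na1,b1,c1\n"

# Default check.names = TRUE: the duplicated label gets a ".1" suffix
names(read.csv(text = txt, stringsAsFactors = FALSE))
# returns c("X", "X.1", "Z")

# check.names = FALSE keeps the labels exactly as written
names(read.csv(text = txt, stringsAsFactors = FALSE, check.names = FALSE))
# returns c("X", "X", "Z")
```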

Value

dat_this, a data.frame with updated column names

Examples

# all columns should be read as text (character)
dat_this <-
  read.csv(
    text = "
X,X,Z
a1,b1,c1
a2,b2,
a3,,
1,2,3
"
  , stringsAsFactors = FALSE
  )
dat_this |> print()

# return dataset as it is
e_read_df_header_span_rows(
    dat_this        = dat_this
  , skip            = 0
  , row_header_span = 1
  )
# no header row (first row is data); side effect: when the first row has duplicate values,
#   utils::read.table has already appended a suffix such as ".1" to them
e_read_df_header_span_rows(
    dat_this        = dat_this
  , skip            = 0
  , row_header_span = 0
  )
# skip first row
e_read_df_header_span_rows(
    dat_this        = dat_this
  , skip            = 1
  , row_header_span = 1
  )
# skip first row, combine first three rows into a column header, collapse with underscore
e_read_df_header_span_rows(
    dat_this        = dat_this
  , skip            = 1
  , row_header_span = 3
  , row_header_span_collapse = "_"
  )
# The first row had duplicate values, so ".1", etc., were appended;
#   the ".1" suffixes are removed first, then the header rows are joined
e_read_df_header_span_rows(
    dat_this        = dat_this
  , skip            = 0
  , row_header_span = 4
  , row_header_span_collapse = "_"
  )
# First row is data, so the current column names become data row 1 and new column names are added
e_read_df_header_span_rows(
    dat_this        = dat_this
  , skip            = 0
  , row_header_span = 0
  )
# Skip 3 rows; the first remaining row is data, so new column names are added
e_read_df_header_span_rows(
    dat_this        = dat_this
  , skip            = 3
  , row_header_span = 0
  )
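The example with row_header_span = 4 notes that the ".1" suffixes are removed before the header rows are joined. One hypothetical way to undo them is sketched below; the helper name strip_dup_suffix() and its rule are assumptions, not the package's code. It strips a trailing ".<digits>" only when the remaining base label also appears unsuffixed, so a genuine label such as "v.2" is kept. It does not reverse other make.names() changes, such as invalid characters replaced by dots.

```r
# Hypothetical helper (not part of erikmisc): undo the ".1", ".2", ... suffixes
# that make.names(unique = TRUE) appends to duplicated labels.
strip_dup_suffix <- function(x) {
  base <- sub("\\.[0-9]+$", "", x)
  # only strip when the base label also occurs as an unsuffixed name
  ifelse(base != x & base %in% x, base, x)
}

hdr <- names(read.csv(text = "X,X,Z\na1,b1,c1\n", stringsAsFactors = FALSE))
strip_dup_suffix(hdr)
# returns c("X", "X", "Z")
```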


erikerhardt/erikmisc documentation built on April 17, 2025, 10:48 a.m.