mash_colnames: Make many header rows into column names

View source: R/mash_colnames.R

mash_colnamesR Documentation

Make many header rows into column names

Description

Make many header rows into column names

Usage

mash_colnames(
  df,
  n_name_rows,
  keep_names = TRUE,
  sliding_headers = FALSE,
  sep = "_"
)

Arguments

df

A data.frame or tibble object in which the names are broken up across the top n rows.

n_name_rows

Number of rows at the top of the data to be used to create the new variable (column) names. Must be >= 1.

keep_names

If TRUE, existing names will be included in building the new variable names. Defaults to TRUE.

sliding_headers

If TRUE, empty values in the first (topmost) header header row be filled column-wise. Defaults to FALSE. See details.

sep

Character string to separate the unified values (default is underscore).

Details

Tables are often shared with the column names broken up across the first few rows. This function takes the number of rows at the top of a table that hold the broken up names and whether or not to include the names, and mashes the values column-wise into a single string for each column. The keep_names argument can be helpful for tables we imported using a skip argument. If keep_names is set to FALSE, adjust the value of n_name_rows accordingly.

This function will throw a warning when possible NA values end up in the variable names. sliding_headers can be used for tables with ragged names in which not every column has a value in the very first row. In these cases attribution by adjacency is assumed, and when sliding_headers is set to TRUE the names in the topmost row are filled row-wise. This can be useful for tables reporting survey data or experimental designs in an untidy manner.

Value

The original data frame, but with new column names and without the top n rows that held the broken up names.

Author(s)

This function was originally contributed by Jarrett Byrnes through a GitHub issue.

Examples

babies <-
  data.frame(
    stringsAsFactors = FALSE,
    Baby = c(NA, NA, "Angie", "Yean", "Pierre"),
    Age = c("in", "months", "11", "9", "7"),
    Weight = c("kg", NA, "2", "3", "4"),
    Ward = c(NA, NA, "A", "B", "C")
  )
# Including the object names
mash_colnames(babies, n_name_rows = 2, keep_names = TRUE)

babies_skip <-
  data.frame(
    stringsAsFactors = FALSE,
    X1 = c("Baby", NA, NA, "Jennie", "Yean", "Pierre"),
    X2 = c("Age", "in", "months", "11", "9", "7"),
    X3 = c("Hospital", NA, NA, "A", "B", "A")
  )
#' # Discarding the automatically-generated names (X1, X2, etc...)
mash_colnames(babies_skip, n_name_rows = 3, keep_names = FALSE)

fish_experiment <-
  data.frame(
    stringsAsFactors = FALSE,
    X1 = c("Sample", NA, "Pacific", "Atlantic", "Freshwater"),
    X2 = c("Larvae", "Control", "12", "11", "10"),
    X3 = c(NA, "Low Dose", "11", "12", "8"),
    X4 = c(NA, "High Dose", "8", "7", "9"),
    X5 = c("Adult", "Control", "13", "13", "8"),
    X6 = c(NA, "Low Dose", "13", "12", "7"),
    X7 = c(NA, "High Dose", "10", "10", "9")
  )
# Ragged names
mash_colnames(fish_experiment,
  n_name_rows = 2,
  keep_names = FALSE, sliding_headers = TRUE
)

luisDVA/unheadr documentation built on Aug. 16, 2022, 5:28 a.m.