profile_table: High-level column statistics for data.frames

View source: R/profile_table.R

profile_table R Documentation

High-level column statistics for data.frames

Description

Calculate a predetermined set of per-column statistics.

Usage

profile_table(tbl, pivot = TRUE)

Arguments

tbl

A data.frame or data.table.

pivot

If TRUE (the default), return the output in cross-tabular format.

Details

Useful for checking high-level column statistics before, e.g., creating a table schema.

Value

By default, a data.table with the following fields:

  • field_name: factor; input field names.

  • CLASS: chr; the class of the field. If a field has multiple classes, they are collapsed into a single string with a semicolon delimiter.

  • MAYBE_NUMBER: logi; does the field contain only numbers, such that coercing a non-numeric field to numeric would produce no NA values? Always TRUE for numeric (or integer) fields.

  • FRAC_COMPLETE: numeric; the fraction of rows that are not NA.

  • NCHAR_MAX_LEN: integer; the maximum character length of the field, after coercing to character.

  • UNIQUEN: integer; the distinct count of values, excluding NA.

  • INTEGRAL_DUPE_FCTR: integer; the fraction of duplicate values, reported only if dividing the distinct count of non-NA values by the row count yields an integral value. NA otherwise, i.e. if the remainder of that division is not 0.
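To make the field definitions above more concrete, here is a minimal base-R sketch of how a few of them could be computed for a single column. The helper name sketch_col_stats is hypothetical and is not part of the package. The exact collapse delimiter (";" with no space) and the handling of all-NA columns are assumptions, so profile_table may differ in those details.

```r
# Hypothetical helper (not part of the package): approximates a few of the
# statistics described above for one column, using base R only.
sketch_col_stats <- function(x) {
  # Factors are converted to character first, as the Note describes.
  if (is.factor(x)) x <- as.character(x)
  non_na <- x[!is.na(x)]

  # MAYBE_NUMBER: coercing the non-NA values to numeric yields no new NA.
  # Assumption: numeric/integer input short-circuits to TRUE.
  maybe_number <- is.numeric(x) ||
    !anyNA(suppressWarnings(as.numeric(as.character(non_na))))

  list(
    # Assumption: multiple classes are joined with ";" and no space.
    CLASS = paste(class(x), collapse = ";"),
    MAYBE_NUMBER = maybe_number,
    # Fraction of rows that are not NA.
    FRAC_COMPLETE = mean(!is.na(x)),
    # Longest value after coercion to character; NA if nothing to measure.
    NCHAR_MAX_LEN = if (length(non_na) > 0) {
      max(nchar(as.character(non_na)))
    } else {
      NA_integer_
    },
    # Distinct count, excluding NA.
    UNIQUEN = length(unique(non_na))
  )
}

stats <- sketch_col_stats(c("1", "22", NA, "22"))
```

For this input, three of the four values are present, the longest value has two characters, there are two distinct non-NA values, and every non-NA value converts cleanly to a number.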

Note

Factor columns are treated as character via as.character().
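The following base-R snippet illustrates why converting a factor with as.character() matters for statistics such as MAYBE_NUMBER and NCHAR_MAX_LEN. It is a general illustration of factor coercion, not code taken from the package.

```r
# A factor whose labels look like numbers.
f <- factor(c("10", "20", "10"))

# Direct numeric coercion returns the internal level codes, not the labels.
codes <- as.numeric(f)

# Going through as.character() first recovers the labelled values.
values <- as.numeric(as.character(f))

# Character length is likewise measured on the labels.
max_len <- max(nchar(as.character(f)))
```

Here codes holds the level positions (1, 2, 1), while values holds the numbers the labels represent (10, 20, 10), which is the behaviour a column profile would normally want.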

Examples

# Build a small data.frame covering several column types
set.seed(10)
int_sample <- sample(1:10L, 100, replace = TRUE)
test_df <- data.frame(
  num_col = rnorm(100),
  chr_col = sample(LETTERS, 100, replace = TRUE),
  int_col = int_sample,
  int_as_factor = as.factor(int_sample),
  int_as_chr = as.character(int_sample),
  all_NA_chr = NA_character_,
  posix_ct_t = as.POSIXct(as.Date("2001-01-01")),
  stringsAsFactors = FALSE
)

profile_table(test_df, pivot = TRUE)

slin30/wzMisc documentation built on Jan. 27, 2023, 1 a.m.