unpack_round_numbers_test: Performs unpack rounded number test by performing all-digit...

View source: R/unpack_round_numbers_test.R

unpack_round_numbers_test    R Documentation

Description

Performs the unpack rounded numbers test by running an all-digit-place two-way chi-square test against Benford’s Law. A wrapper function for all_digit_test.

Usage

unpack_round_numbers_test(
  digitdata,
  rounding_split_column,
  analysis_columns = "all",
  digit_places = "all",
  break_out = NA,
  break_out_grouping = NA,
  category = NA,
  category_grouping = NA,
  distribution = "Benford",
  contingency_table = NA,
  plot = TRUE,
  omit_05 = NA,
  skip_first_digit = FALSE,
  skip_last_digit = FALSE,
  suppress_low_N = FALSE,
  suppress_first_division_plots = FALSE,
  suppress_second_division_plots = TRUE
)

Arguments

digitdata

An object of class DigitAnalysis.

rounding_split_column

The numeric data column used to split the data into rounded and unrounded numbers for the unpack rounded numbers test.

analysis_columns

The names of the numeric data columns to analyze. Accepts 'all' (the default), which uses every data column in the numbers df in digitdata; an array of column names, as characters; or a single column name, as a character.

digit_places

The indexes of left-aligned digit places to analyze. There are three options:

  • 'all': analyze all digits. This is the default.

  • A numeric array: perform a multiple-digit test on the listed digit places.

  • A number: perform a single-digit test on that digit place. For a last-digit test, pass in -1 or c(-1).
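To make "left-aligned digit places" concrete, here is a minimal base-R sketch of how a digit place could be read off a number, with -1 standing for the last digit. The helper digit_at is hypothetical and is not part of the package; the package's own extraction logic may differ.

```r
# Hypothetical helper (not a package function): return the digit at a
# left-aligned place, or the last digit when place is -1.
digit_at <- function(x, place) {
  # sprintf("%.0f") avoids scientific notation for large whole numbers
  digits <- as.integer(strsplit(sprintf("%.0f", abs(x)), "")[[1]])
  if (place == -1) {
    return(digits[length(digits)])
  }
  if (place > length(digits)) {
    return(NA_integer_)  # the number has no digit at this place
  }
  digits[place]
}

first_digit <- digit_at(4520, 1)   # leftmost digit
third_digit <- digit_at(4520, 3)   # third digit from the left
last_digit  <- digit_at(4520, -1)  # last digit
missing     <- digit_at(4520, 5)   # place beyond the number's length
```

Under this reading, digit_places = c(1, 2, 3) selects the three leftmost digits of each number, while -1 always selects the final digit regardless of length.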

break_out

  • The non-numeric data column used to split the dataset by the categories in that column, if specified as a character.

  • The first division (usually x-axis) shown in plots.

  • Defaults to NA.

break_out_grouping

A list of arrays, or NA by default. Only effective if break_out is not NA.

  • The name of each element in the list is the break_out name

  • Each array contains the values belonging to that break_out

  • If left as NA while break_out is not NA, every individual item in break_out is placed in its own group.
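The grouping structure described above can be sketched in base R as follows. The column values and group names are hypothetical, and the default_grouping line only illustrates the documented NA behaviour (one group per value); it is not the package's internal code.

```r
# Hypothetical distinct values found in the break_out column
break_out_values <- c("district_A", "district_B", "district_C", "district_D")

# Explicit grouping: list names are the group names, and each array
# holds the break_out values that belong to that group
break_out_grouping <- list(
  region_1 = c("district_A", "district_B"),
  region_2 = c("district_C", "district_D")
)

# Illustration of the documented NA default: every value is its own group
default_grouping <- setNames(as.list(break_out_values), break_out_values)
```

The same list-of-arrays shape applies to category_grouping.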

category

The column for splitting the data into sectors for separate analysis. The second division (usually variables) shown in plots.

category_grouping

A list of arrays, or NA by default. Only effective if category is not NA.

  • The name of each element in the list is the category name

  • Each array contains the values belonging to that category

  • If left as NA while category is not NA, every individual item in category is placed in its own group.

  • e.g. category_grouping = list(group_1=c(category_1, category_2, ...), group_2=c(category_10, ...), group_3=c(...))

distribution

'Benford' or 'Uniform'. Case insensitive. Specifies the distribution the chi-square test tests against. Defaults to 'Benford'.
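For orientation, the sketch below computes the Benford expected probabilities for the first digit place and runs a plain one-way chi-square goodness-of-fit test with base R's chisq.test. This is a deliberately simpler stand-in, not the package's two-way all-digit-place test, and the amounts vector is made-up illustrative data.

```r
# Benford's Law for the first digit: P(d) = log10(1 + 1/d), d = 1..9
benford_first <- log10(1 + 1 / (1:9))
names(benford_first) <- 1:9

# Uniform alternative over the same nine digits
uniform_first <- rep(1 / 9, 9)

# Hypothetical amounts, for illustration only
amounts <- c(123, 187, 1450, 2300, 311, 1020, 975, 1180, 164, 2750,
             1999, 420, 1310, 5600, 118, 770, 1600, 230, 1045, 890)

# Leftmost digit of each amount, tabulated over all nine possible digits
first_digit <- as.integer(substr(sprintf("%.0f", amounts), 1, 1))
observed <- table(factor(first_digit, levels = 1:9))

# One-way goodness-of-fit test against Benford. With a sample this small
# the expected counts fall below 5, so chisq.test warns that the
# approximation may be inaccurate; the warning is silenced here.
result <- suppressWarnings(chisq.test(observed, p = benford_first))
p_value <- result$p.value
```

The low-expected-count warning silenced here is the same concern that suppress_low_N addresses.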

contingency_table

A user-supplied probability table for an arbitrary distribution. Overrides distribution if not NA. Must be a dataframe of the same form as benford_table. Defaults to NA.

  • Run load(file = "data/benford_table.RData") to see the format of benford_table

plot

TRUE, FALSE, or 'Save': If TRUE, display the plots and return them. If 'Save', return the plots but suppress display. If FALSE, no plot is produced. Defaults to TRUE.

omit_05

Whether to omit 0, or both 0 and 5. To omit both 0 and 5, pass in c(0,5) or c(5,0); to omit only 0, pass in 0 or c(0); to omit neither, pass in NA. Defaults to NA.

skip_first_digit

TRUE or FALSE: If TRUE, skip the first digit place before analysis. Defaults to FALSE.

skip_last_digit

TRUE or FALSE: If TRUE, skip the last digit place before analysis, so that tests do not overlap. Defaults to FALSE. skip_last_digit should override digit_places and skip_first_digit.

suppress_low_N

TRUE or FALSE: If TRUE, suppress columns in the expected table if at least one cell in that column has an expected value < 5. Defaults to FALSE.

suppress_first_division_plots

TRUE or FALSE: If TRUE, suppress the display of all plots on first and second division. If TRUE, suppress_second_division_plots will also be set to TRUE.

suppress_second_division_plots

TRUE or FALSE: If TRUE, suppress the display of all plots on second division.

Value

  • A list of p-values for rounded and unrounded data, broken down by break_out and category if specified

  • A list of sample sizes for rounded and unrounded data, broken down by break_out and category if specified

  • A list of merged plots, rounded data plots, and unrounded data plots, broken down by break_out and category if specified, returned only if plot = TRUE or 'Save'

Examples

unpack_round_numbers_test(digitdata, rounding_split_column='col_name', analysis_columns=c('X', 'Y'))
unpack_round_numbers_test(digitdata, rounding_split_column='col_name', digit_places=c(1,2,3), break_out='A', category='Y')
unpack_round_numbers_test(digitdata, rounding_split_column='col_name', break_out='A', omit_05=c(0,5), suppress_low_N=TRUE)

jlederluis/digitanalysis documentation built on Nov. 5, 2023, 11:46 a.m.