high_low_test: Performs high to low digit tests vs probability of high to...

View source: R/high_low_test.R

high_low_test    R Documentation

Performs high-to-low digit tests against the probability of high-to-low digits under Benford's Law, via a chi-square test (default) or a binomial test

Description

Performs high-to-low digit tests against the probability of high-to-low digits under Benford's Law, via a chi-square test (default) or a binomial test.

Usage

high_low_test(
  digitdata,
  data_columns = "all",
  high = c(6, 7, 8, 9),
  omit_05 = NA,
  test_type = "chisq",
  distribution = "Benford",
  contingency_table = NA,
  skip_first_digit = FALSE,
  skip_last_digit = FALSE,
  break_out = NA,
  break_out_grouping = NA,
  category = NA,
  category_grouping = NA,
  plot = TRUE,
  remove_all_category_visualize = FALSE
)

Arguments

digitdata

An object of class DigitAnalysis.

data_columns

The names of the numeric data columns to analyze. Defaults to 'all', which uses every data column in the numbers data frame of digitdata. Otherwise, pass an array of column names as characters, or a single column name as a character.

high

A numeric array of digits, or a single number, to be classified as high digits. Defaults to c(6,7,8,9).

omit_05

Whether to omit 0, or both 0 and 5. To omit both 0 and 5, pass c(0,5) or c(5,0); to omit only 0, pass 0 or c(0); to omit neither, pass NA. Defaults to NA.
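The interaction between high and omit_05 can be sketched in base R. This is only an illustration under the assumption that omitted digits are dropped first and that every remaining digit not listed in high counts as low; classify_digits is a hypothetical helper, not a function of the package.

```r
# Hypothetical helper, not part of digitanalysis: split the digits 0-9 into
# high and low groups after dropping the omitted digits.
classify_digits <- function(high = c(6, 7, 8, 9), omit_05 = NA) {
  digits <- 0:9
  # NA means "omit nothing"; otherwise remove the listed digits (0, or 0 and 5).
  if (!all(is.na(omit_05))) {
    digits <- setdiff(digits, omit_05)
  }
  list(
    high = intersect(digits, high),  # remaining digits listed in `high`
    low = setdiff(digits, high)      # every other remaining digit
  )
}

groups <- classify_digits(high = c(6, 7, 8, 9), omit_05 = c(0, 5))
groups$high  # 6 7 8 9
groups$low   # 1 2 3 4
```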

test_type

Specifies which test to perform: "binom" runs a binomial test on the high vs. low digit frequency, taken as a weighted average across digit places; "chisq" runs a chi-square test on high vs. low digits by digit place. Defaults to "chisq".
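To make the two test types concrete, here is a minimal base R sketch for a single digit place. It is not the package's implementation: the counts are made up for illustration, and the sketch covers only the first digit place rather than the per-place or weighted-average handling described above.

```r
# Benford probability that a first digit is high (6, 7, 8 or 9).
p_high <- sum(log10(1 + 1 / (6:9)))

# Made-up counts of high and low first digits, for illustration only.
n_high <- 30
n_low <- 70

# Chi-square goodness-of-fit test of the observed counts against Benford.
chisq_p <- chisq.test(c(n_high, n_low), p = c(p_high, 1 - p_high))$p.value

# Exact binomial test of the share of high digits against the same probability.
binom_p <- binom.test(n_high, n_high + n_low, p = p_high)$p.value
```

Both calls compare the same observed share of high digits with the same expected probability; the binomial test is exact, while the chi-square test relies on a large-sample approximation.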

distribution

'Benford' or 'Uniform', case insensitive. Specifies the distribution the chi-square test tests against. Defaults to 'Benford'.
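The expected share of high digits under each distribution can be worked out directly. The sketch below uses the standard Benford formulas for the first and second digit places; the uniform probabilities (1/9 per digit in the first place, 1/10 per digit in later places) are an assumption about how 'Uniform' is defined, not something this page confirms.

```r
# Benford probabilities for the first digit place (digits 1-9).
benford_first <- log10(1 + 1 / (1:9))
names(benford_first) <- 1:9

# Benford probabilities for the second digit place (digits 0-9):
# sum over all possible first digits k of log10(1 + 1 / (10k + d)).
benford_second <- sapply(0:9, function(d) sum(log10(1 + 1 / (10 * (1:9) + d))))
names(benford_second) <- 0:9

high <- c(6, 7, 8, 9)
high_names <- as.character(high)

# Expected probability of a high digit in each place under Benford's Law.
p_high_benford <- c(
  first = sum(benford_first[high_names]),
  second = sum(benford_second[high_names])
)

# Assumed uniform counterpart: 9 possible first digits, 10 possible later digits.
p_high_uniform <- c(first = length(high) / 9, second = length(high) / 10)
```

Under Benford's Law, high digits are less likely than under the uniform assumption in both places, and the gap is widest in the first digit place.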

contingency_table

A user-supplied probability table for an arbitrary distribution. Overrides distribution if not NA. Must be a data frame in the same format as benford_table. Defaults to NA.

  • Run load(file = "data/benford_table.RData") to see the format of benford_table.

skip_first_digit

TRUE or FALSE: If TRUE, skip the first digit place before analysis. Defaults to FALSE.

skip_last_digit

TRUE or FALSE: If TRUE, skip the last digit place before analysis, so that tests do not overlap. Defaults to FALSE. skip_last_digit should override digit_places and skip_first_digit.
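As a rough illustration of what skipping a digit place means, the sketch below splits one number into its digits and drops the first or the last place. number_digits is a hypothetical helper written for this example; the package's own digit handling may differ.

```r
# Hypothetical helper, not part of digitanalysis: digits of a positive
# whole number, ordered from the first (leftmost) digit place to the last.
number_digits <- function(x) {
  as.integer(strsplit(format(x, scientific = FALSE), "")[[1]])
}

digits <- number_digits(4271)

# Drop the first digit place, as with skip_first_digit = TRUE.
without_first <- digits[-1]               # 2 7 1

# Drop the last digit place, as with skip_last_digit = TRUE.
without_last <- digits[-length(digits)]   # 4 2 7
```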

break_out

  • The data column (non-numeric!) used to split the dataset by the categories in that column, given as a character.

  • The first division (usually x-axis) shown in plots.

  • Defaults to NA.

break_out_grouping

A list of arrays, or NA by default. Only effective if break_out is not NA.

  • The name of each element in the list is the break_out group name.

  • Each array contains the values belonging to that break_out group.

  • If left as NA while break_out is not NA, every individual item in break_out is placed in its own group.
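The default grouping described in the last bullet can be pictured as a list with one element per distinct value. The following base R sketch builds such a list; the district values are hypothetical, and the list shape is assumed from the description above rather than taken from the package source.

```r
# Hypothetical values of a break_out column, for illustration only.
district <- c("north", "south", "north", "east")

# Assumed default: one group per distinct value, named after that value.
items <- unique(district)
default_grouping <- as.list(setNames(items, items))

names(default_grouping)  # "north" "south" "east"
```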

category

The column for splitting the data into sectors for separate analysis. The second division (usually variables) shown in plots.

category_grouping

A list of arrays, or NA by default. Only effective if category is not NA.

  • The name of each element in the list is the category group name.

  • Each array contains the values belonging to that category group.

  • If left as NA while category is not NA, every individual item in category is placed in its own group.

  • e.g. category_grouping = list(group_1=c(category_1, category_2, ...), group_2=c(category_10, ...), group_3=c(...))
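To show how a grouping list of this shape assigns rows to groups, here is a small base R sketch. The category values and group names are hypothetical, and treating values that appear in no group as NA is an assumption of this example, not documented behavior.

```r
# Hypothetical category column and grouping, for illustration only.
category_values <- c("food", "fuel", "rent", "food", "travel")
category_grouping <- list(daily = c("food", "fuel"), fixed = c("rent"))

# Flatten the list into parallel vectors of member values and group names.
member_values <- unlist(category_grouping, use.names = FALSE)
member_groups <- rep(names(category_grouping), lengths(category_grouping))

# Look up the group of each row; values in no group become NA.
row_group <- member_groups[match(category_values, member_values)]

# Row indices belonging to each group (rows with NA are dropped by split).
rows_by_group <- split(seq_along(category_values), row_group)
```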

plot

TRUE or FALSE or 'Save': If TRUE, display the plots and return them. If 'Save', return the plots but suppress display. If FALSE, no plot is produced. Default to TRUE.

remove_all_category_visualize

TRUE or FALSE: If TRUE, remove the visualization of the 'All Category' dataset from plots. Defaults to FALSE.

Value

  • A table of p-values for the high-to-low test on each category

  • A table of sample sizes for the high-to-low test on each category

  • Plots for each category if plot = TRUE or 'Save'

Examples

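# Treat 5 through 9 as high digits
high_low_test(digitdata, high = c(5, 6, 7, 8, 9))
# Binomial test, skipping the first digit place, split by a break_out column
high_low_test(digitdata, skip_first_digit = TRUE, break_out = 'col_name', test_type = 'binom')
# Omit 0, skip the last digit place, split by break_out and category
high_low_test(digitdata, high = c(5, 6, 9), omit_05 = 0, skip_last_digit = TRUE, break_out = 'col_name', category = 'category_name')
# Analyze two named data columns and return the plots without displaying them
high_low_test(digitdata, data_columns = c('col_name1', 'col_name2'), high = 9, break_out = 'col_name', category = 'category_name', plot = 'Save')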

jlederluis/digitanalysis documentation built on Nov. 5, 2023, 11:46 a.m.