all_digits_test: Performs all-digit-place two-way chi square test vs Benford’s...

View source: R/all_digit_test_main_function.R

all_digits_test    R Documentation

Performs all-digit-place two-way chi square test vs Benford’s Law

Description

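Performs a two-way chi-square test across all analyzed digit places, comparing the observed digit counts against Benford’s Law (or against another distribution selected with distribution or contingency_table).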

Usage

all_digits_test(
  digitdata,
  data_columns = "all",
  digit_places = "all",
  break_out = NA,
  break_out_grouping = NA,
  category = NA,
  category_grouping = NA,
  distribution = "Benford",
  contingency_table = NA,
  plot = TRUE,
  omit_05 = NA,
  skip_first_digit = FALSE,
  skip_last_digit = FALSE,
  suppress_low_N = FALSE,
  suppress_first_division_plots = FALSE,
  suppress_second_division_plots = TRUE,
  save3Dfilename = "",
  kwargs = NA
)

Arguments

digitdata

An object of class DigitAnalysis.

data_columns

The names of the numeric data columns to analyze. Accepts 'all' (the default), which uses every data column in the numbers data frame in digitdata; a character vector of column names; or a single column name as a character string.

digit_places

The indexes of the left-aligned digit places to analyze. There are three options:

  • 'all': analyze all digit places. This is the default.

  • A numeric vector: perform a multiple-digit test on the listed digit places.

  • A single number: perform a single-digit test on that digit place. For a last-digit test, pass in -1 or c(-1).
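To make "left-aligned digit place" and the -1 convention concrete, here is a minimal base R sketch. The helper digit_at is hypothetical and is not part of the package; it only illustrates the indexing described above and assumes whole-number input.

```r
# Hypothetical helper (not from the package): digit at a left-aligned place
# for each whole number; place = -1 means the last digit.
digit_at <- function(x, place) {
  s <- format(abs(x), scientific = FALSE, trim = TRUE)  # 4071 -> "4071"
  n <- nchar(s)
  pos <- if (place == -1) n else rep(place, length(s))
  d <- substr(s, pos, pos)          # "" when the number has too few digits
  suppressWarnings(as.integer(d))   # "" becomes NA
}

x <- c(4071, 12, 905)
first_digits <- digit_at(x, 1)    # first (leftmost) digit place
third_digits <- digit_at(x, 3)    # NA where a number has fewer than 3 digits
last_digits  <- digit_at(x, -1)   # last digit, as with digit_places = -1
```

Numbers that are too short for the requested place yield NA in this sketch; how the package itself treats such values is not stated here.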

break_out

  • The name of a (non-numeric!) data column, given as a character string, whose categories are used to split up the dataset.

  • The first division (usually the x-axis) shown in plots.

  • Defaults to NA.

break_out_grouping

A list of vectors, or NA (the default). Only effective if break_out is not NA.

  • The name of each element in the list is the name of a break_out group

  • Each vector contains the values belonging to that break_out group

  • If it is left as NA while break_out is not NA, every individual value in break_out is placed in its own group.

category

The column for splitting the data into sectors for separate analysis. The second division (usually variables) shown in plots.

category_grouping

A list of vectors, or NA (the default). Only effective if category is not NA.

  • The name of each element in the list is the name of a category group

  • Each vector contains the values belonging to that category group

  • If it is left as NA while category is not NA, every individual value in category is placed in its own group.

  • e.g. category_grouping = list(group_1=c(category_1, category_2, ...), group_2=c(category_10, ...), group_3=c(...))
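The list shape used by category_grouping (and, in the same way, break_out_grouping) can be sketched in base R as follows. The column values and group names are made up for illustration, and the mapping code shows one way to read such a list, not the package's own implementation.

```r
# Made-up values for a hypothetical category column.
sector <- c("Food", "Fuel", "Rent", "Food", "Travel", "Fuel")

# Same shape as category_grouping: list names are groups, vectors are members.
grouping <- list(Consumables = c("Food", "Fuel"), Fixed = c("Rent"))

# Map each value to its group. In this sketch, values that are not listed
# in any group (here "Travel") come out as NA.
lookup <- stack(grouping)   # columns: values (member), ind (group name)
group_of <- as.character(lookup$ind[match(sector, lookup$values)])

# The default described above: with no grouping supplied, every distinct
# value forms its own group.
default_grouping <- as.list(setNames(unique(sector), unique(sector)))
```

How the function treats values that appear in the column but in no group is not documented here, so it is worth listing every value explicitly.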

distribution

'Benford' or 'Uniform' (case insensitive). Specifies the distribution that the chi-square test tests against. Defaults to 'Benford'.
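As a rough illustration of what testing against 'Benford' versus 'Uniform' means, the sketch below runs a one-way chi-square goodness-of-fit test on first digits only, using stats::chisq.test from base R. The counts are made up, and this is a simplification: all_digits_test works across digit places, and its internals are not reproduced here.

```r
# Benford's Law for the first digit: P(d) = log10(1 + 1/d), d = 1..9.
benford_p <- log10(1 + 1 / (1:9))
uniform_p <- rep(1 / 9, 9)

# Made-up observed first-digit counts (N = 200), for illustration only.
observed <- c(62, 35, 24, 20, 16, 13, 12, 10, 8)

# Goodness-of-fit chi-square tests against each reference distribution.
fit_benford <- chisq.test(observed, p = benford_p)
fit_uniform <- chisq.test(observed, p = uniform_p)

p_values <- c(benford = fit_benford$p.value, uniform = fit_uniform$p.value)
```

With counts shaped like these, the Benford test gives a large p-value and the uniform test a very small one, which is the contrast the distribution argument controls.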

contingency_table

A user-supplied probability table for an arbitrary distribution. Overrides distribution if not NA. Must be a data frame in the same form as benford_table. Defaults to NA.

  • Run load(file = "data/benford_table.RData") to see the format of benford_table
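For orientation, the probabilities that Benford's Law assigns to each digit at each left-aligned place can be computed directly. The sketch below does so in base R; the function name benford_place_p and the data frame layout (columns place_1 to place_3 plus digit) are assumptions made for illustration and are not the layout of the package's benford_table, which should be checked by loading it.

```r
# Benford probability that the digit at left-aligned place k equals d.
benford_place_p <- function(k, d) {
  if (k == 1) {
    if (d == 0) return(0)                 # a leading digit is never 0
    return(log10(1 + 1 / d))
  }
  prefixes <- 10^(k - 2):(10^(k - 1) - 1) # all possible leading k-1 digits
  sum(log10(1 + 1 / (10 * prefixes + d)))
}

# Hypothetical layout: rows are digits 0-9, columns are places 1-3.
prob_matrix <- sapply(1:3, function(k) {
  sapply(0:9, function(d) benford_place_p(k, d))
})
prob_table <- as.data.frame(prob_matrix)
names(prob_table) <- paste0("place_", 1:3)
prob_table$digit <- 0:9
```

Each place column sums to 1, and the columns flatten toward 0.1 per digit as the place index grows.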

plot

TRUE, FALSE or 'Save': If TRUE, display the plots and return them. If 'Save', return the plots but suppress their display. If FALSE, no plots are produced. Defaults to TRUE.

omit_05

Whether to omit 0, or both 0 and 5. To omit both 0 and 5, pass in c(0,5) or c(5,0); to omit only 0, pass in 0 or c(0); to omit neither, pass in NA. Defaults to NA.
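One plausible reading of omitting digits is that the omitted digits are dropped and the remaining expected probabilities are rescaled to sum to 1. The sketch below shows that reading for a uniform last-digit expectation; whether the package renormalizes in exactly this way is an assumption, not something stated on this page.

```r
digits <- 0:9
expected_p <- rep(1 / 10, 10)     # uniform expectation for a last digit
names(expected_p) <- digits

omit <- c(0, 5)                   # same form as omit_05 = c(0, 5)
keep <- !(digits %in% omit)       # with omit = NA, every digit is kept

# Assumption: rescale so the kept probabilities sum to 1 again.
adjusted_p <- expected_p[keep] / sum(expected_p[keep])
```

Here the eight remaining digits each end up with probability 1/8.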

skip_first_digit

TRUE or FALSE: If TRUE, skip the first digit place before analysis. Defaults to FALSE.

skip_last_digit

TRUE or FALSE: If TRUE, skip the last digit place before analysis, so that tests do not overlap. Defaults to FALSE. skip_last_digit should override digit_places and skip_first_digit.

suppress_low_N

TRUE or FALSE: If TRUE, suppress a column of the expected table if at least one cell in that column has an expected value < 5. Defaults to FALSE.
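The rule described above (drop a column when any cell has an expected value below 5) can be sketched in base R as follows. The expected counts and the row and column labels are made up, and the orientation of the table is an assumption for illustration.

```r
# Made-up expected counts: rows are digits, columns are digit places.
expected <- matrix(c(40, 35, 30,
                     12, 11, 10,
                      6,  7,  3),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(digit = c("0", "1", "2"),
                                   place = c("p2", "p3", "p4")))

# Flag columns in which at least one expected value is below 5.
low_n <- apply(expected < 5, 2, any)

# Keep only the remaining columns; drop = FALSE preserves the matrix shape.
kept <- expected[, !low_n, drop = FALSE]
```

In this sketch only column p4 is removed, because it contains the single cell below 5.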

suppress_first_division_plots

TRUE or FALSE: If TRUE, suppress the display of all plots on the first and second divisions; suppress_second_division_plots will then also be set to TRUE. Defaults to FALSE.

suppress_second_division_plots

TRUE or FALSE: If TRUE, suppress the display of all plots on the second division. Defaults to TRUE.

save3Dfilename

If specified, saves the 3D barplot to a PDF named with the given name plus the break out and category specification. Defaults to "" (empty string).

kwargs

Extra parameters to pass to the 3D plotting. Currently error-prone and should not be used. Defaults to NA.

Value

  • A table of p-values for the all-digits test on each category

  • A table of sample sizes for the all-digits test on each category

  • Plots for each category if plot = TRUE or 'Save'

  • plot3Drgl::plotrgl() is a suggested function for making the 3D plots interactive

Examples

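# Skip the first digit place; split by col_name1 (first division) and col_name2 (second division)
all_digits_test(digitdata, skip_first_digit=TRUE, break_out='col_name1', category='col_name2')
# Last-digit test against a uniform distribution, omitting 0 and 5; return plots without displaying them
all_digits_test(digitdata, digit_places=-1, omit_05=c(0,5), break_out='col_name', distribution='Uniform', plot='Save')
# Digit places 1, 3 and 5 of two columns, omitting 0 and suppressing columns with low expected counts
all_digits_test(digitdata, data_columns=c('col_name1', 'col_name2'), omit_05=0, digit_places=c(1,3,5), suppress_low_N=TRUE)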

jlederluis/digitanalysis documentation built on Nov. 5, 2023, 11:46 a.m.