In jlederluis/digitanalysis: Digit Analysis

padding_test

R Documentation

Performs padding test vs simulations of Benford conforming datasets via percentile

Description

Performs the padding test by comparing the observed data with simulations of Benford-conforming datasets via percentile.
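
As a rough illustration of what a percentile-based Monte Carlo comparison of means can look like, here is a minimal base R sketch. It is not the package's implementation: it uses the Benford first-digit distribution as a stand-in for the right-aligned digit places, and the `observed` sample is made up for the example.

```r
set.seed(1)

# Benford first-digit probabilities (stand-in distribution for this sketch)
digits <- 1:9
benford_p <- log10(1 + 1 / digits)

# Expected mean digit under the stand-in distribution
expected_mean <- sum(digits * benford_p)

# Hypothetical observed digits: drawn uniformly, so their mean is inflated
observed <- sample(digits, 500, replace = TRUE)
observed_mean <- mean(observed)

# Difference in mean, in the spirit of the `diff_in_mean` output
diff_in_mean <- observed_mean - expected_mean

# Simulate N conforming datasets of the same size and record each mean
N <- 1000
sim_means <- replicate(
  N,
  mean(sample(digits, length(observed), replace = TRUE, prob = benford_p))
)

# Percentile-style p-value: share of simulated means at least as large
# as the observed mean
p_value <- mean(sim_means >= observed_mean)
```

A small `p_value` here means that few simulated conforming datasets had a mean as high as the observed one.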

Usage

padding_test(
  digitdata,
  data_columns = "all",
  max_length = 8,
  num_digits = 5,
  N = 10000,
  simulate = TRUE,
  omit_05 = NA,
  break_out = NA,
  break_out_grouping = NA,
  category = NA,
  category_grouping = NA,
  distribution = "Benford",
  contingency_table = NA,
  suppress_first_division_plots = NA,
  plot = TRUE
)

Arguments

`digitdata`	An object of class `DigitAnalysis`.
`data_columns`	The names of the numeric data columns to analyze. Defaults to 'all', which uses every data column in the `numbers` df in `digitdata`. Otherwise, pass an array of column names as characters, or a single column name as a character.
`max_length`	The length of the longest numbers considered. Defaults to 8.
`num_digits`	The total number of digits, aligned from the right, to be analyzed. Defaults to 5, i.e. the digit places from the 1s to the 10ks.
`N`	The number of Benford-conforming datasets to simulate. Defaults to 10000. As a runtime reference, N = 10,000 takes about 2400 seconds for data of dimension 4000 x 5 total digits.
`simulate`	TRUE or FALSE: If TRUE, simulates the datasets and generates a p-value. If FALSE, only produces `diff_in_mean` and plots. Overrides `N`.
`omit_05`	Whether to omit 0, or both 0 and 5. To omit both 0 and 5, pass c(0,5) or c(5,0); to omit only 0, pass 0 or c(0); to omit neither, pass NA. Defaults to NA.
`break_out`	The data column (non-numeric!) used to split the dataset by the categories in that column, specified as a character. This is the first division (usually the x-axis) shown in plots. Defaults to NA.
`break_out_grouping`	A list of arrays; defaults to NA. Only effective if `break_out` is not NA. The name of each list element is a break_out group name, and each array contains the values belonging to that group. If left as NA while `break_out` is not NA, every individual item in `break_out` is placed in its own group.
`category`	The column used to split the data into sectors for separate analysis. This is the second division (usually variables) shown in plots. Defaults to NA.
`category_grouping`	A list of arrays; defaults to NA. Only effective if `category` is not NA. The name of each list element is a category group name, and each array contains the values belonging to that group. If left as NA while `category` is not NA, every individual item in `category` is placed in its own group. e.g. `category_grouping = list(group_1=c(category_1, category_2, ...), group_2=c(category_10, ...), group_3=c(...))`
`distribution`	'Benford' or 'Uniform'. Case insensitive. Specifies the distribution to test against. Defaults to 'Benford'.
`contingency_table`	A user-supplied probability table for an arbitrary distribution. Overrides `distribution` if not NA. Must be a dataframe of the same form as `benford_table`. Defaults to NA. Run `load(file = "data/benford_table.RData")` to see the format of `benford_table`.
`suppress_first_division_plots`	TRUE or FALSE: If TRUE, suppresses the display of all plots on the first and second divisions. If TRUE, `suppress_second_division_plots` will also be set to TRUE. Defaults to NA.
`plot`	TRUE or FALSE or 'Save': If TRUE, displays the plots and returns them. If 'Save', returns the plots but suppresses display. If FALSE, no plot is produced. Defaults to TRUE.
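
The grouping arguments (`break_out_grouping`, `category_grouping`) take a named list of arrays. The base R sketch below shows one way such a list can be built and read; the category values and group names are hypothetical, and the lookup is only an illustration of the documented structure, not the package's internal code.

```r
# Hypothetical grouping: each list name is a group, each array holds
# the column values that belong to that group
category_grouping <- list(
  food   = c("groceries", "restaurants"),
  travel = c("flights", "hotels", "taxis")
)

# Invert the list into a value -> group lookup vector
lookup <- setNames(
  rep(names(category_grouping), lengths(category_grouping)),
  unlist(category_grouping, use.names = FALSE)
)

# Map some hypothetical column values to their groups
values <- c("hotels", "groceries", "taxis")
groups <- unname(lookup[values])

# The documented default (grouping left as NA) corresponds to putting
# every distinct value in its own group, which could be written as:
observed_values <- c("hotels", "groceries", "taxis", "hotels")
distinct_values <- unique(observed_values)
default_grouping <- as.list(setNames(distinct_values, distinct_values))
```

Values that appear in the column but in no group would map to NA in this lookup; how the package treats such values is not stated in this page.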

Value

A list with 4 elements:

A list of p-values from the Monte Carlo simulation for each category
A list of the differences in mean between `observed_mean` and `expected_mean` for each category
A sample size value that corresponds to `N` if `simulate = TRUE`
Plots for each category if `plot` is TRUE or 'Save'

Examples

padding_test(digitdata, omit_05=c(0,5), simulate=FALSE)
padding_test(digitdata, data_columns=c('col_name1', 'col_name2'), break_out='col_name')
padding_test(digitdata, N=100, break_out='col_name', distribution='uniform', plot='Save')
padding_test(digitdata, max_length=10, num_digits=3, omit_05=0, break_out='col_name', category='category_name')

jlederluis/digitanalysis documentation built on Nov. 5, 2023, 11:46 a.m.