
high_low_test

R Documentation

Performs high-to-low digit tests against the Benford's Law probabilities of high vs. low digits, via a chi-square test (default) or a binomial test

Description

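Compares the observed frequency of high digits (by default 6, 7, 8 and 9) vs. low digits with the frequency expected under Benford's Law, a uniform distribution, or a user-supplied probability table. The comparison uses either a chi-square test on each digit place (default) or a binomial test across digit places.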
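Under Benford's Law, the expected share of high digits depends on the digit place. The sketch below uses the standard formula for the k-th significant digit to compute that expected share. It assumes nothing about how digitanalysis stores these values internally (the package presumably reads them from `benford_table`). The helpers `benford_digit_prob` and `benford_high_prob` are hypothetical and are not part of the package.

```r
# Probability that the k-th significant digit equals d under Benford's Law.
# k = 1 is the leading digit (d in 1..9); for k >= 2, d ranges over 0..9.
benford_digit_prob <- function(d, k) {
  if (k == 1) {
    return(log10(1 + 1 / d))
  }
  # Sum over every possible (k-1)-digit prefix j that can precede digit d.
  j <- seq(10^(k - 2), 10^(k - 1) - 1)
  sum(log10(1 + 1 / (10 * j + d)))
}

# Hypothetical helper: expected proportion of "high" digits in digit place k.
benford_high_prob <- function(high, k) {
  digits <- if (k == 1) 1:9 else 0:9
  probs <- vapply(digits, benford_digit_prob, numeric(1), k = k)
  sum(probs[digits %in% high])
}

high <- c(6, 7, 8, 9)  # same default as the `high` argument
p_high <- sapply(1:3, function(k) benford_high_prob(high, k))
print(round(p_high, 4))
```

For the first digit place, the default high digits telescope to log10(10/6), about 0.22. Later places move toward the uniform share of 0.4, which is why the digit place matters when testing high vs. low digits.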

Usage

high_low_test(
  digitdata,
  data_columns = "all",
  high = c(6, 7, 8, 9),
  omit_05 = NA,
  test_type = "chisq",
  distribution = "Benford",
  contingency_table = NA,
  skip_first_digit = FALSE,
  skip_last_digit = FALSE,
  break_out = NA,
  break_out_grouping = NA,
  category = NA,
  category_grouping = NA,
  plot = TRUE,
  remove_all_category_visualize = FALSE
)

Arguments

`digitdata`	An object of class `DigitAnalysis`.
`data_columns`	The names of the numeric data columns to analyze. Either 'all' (default), which uses every data column in the `numbers` data frame of `digitdata`; a character vector of column names; or a single column name as a character string.
`high`	A numeric vector of digits, or a single digit, to be classified as high digits. Defaults to c(6,7,8,9).
`omit_05`	Whether to omit 0, or both 0 and 5. To omit both 0 and 5, pass c(0,5) or c(5,0); to omit only 0, pass 0 or c(0); to omit neither, pass NA. Defaults to NA.
`test_type`	The test to perform: "binom" runs a binomial test on high vs. low digit frequency, weighted-averaged across digit places; "chisq" runs a chi-square test on high vs. low digits for each digit place. Defaults to "chisq".
`distribution`	'Benford' or 'Uniform' (case-insensitive). Specifies the distribution the chi-square test compares against. Defaults to 'Benford'.
`contingency_table`	A user-supplied probability table for an arbitrary distribution. Overrides `distribution` if not NA. Must be a data frame in the same format as `benford_table`. Defaults to NA. Run `load(file = "data/benford_table.RData")` to inspect the format of `benford_table`.
`skip_first_digit`	TRUE or FALSE: if TRUE, skip the first digit place before analysis. Defaults to FALSE.
`skip_last_digit`	TRUE or FALSE: if TRUE, skip the last digit place before analysis, so that tests do not overlap. Defaults to FALSE. `skip_last_digit` overrides `digit_places` and `skip_first_digit`.
`break_out`	The name of a non-numeric data column, given as a character string, used to split the dataset by the categories in that column. This is the first division (usually the x-axis) shown in plots. Defaults to NA.
`break_out_grouping`	A list of vectors, or NA (default). Only effective if `break_out` is not NA. The name of each list element is the break-out group name, and each vector contains the values belonging to that group. If left as NA while `break_out` is not NA, each distinct value in `break_out` is placed in its own group.
`category`	The column used to split the data into sectors for separate analysis. This is the second division (usually variables) shown in plots. Defaults to NA.
`category_grouping`	A list of vectors, or NA (default). Only effective if `category` is not NA. The name of each list element is the category group name, and each vector contains the values belonging to that group. If left as NA while `category` is not NA, each distinct value in `category` is placed in its own group. e.g. `category_grouping = list(group_1=c(category_1, category_2, ...), group_2=c(category_10, ...), group_3=c(...))`
`plot`	TRUE, FALSE, or 'Save': if TRUE, display the plots and return them; if 'Save', return the plots without displaying them; if FALSE, produce no plots. Defaults to TRUE.
`remove_all_category_visualize`	TRUE or FALSE: if TRUE, omit the 'All Category' dataset from the plots. Defaults to FALSE.
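The two `test_type` options differ mainly in how digit places are combined. The base-R sketch below illustrates that difference on hypothetical high/low counts for three digit places. It is not the package's code. Reading "weighted averaged across digit places" as pooling the counts and weighting each place's expected probability by its sample size is an assumption.

```r
# Hypothetical high/low digit counts for digit places 1, 2 and 3.
high_counts <- c(40, 70, 75)
low_counts  <- c(160, 130, 125)
places <- seq_along(high_counts)

# Expected share of high digits in place k under Benford's Law
# (hypothetical helper; 0 cannot be a leading digit, so it is dropped for k = 1).
expected_high_prob <- function(high, k) {
  if (k == 1) return(sum(log10(1 + 1 / high[high >= 1])))
  j <- seq(10^(k - 2), 10^(k - 1) - 1)
  sum(sapply(high, function(d) sum(log10(1 + 1 / (10 * j + d)))))
}
p_high <- sapply(places, function(k) expected_high_prob(c(6, 7, 8, 9), k))

# "chisq"-style: one goodness-of-fit test of high vs. low per digit place.
chisq_p <- sapply(places, function(k) {
  chisq.test(x = c(high_counts[k], low_counts[k]),
             p = c(p_high[k], 1 - p_high[k]))$p.value
})

# "binom"-style (assumed weighting): pool all places into one binomial test
# against the sample-size-weighted average expected probability.
n <- high_counts + low_counts
p_pooled <- sum(p_high * n) / sum(n)
binom_p <- binom.test(x = sum(high_counts), n = sum(n), p = p_pooled)$p.value

print(round(chisq_p, 4))
print(round(binom_p, 4))
```

The chi-square variant yields one p-value per digit place, so it can localize a deviation. The pooled binomial variant yields a single p-value with more power when the deviation has the same direction in every place.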

Value

A table of p-values for the high-low test on each category
A table of sample sizes for the high-low test on each category
Plots for each category if plot = TRUE or 'Save'

Examples

high_low_test(digitdata, high = c(5, 6, 7, 8, 9))
high_low_test(digitdata, skip_first_digit = TRUE, break_out = 'col_name', test_type = 'binom')
high_low_test(digitdata, high = c(5, 6, 9), omit_05 = 0, skip_last_digit = TRUE, break_out = 'col_name', category = 'category_name')
high_low_test(digitdata, data_columns = c('col_name1', 'col_name2'), high = 9, break_out = 'col_name', category = 'category_name', plot = 'Save')

jlederluis/digitanalysis documentation built on Nov. 5, 2023, 11:46 a.m.