small_samptest | R Documentation |
Small sample test statistic for counts of N items in bins with particular probability.
small_samptest(d, p = rep(1/length(d), length(d)), type = "G", cdf = FALSE)
d |
vector of counts, e.g. c(0,2,1,3,1,4,0) for counts of crimes in days of the week |
p |
vector of baseline probabilities, defaults to equal probabilities in each bin |
type |
string specifying "G" for the likelihood ratio G statistic (the default), "V" for Kuiper's test (for circular data), "KS" for the Kolmogorov-Smirnov test, and "Chi" for the Chi-square test |
cdf |
if |
This constructs an exact null distribution for small sample statistics for N counts in M bins. Example use cases are to see if a repeat offender has a proclivity to commit crimes on a particular day of the week (see the referenced paper, Wheeler 2016). It can also be used for Benford's analysis of leading/trailing digits for small samples. The referenced paper shows that the G test tends to have the most power, although with circular data you may consider Kuiper's test.
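The idea of an exact null distribution can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper names `compositions` and `g_stat` are hypothetical, and the p-value is assumed to be the null probability of a statistic at least as large as the observed one.

```r
# Enumerate every way to place n counts into m bins (hypothetical helper).
compositions <- function(n, m) {
  if (m == 1) return(matrix(n, nrow = 1))
  out <- lapply(0:n, function(k) cbind(k, compositions(n - k, m - 1)))
  unname(do.call(rbind, out))
}

# Likelihood ratio G statistic; bins with zero counts contribute 0.
g_stat <- function(x, p) {
  e <- sum(x) * p
  2 * sum(ifelse(x > 0, x * log(x / e), 0))
}

d <- c(3, 1, 1, 0, 0, 1, 1)          # observed counts, as in the examples
p <- rep(1 / length(d), length(d))   # equal baseline probabilities

perms <- compositions(sum(d), length(d))       # one row per possible outcome
prob  <- apply(perms, 1, dmultinom, prob = p)  # exact multinomial probability
stat  <- apply(perms, 1, g_stat, p = p)        # statistic for each outcome

# Assumed p-value definition: null probability of a statistic at least as
# large as the observed one (small tolerance guards against rounding ties).
obs     <- g_stat(d, p)
p_value <- sum(prob[stat >= obs - 1e-10])
p_value
```

Because every outcome is enumerated, no large-sample approximation is involved, which is the point of the approach when N is small.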
A small_sampletest object with slots for:
CDF, a dataframe that contains the exact probabilities and test statistic values for every possible permutation
probabilities, the null probabilities you specified
data, the observed counts you specified
test, the type of test conducted (e.g. G, KS, Chi, etc.)
test_stat, the test statistic for the observed data
p_value, the p-value for the observed stat based on the exact null distribution
AggregateStatistics, a reduced-form aggregate table used for the CDF/p-value calculation
If you wish to save the object, you may want to get rid of the CDF part, as it can be quite large. It will have a total of choose(m+n-1,m-1) rows, where m is the number of bins and n is the total counts. So if you have 10 crimes in 7 days of the week, it will result in a dataframe with choose(7 + 10 - 1,7-1), which is 8008 rows.
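The row count can be checked directly in base R before running the test. The helper name `n_rows` below is hypothetical and only wraps the formula from the text.

```r
# Number of ways to distribute n counts over m bins
n_rows <- function(n, m) choose(m + n - 1, m - 1)

n_rows(10, 7)   # 8008, matching the example in the text
n_rows(20, 7)   # the table grows quickly as the total count increases
```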
Currently I keep the CDF part, though, to make it easier to calculate power for a particular test.
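To show why the full table helps with power, here is a rough base R sketch. It is not how powalt() is implemented; the helper names, the alternative probabilities `p_alt`, and the 0.05 level are all assumptions chosen for illustration.

```r
# Hypothetical helpers: enumerate outcomes and compute the G statistic.
compositions <- function(n, m) {
  if (m == 1) return(matrix(n, nrow = 1))
  out <- lapply(0:n, function(k) cbind(k, compositions(n - k, m - 1)))
  unname(do.call(rbind, out))
}
g_stat <- function(x, p) {
  e <- sum(x) * p
  2 * sum(ifelse(x > 0, x * log(x / e), 0))
}

n <- 6; m <- 7                        # 6 events in 7 bins (assumed example)
p_null <- rep(1 / m, m)               # equal probabilities under the null
p_alt  <- c(0.4, rep(0.1, 6))         # assumed alternative: one favored bin

perms     <- compositions(n, m)
null_prob <- apply(perms, 1, dmultinom, prob = p_null)
alt_prob  <- apply(perms, 1, dmultinom, prob = p_alt)
stat      <- apply(perms, 1, g_stat, p = p_null)

# Exact p-value for every possible outcome under the null
p_val <- sapply(stat, function(s) sum(null_prob[stat >= s - 1e-10]))

alpha  <- 0.05
reject <- p_val <= alpha
size   <- sum(null_prob[reject])      # actual type I error rate of the test
power  <- sum(alt_prob[reject])       # probability of rejecting under p_alt
c(size = size, power = power)
```

The same table of outcomes and statistics serves both the null and the alternative, so only the outcome probabilities need recomputing for each alternative.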
Nigrini, M. J. (2012). Benford's Law: Applications for forensic accounting, auditing, and fraud detection. John Wiley & Sons.
Wheeler, A. P. (2016). Testing Serial Crime Events for Randomness in Day-of-Week Patterns with Small Samples. Journal of Investigative Psychology and Offender Profiling, 13(2), 148-165.
powalt() for calculating power of a test under an alternative
# Counts for different days of the week
d <- c(3,1,1,0,0,1,1) #format N observations in M bins
res <- small_samptest(d=d,type="G")
print(res)

# Example for Benford's analysis
f <- 1:9
p_fd <- log10(1 + (1/f)) #first digit probabilities
#check data from Nigrini page 84
checks <- c(1927.48,27902.31,86241.90,72117.46,81321.75,97473.96,
            93249.11,89658.17,87776.89,92105.83,79949.16,87602.93,
            96879.27,91806.47,84991.67,90831.83,93766.67,88338.72,
            94639.49,83709.28,96412.21,88432.86,71552.16)
# To make example run a bit faster
c1 <- checks[1:10]
#extracting the first digits
fd <- substr(format(c1,trim=TRUE),1,1)
tot <- table(factor(fd, levels=paste(f)))
resG <- small_samptest(d=tot,p=p_fd,type="Chi")
resG

#Can reuse the cdf table if you have the same number of observations
c2 <- checks[11:20]
fd2 <- substr(format(c2,trim=TRUE),1,1)
t2 <- table(factor(fd2, levels=paste(f)))
resG2 <- small_samptest(d=t2,p=p_fd,type="Chi",cdf=resG$CDF)