pai | R Documentation |
Given a set of predictions and observed counts, returns the PAI (predictive accuracy index), the PEI (predictive efficiency index), and the RRI (recovery rate index).
pai(dat, count, pred, area, other = c())
dat | data frame with the predictions, observed counts, and area sizes (can be a vector of ones) |
count | character specifying the column name for the observed counts (e.g. the out of sample crime counts) |
pred | character specifying the column name for the predicted counts (e.g. predictions based on a model) |
area | character specifying the column name for the area sizes (could also be street segment distances, see Drawve & Wooditch, 2019) |
other | vector of strings for any other column names you want to keep (e.g. an ID variable), defaults to empty |
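Given predictions over an entire sample, this returns a dataframe ranked by the density of predicted counts per area (highest first), with the PAI calculated cumulatively down that ranking. PAI is defined as:
PAI = (ct/C)/(at/A)
Where ct is the number of crimes in the top t areas, C is the total number of crimes, at is the area covered by those t areas, and A is the total area. The numerator is thus the proportion of crimes captured in the cumulative t areas, and the denominator is the proportion of the area encompassed.
PEI is the observed PAI divided by the best possible PAI (the one a perfect oracle would achieve), so it is scaled between 0 and 1.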
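The PAI and PEI definitions above can be worked through by hand in base R. This is only a minimal sketch under stated assumptions, not the package's own implementation: the toy values are the ones used in the example on this page, the names ranked, pai_manual, pai_best and pei_manual are hypothetical, and the "best possible" ranking is assumed to be the one that sorts areas by observed density.

```r
# Toy data (same values as the example on this page); const is a unit area
crime_dat <- data.frame(id = 1:6,
                        obs = c(6, 7, 3, 2, 1, 0),
                        pred = c(8, 4, 4, 2, 1, 0),
                        const = 1)

# Rank areas by predicted density (predicted count per unit area), highest first
ord <- order(crime_dat$pred / crime_dat$const, decreasing = TRUE)
ranked <- crime_dat[ord, ]

# Cumulative proportion of observed crime (ct/C) and of area (at/A)
p_cum_count <- cumsum(ranked$obs) / sum(ranked$obs)
p_cum_area <- cumsum(ranked$const) / sum(ranked$const)

# PAI = (ct/C)/(at/A), one value per cumulative number of areas t
pai_manual <- p_cum_count / p_cum_area

# Assumption: the oracle ranks areas by observed density instead
best <- crime_dat[order(crime_dat$obs / crime_dat$const, decreasing = TRUE), ]
pai_best <- (cumsum(best$obs) / sum(best$obs)) /
  (cumsum(best$const) / sum(best$const))

# PEI = observed PAI / best possible PAI
pei_manual <- pai_manual / pai_best

print(data.frame(t = seq_along(pai_manual), PAI = pai_manual, PEI = pei_manual))
```

With a constant area the PAI falls to 1 once every area is included, since all crimes and all of the area are then covered.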
RRI is predicted/observed, so if you have very bad predictions it can return Inf or undefined!
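To see where Inf or an undefined value can come from, here is a small base R sketch. The cumulative totals are made-up illustrative values, and taking the ratio over cumulative totals is an assumption about how the ratio is formed, not something this page confirms.

```r
# Hypothetical cumulative totals for three ranked areas (illustrative values)
cum_pred <- c(0, 3, 4)
cum_obs <- c(0, 0, 8)

# Assumption: RRI taken as cumulative predicted over cumulative observed
rri <- cum_pred / cum_obs
# 0/0 is NaN (undefined), 3/0 is Inf, 4/8 is an ordinary ratio
print(rri)

# On the log scale, over- and under-prediction by the same factor
# sit at equal distances from zero
log_rri <- log(rri)
print(log_rri)

# Keep only the finite values before summarising or plotting
print(rri[is.finite(rri)])
```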
See Wheeler & Steenbeek (2021) for the definitions of the different metrics.
Note that PEI may behave unexpectedly when the areas differ in size.
A dataframe with the columns:
Order, the order of the resulting rankings
Count, the counts for the original crimes you specified
Pred, the original predictions
Area, the area for the units of analysis
Cum*, the cumulative totals for Count/Pred/Area
PCum*, the proportion cumulative totals, e.g. CumCount/sum(Count)
PAI, the PAI stat
PEI, the PEI stat
RRI, the RRI stat (probably should analyze/graph the log(RRI))!
Plus any additional variables specified by other at the end of the dataframe.
Drawve, G., & Wooditch, A. (2019). A research note on the methodological and theoretical considerations for assessing crime forecasting accuracy with the predictive accuracy index. Journal of Criminal Justice, 64, 101625.
Wheeler, A. P., & Steenbeek, W. (2021). Mapping the risk terrain for crime using machine learning. Journal of Quantitative Criminology, 37(2), 445-480.
pai_summary() for a summary table of metrics for multiple pai tables given fixed N thresholds
# Making some very simple fake data
crime_dat <- data.frame(id=1:6,
                        obs=c(6,7,3,2,1,0),
                        pred=c(8,4,4,2,1,0))
crime_dat$const <- 1
p1 <- pai(crime_dat,'obs','pred','const')
print(p1)

# Combining multiple predictions, making
# a nice table
crime_dat$rand <- sample(crime_dat$obs,nrow(crime_dat),FALSE)
p2 <- pai(crime_dat,'obs','rand','const')
pai_summary(list(p1,p2),c(1,3,5),c('one','two'))