evaluateRisk: Summarise the performance of a data mining model

View source: R/evaluate.R

Description

Given predicted values, actual values, and a measure of the risk associated with each case, generate a summary that groups the cases by distinct predicted value and calculates the cumulative percentage Caseload, Recall, Risk, Precision, and Measure.
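To make the grouping and accumulation concrete, here is a minimal sketch of how a summary of this kind could be computed in base R. It is not rattle's implementation: `risk_summary_sketch` is a hypothetical helper, the choice to accumulate from the highest predicted value downwards is an assumption, and the Measure column is omitted because its definition is not given here. Row order and exact column definitions in `evaluateRisk` may differ.

```r
## Hypothetical helper (not part of rattle): a rough illustration of
## grouping by distinct predicted value and accumulating the totals.
risk_summary_sketch <- function(predicted, actual, risks) {
  stopifnot(length(predicted) == length(actual),
            length(predicted) == length(risks))

  ## Distinct predicted values, highest first (assumed ordering).
  scores <- sort(unique(predicted), decreasing = TRUE)
  idx <- match(predicted, scores)

  ## Per-group totals: number of cases, number of positives, summed risk.
  n   <- as.vector(tapply(actual, idx, length))
  pos <- as.vector(tapply(actual, idx, sum))
  rsk <- as.vector(tapply(risks, idx, sum))

  data.frame(
    Predicted = scores,
    Caseload  = cumsum(n) / sum(n),      # share of all cases covered so far
    Recall    = cumsum(pos) / sum(pos),  # share of all positives covered so far
    Risk      = cumsum(rsk) / sum(rsk),  # share of total risk covered so far
    Precision = cumsum(pos) / cumsum(n)  # positives among the cases covered so far
  )
}

## Tiny hand-checkable input: four cases, three distinct predicted values.
s <- risk_summary_sketch(predicted = c(0.9, 0.9, 0.5, 0.2),
                         actual    = c(1, 0, 1, 0),
                         risks     = c(100, 0, 300, 0))
print(s)
```

Each row then reads as "if every case scoring at least this value were selected": the fraction of the caseload selected, the fraction of positives and of total risk captured, and the hit rate among the selected cases.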

Usage

evaluateRisk(predicted, actual, risks)

Arguments

predicted

a numeric vector of probabilities (between 0 and 1) representing the probability of each entity being a 1.

actual

a numeric vector of classes (0 or 1).

risks

a numeric vector of risk (e.g., dollar amounts) associated with each entity that has an actual of 1.

Author(s)

[email protected]

References

Package home page: https://rattle.togaware.com

See Also

plotRisk.

Examples

## simulate the data that is typical in data mining

## we often have only a small number of known positive cases
cases <- 1000
actual <- as.integer(rnorm(cases) > 1)
adjusted <- sum(actual)
nfa <- cases - adjusted

## risks might be dollar values associated with the adjusted cases
risks <- rep(0, cases)
risks[actual==1] <- round(abs(rnorm(adjusted, 10000, 5000)), 2)

## our models will generate a probability of a case being a 1
predicted <- rep(0.1, cases) 
predicted[actual==1] <- predicted[actual==1] + rnorm(adjusted, 0.3, 0.1)
predicted[actual==0] <- predicted[actual==0] + rnorm(nfa, 0.1, 0.08)
predicted <- signif(predicted)

## call upon evaluateRisk to generate performance summary
ev <- evaluateRisk(predicted, actual, risks)

## have a look at the first few and last few
head(ev)
tail(ev)

## the performance is usually presented as a Risk Chart
## on CRAN under MS/Windows this causes a problem, so don't run for now
## Not run: plotRisk(ev$Caseload, ev$Precision, ev$Recall, ev$Risk)

rattle documentation built on Aug. 17, 2018, 5:04 p.m.