audit | R Documentation |
The audit dataset is an artificially constructed dataset that has some of the characteristics of a true financial audit dataset for modelling productive and non-productive audits of a person's financial statement. A productive audit is one which identifies errors or inaccuracies in the information provided by a client. A non-productive audit is usually an audit which found all supplied information to be in order.
The audit dataset is used to illustrate binary classification. The target variable is identified as TARGET_Adjusted.
The dataset is quite small, consisting of just 2000 entities. Its primary purpose is to illustrate modelling in Rattle, so a minimally sized dataset is suitable.
The dataset itself is derived from publicly available data (which has nothing to do with audits).
A data frame. In line with data mining terminology we refer to the rows of the data frame (or the observations) as entities. The columns are referred to as variables. The entities represent people in this case. We describe the variables here:
ID
This is a unique identifier for each person.
Age
The person's age.
Employment
The type of employment.
Education
The highest level of education.
Marital
Current marital status.
Occupation
The type of occupation.
Income
The amount of income declared.
Gender
The person's gender.
Deductions
Total amount of expenses that a person claims in their financial statement.
Hours
The average hours worked on a weekly basis.
IGNORE_Accounts
The main country in which the person has most of their money banked. Note that the variable name is prefixed with IGNORE. Rattle recognises this prefix and assigns the variable the Ignore role by default.
RISK_Adjustment
This variable records the monetary amount of any adjustment to the person's financial claims as a result of a productive audit. This variable, which should not be treated as an input variable, is thus a measure of the size of the risk associated with the person.
TARGET_Adjusted
The target variable for modelling (generally for classification modelling). This is a numeric field of class integer, but limited to 0 and 1, indicating non-productive and productive audits, respectively. Productive audits are those that result in an adjustment being made to a client's financial statement.
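The roles described above can be sketched in plain R. This is a minimal illustration only, not Rattle's own implementation. The data frame `toy` is a hypothetical stand-in that reuses a few of the documented column names with randomly generated values, and the helper `assign_role` is an invented name that mimics the prefix convention. Treating ID as an identifier rather than an input is an assumption made here.

```r
# Hypothetical stand-in for the audit data: documented column names,
# but randomly generated values (NOT the real dataset).
set.seed(42)
n <- 200
toy <- data.frame(
  ID              = seq_len(n),
  Age             = sample(18:80, n, replace = TRUE),
  Income          = round(runif(n, 1e4, 2e5), 2),
  Deductions      = round(runif(n, 0, 2000), 2),
  Hours           = sample(10:70, n, replace = TRUE),
  IGNORE_Accounts = factor(sample(c("A", "B"), n, replace = TRUE)),
  TARGET_Adjusted = rbinom(n, 1, 0.25)
)
# Toy assumption: an adjustment amount exists only for productive audits.
toy$RISK_Adjustment <- ifelse(toy$TARGET_Adjusted == 1L,
                              round(runif(n, 100, 5000)), 0)

# Hypothetical helper: derive a modelling role from the variable name.
assign_role <- function(name) {
  if (startsWith(name, "IGNORE_")) "ignore"
  else if (startsWith(name, "RISK_")) "risk"
  else if (startsWith(name, "TARGET_")) "target"
  else if (name == "ID") "ident"   # assumption: ID is not an input
  else "input"
}
roles <- vapply(names(toy), assign_role, character(1))

# Use only the input variables to model the binary target.
inputs <- names(roles)[roles == "input"]
target <- names(roles)[roles == "target"]
form   <- reformulate(inputs, response = target)

# Logistic regression as one simple binary classifier.
fit  <- glm(form, data = toy, family = binomial)
prob <- predict(fit, type = "response")  # estimated P(TARGET_Adjusted = 1)
```

Keeping RISK_Adjustment out of the formula matters: it is only known once an audit has been carried out, so using it as an input would leak the outcome into the model.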