audit: Sample dataset to illustrate Rattle functionality.

Description Format

Description

The audit dataset is an artificially constructed dataset that has some of the characteristics of a true financial audit dataset for modelling productive and non-productive audits of a person's financial statement. A productive audit is one which identifies errors or inaccuracies in the information provided by a client. A non-productive audit is usually an audit which found all supplied information to be in order.

The audit dataset is used to illustrate binary classification. The target variable is identified as TARGET\_Adjusted.

The dataset is quite small, consisting of just 2000 entities. Its primary purpose is to illustrate modelling in Rattle, so a minimally sized dataset is suitable.

The dataset itself is derived from publicly available data (which has nothing to do with audits).

Format

A data frame. In line with data mining terminology we refer to the rows of the data frame (or the observations) as entities. The columns are refered to as variables. The entities represent people in this case. We describe the variables here:

ID

This is a unique identifier for each person.

Age

The age.

Employment

The type of employment.

Education

The highest level of education.

Marital

Current marital status.

Occupation

The type of occupation.

Income

The amount of income declared.

Gender

The persons gender.

Deductions

Total amount of expenses that a person claims in their financial statement.

Hours

The average hours worked on a weekly basis.

IGNORE_Accounts

The main country in which the person has most of their money banked. Note that the variable name is prefixed with IGNORE. This is recognised by Rattle as the default role for this variable.

RISK_Adjustment

This variable records the monetary amount of any adjustment to the person's financial claims as a result of a productive audit. This variable, which should not be treated as an input variable, is thus a measure of the size of the risk associated with the person.

TARGET_Adjusted

The target variable for modelling (generally for classification modelling). This is a numeric field of class integer, but limited to 0 and 1, indicating non-productive and productive audits, respectively. Productive audits are those that result in an adjustment being made to a client's financial statement.


rattle documentation built on Sept. 5, 2017, 9:05 a.m.