getVintageData: Get data in vintage analysis format

Description Usage Arguments Details

Description

This function prepares data stored in PostgreSQL or Oracle database for vintage analysis. According to Statsoft vintage analysis is: "Vintage Analysis. A vintage is a group of credit accounts that all originated within a specific time period, usually a year. Vintage analysis is used in credit scoring and refers to the process of monitoring groups of accounts and comparing performance across past groups. The comparisons take place at similar loan ages, allowing for the detection of deviation from past performance. Typically, a graphical representation is used for this purpose, such as one showing the relationship between months on the books and the percentage of delinquent accounts across multiple vintages." In this function concept of vintage is generalized in such way that any objects sharing same properties can form a vintage.

Usage

1
2
3
getVintageData(vintageUnitSql, performanceEventSql, timeGroup = "month",
  timeExpansion = "none", con, result = "data", sqlModifier = NULL,
  distanceFunctionSchema = NULL, verbose = FALSE, debug = FALSE)

Arguments

vintageUnitSql

Valid SQL SELECT statement.
If sqlModifier is not used than result has to have the following structure (Note: optional elements are denoted as [...]):

Column Description
id Unique identificator of row. Is considered to be a primary key of performanceEventSql. Duplicates will result in incorrect results as join between vintage units and events will be multiplicated. Any (reasonable) data type possible.
vintage_unit_date Has to be of type date. It is not necessary to round dates to months/quarters/years.
[vintage_unit_weight] Optional weight. When no weight is defined than every row is considered to be one unit (every row in result represents one unit). Has to be of numeric type (integer, numeric, float).
[Slicers] Any number of columns used to form vintages. There are no constraints on data types. Any name except id, vintage_unit_date, vintage_unit_weight can be used. Every unique combination of Slicers will be considered as one vintage. When there are no Slicers then only one vintage will be created.
performanceEventSql

Valid SQL SELECT statement. Result has to have the following structure (Note: optional elements are denoted as [...]):

Column Description
id Identificator of vintage unit to which event belongs. Has to be of same data type as column id in vintageUnitSql.
event_date Date when event occured. Has to be of type date. It is not necessary to round dates to months/quarters/years.
[event_weight] Optional weight. When no weight is defined than every row is considered to be one unit (every row in result represents one event). Has to be of numeric type (integer, numeric, float).

Note: it does not make sense to include any other columns as these are not used by the function. Currently, only single column key is supported.

timeGroup

Aggregation of vintage data. Defines how distance between vintage_unit_date and event_date is measured. Possible values are month, quarter, year. In PostgreSQL, distance calculation is performed by custom function named time_distance. Code to create this function is stored in exec/time_distance.sql. For Oracle, standard date functions are used.

timeExpansion

Defines how time expansion is performed. By default, vintages will be generated up to last existing point in events. E.g. when maximum distance in data is 10 than every vintage will have 10 observations. There are three other options, using now or date in yyyy-mm-dd format or local. now will be internally replaced by current date. local will use maximum event date available for every vintage. If any of these option is used than value of parameter will be used as last available point in data. Thus, number of points for every vintage will be expanded. Note that this might return unexpected results if this date is earlier than last point in events.

con

Connection to PostgreSQL or Oracle database. This is an connection object as produced by dbConnect(...) function.

result

Type of results to return. By default (data), vintage data are returned. The other option is to use sql - this will return SQL statement to get vintage data.

sqlModifier

This will constraint result of vintageUnitSql to selected columns and/or rows. Vector with 1 or 2 elements. In the first element, required columns are specified, the second can contain additional WHERE condition. If used than vintageUnitSql is wrapped into SELECT id, vintage_unit_date [, First Element] FROM (VintageUnitSQL) x [WHERE Second Element]. When only WHERE clause should be used and columns should remain unchanged than first element has to be asterisk ('*'). If first element is empty string or NA than only columns id and vintage_unit_date will be used from results of vintageUnitSql.

distanceFunctionSchema

Name of database schema where time_distance function is available. This is valid for PostgreSQL only.

verbose

Prints additional diagnostics messages when TRUE. Default is FALSE.

debug

Prints low level diagnostic messages. Usefull mainly for package developer. Default is FALSE

Details

If successful, dataset with the following columns is returned:

Column Description
[Slicers] As defined in VintageUnitSQL.
distance Distance between vintage unit date and event date measured in time interval (currently month, quarter or year).
vintage_unit_weight sum of vintage unit weights for given vintage at given distance. Note that one vintage can have different values for different distances when no slicer is based on vintage unit date and with granularity equal to time grouping of events (see examples to understand this better). In case when no weight is defined that values are equal to values in column vintage_unit_count.
vintage_unit_count number (of rows) for given vintage at distance. Using same logic as vintage unit weight.
event_weight Sum of event weights for given vintage at distance. If no weight is defined, than 1 is used (equal to row count)
event_weight_pct event_weight / vintage_unit_weight
event_weight_csum running total of event weights for given vintage ordered by distance. If no slicer is based on vintage date and with granularity equal to time grouping of events, than running total cannot be reproduced as running total of event_weight! See examples to understand this better.
event_weight_csum_pct event_weight_csum / vintage_unit_weight
rn Column to numerically indicate order of last Sclicer. Useful when last Slicer is time-orderable, but data type is not number/date.

This function is tested with PostgreSQL 9.1, but any version with window functions support should work.


tomasgreif/vintager documentation built on May 31, 2019, 5:16 p.m.