PostgreSQL

This function prepares data stored in PostgreSQL or Oracle database for vintage analysis. According to Statsoft vintage analysis is: "Vintage Analysis. A vintage is a group of credit accounts that all originated within a specific time period, usually a year. Vintage analysis is used in credit scoring and refers to the process of monitoring groups of accounts and comparing performance across past groups. The comparisons take place at similar loan ages, allowing for the detection of deviation from past performance. Typically, a graphical representation is used for this purpose, such as one showing the relationship between months on the books and the percentage of delinquent accounts across multiple vintages." In this function concept of vintage is generalized in such way that any objects sharing same properties can form a vintage.

1
2
3

getVintageData(vintageUnitSql, performanceEventSql, timeGroup = "month",
  timeExpansion = "none", con, result = "data", sqlModifier = NULL,
  distanceFunctionSchema = NULL, verbose = FALSE, debug = FALSE)

vintageUnitSql

Valid SQL SELECT statement.
If sqlModifier is not used than result has to have the following structure (Note: optional elements are denoted as [...]):

Column	Description
`id`	Unique identificator of row. Is considered to be a primary key of `performanceEventSql`. Duplicates will result in incorrect results as join between vintage units and events will be multiplicated. Any (reasonable) data type possible.
`vintage_unit_date`	Has to be of type date. It is not necessary to round dates to months/quarters/years.
`[vintage_unit_weight]`	Optional weight. When no weight is defined than every row is considered to be one unit (every row in result represents one unit). Has to be of numeric type (integer, numeric, float).
`[Slicers]`	Any number of columns used to form vintages. There are no constraints on data types. Any name except `id, vintage_unit_date, vintage_unit_weight` can be used. Every unique combination of `Slicers` will be considered as one vintage. When there are no `Slicers` then only one vintage will be created.

performanceEventSql

Valid SQL SELECT statement. Result has to have the following structure (Note: optional elements are denoted as [...]):

Column	Description
`id`	Identificator of vintage unit to which event belongs. Has to be of same data type as column `id` in `vintageUnitSql`.
`event_date`	Date when event occured. Has to be of type date. It is not necessary to round dates to months/quarters/years.
`[event_weight]`	Optional weight. When no weight is defined than every row is considered to be one unit (every row in result represents one event). Has to be of numeric type (integer, numeric, float).

Note: it does not make sense to include any other columns as these are not used by the function. Currently, only single column key is supported.

timeGroup

Aggregation of vintage data. Defines how distance between vintage_unit_date and event_date is measured. Possible values are month, quarter, year. In PostgreSQL, distance calculation is performed by custom function named time_distance. Code to create this function is stored in exec/time_distance.sql. For Oracle, standard date functions are used.

timeExpansion

Defines how time expansion is performed. By default, vintages will be generated up to last existing point in events. E.g. when maximum distance in data is 10 than every vintage will have 10 observations. There are three other options, using now or date in yyyy-mm-dd format or local. now will be internally replaced by current date. local will use maximum event date available for every vintage. If any of these option is used than value of parameter will be used as last available point in data. Thus, number of points for every vintage will be expanded. Note that this might return unexpected results if this date is earlier than last point in events.

con

Connection to PostgreSQL or Oracle database. This is an connection object as produced by dbConnect(...) function.

result

Type of results to return. By default (data), vintage data are returned. The other option is to use sql - this will return SQL statement to get vintage data.

sqlModifier

This will constraint result of vintageUnitSql to selected columns and/or rows. Vector with 1 or 2 elements. In the first element, required columns are specified, the second can contain additional WHERE condition. If used than vintageUnitSql is wrapped into SELECT id, vintage_unit_date [, First Element] FROM (VintageUnitSQL) x [WHERE Second Element]. When only WHERE clause should be used and columns should remain unchanged than first element has to be asterisk ('*'). If first element is empty string or NA than only columns id and vintage_unit_date will be used from results of vintageUnitSql.

distanceFunctionSchema

Name of database schema where time_distance function is available. This is valid for PostgreSQL only.

verbose

Prints additional diagnostics messages when TRUE. Default is FALSE.

debug

Prints low level diagnostic messages. Usefull mainly for package developer. Default is FALSE

If successful, dataset with the following columns is returned:

Column	Description
`[Slicers]`	As defined in `VintageUnitSQL`.
`distance`	Distance between vintage unit date and event date measured in time interval (currently month, quarter or year).
`vintage_unit_weight`	sum of vintage unit weights for given vintage at given distance. Note that one vintage can have different values for different distances when no slicer is based on vintage unit date and with granularity equal to time grouping of events (see examples to understand this better). In case when no weight is defined that values are equal to values in column `vintage_unit_count`.
`vintage_unit_count`	number (of rows) for given vintage at distance. Using same logic as vintage unit weight.
`event_weight`	Sum of event weights for given vintage at distance. If no weight is defined, than 1 is used (equal to row count)
`event_weight_pct`	`event_weight / vintage_unit_weight`
`event_weight_csum`	running total of event weights for given vintage ordered by distance. If no slicer is based on vintage date and with granularity equal to time grouping of events, than running total cannot be reproduced as running total of event_weight! See examples to understand this better.
`event_weight_csum_pct`	`event_weight_csum / vintage_unit_weight`
`rn`	Column to numerically indicate order of last `Sclicer`. Useful when last `Slicer` is time-orderable, but data type is not number/date.