tidyPkuData speeds up basic analysis tasks on Europace loan data and makes them accessible to everyone.
The functions in this package prepare and set up your dataset for ad-hoc analysis activities right after data import.
# install.packages("devtools")
devtools::install_github("europace-privatkredit/tidyPkuData")
tidyPkuData is a grammar of data manipulation with an emphasis on internal Europace loan data. It provides a consistent set of verbs that help you solve the most common data cleaning tasks:
importData() imports a base dataset from AWS Redshift that covers most of the ad-hoc analysis tasks arising at this point in time.
tidyDate() adds time-specific vectors to the dataset, such as year, month, week, and weekday.
tidyStatus() cleans up the status information for convenience.
tidyPa() cleans up product provider information for convenience and lets you decide whether to aggregate regional banks.
tidyVo() cleans up brokerage firm information for convenience and aggregates sub-brokerage IDs into a total picture.
tidyFrontend() cleans up frontend information for convenience.
tidyProcessingTime() calculates the processing time for each application given the current status.
tidyTerm() converts the given annual term structure to monthly terms.
tidyPurpose() cleans up purpose information for convenience.
tidyProductType() ensures that product types per product provider are bi-uniquely differentiated from each other.
tidyEuropaceCent() re-calculates the "Europace-Gebühr", taking into account product type exceptions that are not considered in the raw dataset.
tidyGender() derives the gender of each applicant from the raw data.
tidyHouseholdIncome() calculates the household income for each application.
tidyNumberApplicants() calculates the number of applicants per application.
tidyMainApplicant() adds the "Europace" main applicant logic to your dataset.
tidyProfession() cleans up profession type information for convenience.

If you are new to R and data in general, do not worry: internal workshops will be provided upon request.
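To give a feel for what a verb such as tidyDate() does, here is a minimal base R sketch. It is not the package's implementation: the function name sketch_tidy_date, the output column names, and the choice of ISO 8601 week numbering are all assumptions made for illustration. Only the column name angenommenamdatum is taken from the usage example in this README.

```r
# Hypothetical stand-in for a tidyDate()-style verb (NOT the package's code).
# It takes a data frame and the name of a date column, and returns the same
# data frame with additional time-specific columns.
sketch_tidy_date <- function(data, col) {
  dates <- as.Date(data[[col]])
  data$year    <- as.integer(format(dates, "%Y"))
  data$month   <- as.integer(format(dates, "%m"))
  data$week    <- as.integer(format(dates, "%V"))  # ISO 8601 week (assumption)
  data$weekday <- as.integer(format(dates, "%u"))  # 1 = Monday, ..., 7 = Sunday
  data
}

# Toy data; the real dataset comes from importData().
toy <- data.frame(
  angenommenamdatum = c("2018-06-01", "2018-06-30"),
  stringsAsFactors = FALSE
)

result <- sketch_tidy_date(toy, col = "angenommenamdatum")
print(result)
```

Because the function returns the full data frame, it can be chained with other verbs of the same shape.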
After establishing a Redshift connection:
library(tidyPku)
dataset <- importData(startDate = '2018-06-01', endDate = '2018-06-30') %>%
  tidyDataType() %>%
  tidyDate(col = 'angenommenamdatum') %>%
  tidyPurpose() %>%
  tidyStatus() %>%
  tidyPa(agg = TRUE) %>%
  tidyVo() %>%
  tidyTerm() %>%
  tidyProductType() %>%
  tidyEuropaceCent() %>%
  tidyProcessingTime() %>%
  tidyFrontend()
Your dataset is now ready for analysis.
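The pipeline above works because each verb accepts a data frame and returns a data frame. The following sketch illustrates that pattern on toy data with two hypothetical verbs, loosely modelled on the descriptions of tidyTerm() and tidyProcessingTime(). The function names, the column names (term_years, application_date, status_date), and the calculations are assumptions, not the package's actual logic. It uses the native pipe `|>` (R 4.1 or later) so that it runs without extra packages.

```r
# Hypothetical verbs for illustration only (NOT the package's code).

# Convert an annual term to a monthly term.
sketch_tidy_term <- function(data) {
  data$term_months <- data$term_years * 12
  data
}

# Days elapsed between the application date and the date of the current status.
sketch_tidy_processing_time <- function(data) {
  data$processing_days <- as.integer(
    as.Date(data$status_date) - as.Date(data$application_date)
  )
  data
}

# Toy data with made-up column names.
toy <- data.frame(
  application_date = c("2018-06-01", "2018-06-10"),
  status_date      = c("2018-06-04", "2018-06-30"),
  term_years       = c(5, 7.5),
  stringsAsFactors = FALSE
)

# Each step receives the result of the previous one.
result <- toy |>
  sketch_tidy_term() |>
  sketch_tidy_processing_time()
print(result)
```

Keeping every verb to the same data-frame-in, data-frame-out shape is what lets steps be added, dropped, or reordered in the chain, as long as the columns a step needs are present.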
The remaining tidy functions included in the package generate customer-related data fields.
If you encounter a clear bug, please file a minimal reproducible example on GitHub.
Please note that this project is a draft and under development. Further functionality will be added soon. Usage is restricted to internal purposes only.