Preprocessing
- How well-processed do we want our input data to be?
- Handle missing values
- Handle outliers (detect and winsorize)
- Scale factors
- Feature selection: remove unhelpful predictors and overly correlated features
- Create useful labels
- Split the data and prepare it for ML algorithms
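The outlier and scaling steps above could be sketched in base R as follows; the function names `winsorize` and `scale_factor` are illustrative placeholders, not a final API.

```r
# Clip values outside the [lower, upper] quantiles (winsorization).
winsorize <- function(x, lower = 0.01, upper = 0.99) {
  q <- quantile(x, probs = c(lower, upper), na.rm = TRUE)
  pmin(pmax(x, q[1]), q[2])
}

# Standardize a numeric vector to zero mean and unit variance.
scale_factor <- function(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)

x <- c(1, 2, 3, 100)          # 100 is an extreme value
w <- winsorize(x, 0, 0.75)    # clips the top quartile
z <- scale_factor(w)          # standardized factor
```

Whether to winsorize or drop outliers, and at which quantiles, is a design choice the literature review should motivate.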
Calling Learners and Making Predictions
- Integrate MLR3 algorithms
- Make the process more transparent: we need more and better documentation
- Create/integrate pipelines for fine-tuning hyperparameters
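A minimal sketch of what calling an MLR3 learner looks like, assuming the `mlr3` and `rpart` packages are installed; the built-in `mtcars` task and the tree learner here are placeholders, not the final model selection.

```r
library(mlr3)

task    <- tsk("mtcars")        # built-in regression task (predict mpg)
learner <- lrn("regr.rpart")    # decision tree backed by the rpart package

learner$train(task)             # fit on all rows of the task
pred <- learner$predict(task)   # in-sample predictions
rmse <- pred$score(msr("regr.rmse"))
```

In the real pipeline, training and prediction would be separated by a resampling scheme rather than done in-sample.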
Model Evaluation
- Create relevant benchmarks to compare model performance
- Consider nonparametric outputs
- Time-series tests
  - Integrate the TStest package
- Present model interpretability
  - Produce output (visualizations) of model decisions/parameters
  - Allow users to evaluate whether these outputs make economic sense; this is probably easiest to see for tree-based models but can be generalized
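One way to pair an absolute metric with a relative benchmark, sketched in base R: scale the model's RMSE by the RMSE of a naive mean forecast, so values below 1 mean the model beats the benchmark. The function names are illustrative.

```r
# Absolute metric: root mean squared error.
rmse <- function(actual, pred) sqrt(mean((actual - pred)^2))

# Relative metric: model RMSE divided by a naive mean-forecast RMSE.
# Values below 1 indicate the model adds value over the benchmark.
relative_rmse <- function(actual, pred) {
  naive <- rep(mean(actual), length(actual))
  rmse(actual, pred) / rmse(actual, naive)
}

y    <- c(1, 2, 3, 4, 5)
yhat <- c(1.1, 2.0, 2.9, 4.2, 4.8)
relative_rmse(y, yhat)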
Day 1-2
- Create a minimum-viable pipeline
  - Create a test dataset
  - Define one regression and one classification task
  - Define one benchmark for each task for performance comparison
  - Create a few (3-4) relative and absolute metrics for evaluation
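The test dataset for the minimum-viable pipeline could be purely synthetic, with one regression target and one classification label derived from it; everything here (column names, sizes, the 80/20 split) is a placeholder.

```r
set.seed(42)
n  <- 200
df <- data.frame(
  x1 = rnorm(n),
  x2 = rnorm(n)
)
df$ret <- 0.5 * df$x1 - 0.2 * df$x2 + rnorm(n, sd = 0.1)  # regression target
df$up  <- factor(df$ret > 0)                              # classification label

idx   <- sample(n, size = 0.8 * n)   # simple 80/20 train/test split
train <- df[idx, ]
test  <- df[-idx, ]
```

A synthetic dataset with a known data-generating process also makes it easy to sanity-check the metrics: a correct pipeline should recover the planted signal.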
Day 3-14
- Integrate selected MLR3 algorithms
- Provide documentation for model specification, parameter inputs, and outputs
- Budget time for 1-2 learners per day
- For each learner or family of learners, identify relevant papers, see how the algorithms are implemented, and make necessary adjustments
- Read the documentation for MLR3 and the packages it calls to better understand the parameters for the fine-tuning stage
Day 15-20
- Write/integrate functions for preprocessing
- Motivate preprocessing steps with the existing literature
- Plan for one type/family of preprocessing per day
- If we decide to implement more features, we can extend this period
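As a sketch of one such preprocessing function, here is a base-R correlation filter for the feature-selection step: from each pair of features whose absolute correlation exceeds a threshold, one is dropped. The function name and threshold are placeholders.

```r
# Drop one feature from each highly correlated pair (|cor| > threshold).
drop_correlated <- function(X, threshold = 0.95) {
  cm <- abs(cor(X))
  cm[upper.tri(cm, diag = TRUE)] <- 0   # keep only the lower triangle
  keep <- !apply(cm > threshold, 1, any)
  X[, keep, drop = FALSE]
}

set.seed(1)
X <- data.frame(a = rnorm(100))
X$b <- X$a + rnorm(100, sd = 0.01)  # b nearly duplicates a
X$c <- rnorm(100)
ncol(drop_correlated(X))            # b is dropped; a and c remain
```

Which member of a correlated pair to keep is itself a modeling decision; the literature review should inform that rule.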
Day 21-25
- Model interpretability
- Visualize model outputs
  - If the model gives us weights, we can use the bootstrap to get a confidence interval for the weight distribution
  - If the model gives us something else, such as feature prominence and salience, we can find ways to visualize that as well
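The bootstrap idea for weight uncertainty can be sketched in base R with a linear model standing in for whatever learner produces the weights: refit on resampled rows and take quantiles of each coefficient. All names and the 500-replicate count are placeholders.

```r
set.seed(7)
n  <- 100
df <- data.frame(x = rnorm(n))
df$y <- 2 * df$x + rnorm(n)   # true slope is 2

# Refit on bootstrap resamples of the rows; collect the slope each time.
boot_coefs <- replicate(500, {
  i <- sample(n, replace = TRUE)
  coef(lm(y ~ x, data = df[i, ]))["x"]
})

# 95% percentile interval for the weight on x.
ci <- quantile(boot_coefs, c(0.025, 0.975))
```

The resulting interval (or a histogram of `boot_coefs`) is the kind of visualization this stage would standardize across learners.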
Day 26-30
- Integrate the TStest package and potentially some more time-series tests
- Apply additional tests such as scenario analysis and stress testing to see how well the model performs under different scenarios
- The idea is that the model may have limited applicability: specific factors may have stronger predictive power in some states than in others
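The state-dependence idea can be sketched by scoring the model separately in each regime; here the regime definition (sign of the actual value) and the simulated data are placeholders for whatever state variable the scenario analysis uses.

```r
set.seed(3)
actual <- rnorm(200)
pred   <- actual + rnorm(200, sd = 0.5)   # stand-in model predictions
regime <- ifelse(actual > 0, "up", "down")

# RMSE computed separately within each regime.
rmse_by_regime <- tapply(
  seq_along(actual),
  regime,
  function(i) sqrt(mean((actual[i] - pred[i])^2))
)
```

A large gap between the per-regime errors would be evidence that a factor's predictive power is state-dependent, which is exactly what the stress tests are meant to surface.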
Day 30-40
- Write functions for learners that are not available in the MLR3 framework (if there are specific models we want to implement/test based on the existing literature)
- Test the pipelines with various data/tasks to evaluate performance and identify areas for improvement
This tentative plan puts the end date at August 9.