sandbox/developer_log_harry.md

5/30/2024

5/31/2024

6/2/2024

6/4/2024

6/7/2024

6/8/2024

6/10/2024

6/12/2024

Conceptualization of pipeline

Preprocessing - how well-processed do we want our input data to be?

- Handle missing values
- Handle outliers (detect and winsorize)
- Factor scaling
- Feature selection: removing unhelpful predictors or overly correlated features
- Creating useful labels
- Splitting data and preparing it for the ML algorithms
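
A minimal sketch of how these steps could hang together as an mlr3pipelines graph. This assumes we stay within the MLR3 stack; the correlation filter and its cutoff are placeholders, and winsorization has no built-in PipeOp, so it would need a custom step (see Day 15-20 below).

```r
library(mlr3)
library(mlr3pipelines)
library(mlr3filters)

# Chain imputation, scaling, and a simple correlation-based feature filter.
graph = po("imputemedian") %>>%   # fill missing numeric values
  po("scale") %>>%                # center/scale factors
  po("filter", filter = flt("correlation"), filter.frac = 0.5)  # keep half the features

# The graph combines with any learner and behaves like a regular learner.
glrn = as_learner(graph %>>% lrn("regr.rpart"))
glrn$train(tsk("mtcars"))
```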

Calling Learners and Making Predictions

- Integrate MLR3 algorithms
- Make the process more transparent; we need more and better documentation
- Create/integrate pipelines for fine-tuning hyperparameters
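
A hedged sketch of the fine-tuning piece using mlr3tuning's AutoTuner. The learner, tuning range, and grid resolution are illustrative, not final choices.

```r
library(mlr3)
library(mlr3tuning)

# Mark the rpart complexity parameter as tunable; the range is illustrative.
learner = lrn("regr.rpart", cp = to_tune(1e-4, 0.1, logscale = TRUE))

# Wrap learner + tuner + inner resampling into a self-tuning learner.
at = auto_tuner(
  tuner      = tnr("grid_search", resolution = 10),
  learner    = learner,
  resampling = rsmp("cv", folds = 3),
  measure    = msr("regr.rmse")
)

at$train(tsk("mtcars"))  # tunes cp internally, then refits on the full data
at$tuning_result         # inspect the chosen hyperparameter
```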

Model Evaluation

- Create relevant benchmarks to compare model performance
- Consider nonparametric outputs
- Time-series tests: integrate the TStest package
- Present model interpretability: produce output (visualizations) of model decisions/parameters
- Allow users to evaluate whether these outputs make economic sense; this is probably easiest to see for tree-based models but can be generalized
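
The benchmarking part maps naturally onto mlr3's benchmark facilities; a sketch, with a featureless learner as the absolute baseline (my assumption, not a settled choice):

```r
library(mlr3)

# Compare a real learner against a featureless baseline on a toy regression task.
design = benchmark_grid(
  tasks       = tsk("mtcars"),
  learners    = lrns(c("regr.rpart", "regr.featureless")),
  resamplings = rsmp("cv", folds = 5)
)
bmr = benchmark(design)

# Aggregate a few absolute metrics; relative metrics can be derived from these.
bmr$aggregate(msrs(c("regr.rmse", "regr.mae", "regr.rsq")))
```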

Short to Medium Term Plan

Day 1-2 - Create a minimum viable pipeline

- Create a test dataset
- Define one regression and one classification task
- Define one benchmark for each task for performance comparison
- Create a few (3-4) relative and absolute metrics for evaluation
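
A possible Day 1-2 skeleton, using a synthetic dataset so the task wiring can be tested before real data arrives (column names and the up/down label are placeholders):

```r
library(mlr3)

# Synthetic data standing in for the eventual factor dataset.
set.seed(1)
dat = data.frame(x1 = rnorm(200), x2 = rnorm(200), ret = rnorm(200))
dat$up = factor(dat$ret > 0)  # binary label for the classification task

# One regression and one classification task, as planned above.
task_regr    = as_task_regr(dat[, c("x1", "x2", "ret")], target = "ret")
task_classif = as_task_classif(dat[, c("x1", "x2", "up")], target = "up")

# Featureless learners as the per-task baselines.
baseline_regr    = lrn("regr.featureless")
baseline_classif = lrn("classif.featureless")
```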

Day 3-14 - Integrate selected MLR3 algorithms

- Provide documentation for model specification, parameter inputs, and outputs
- Budget time for 1-2 learners per day
- For each learner/family of learners, I will identify relevant papers, see how these algorithms are implemented, and make necessary adjustments
- I will also read the documentation for MLR3 and the packages it calls to better understand the parameters for the fine-tuning stage
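
For the parameter documentation, each learner's parameter set can be dumped programmatically, which could seed those docs; a small sketch:

```r
library(mlr3)
library(data.table)

# Tabulate a learner's parameters: name, type, and allowed range.
learner = lrn("regr.rpart")
params  = as.data.table(learner$param_set)
params[, .(id, class, lower, upper)]
```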

Day 15-20 - Write/integrate functions for preprocessing

- Motivate preprocessing steps with existing literature
- 1 type/family of preprocessing per day
- If we decide to implement more features, we can extend this period as well
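
A first cut at one such function, a simple winsorizer; the default cutoffs are arbitrary and would be motivated from the literature later:

```r
# Clamp a numeric vector at its lower/upper sample quantiles.
winsorize = function(x, probs = c(0.01, 0.99)) {
  stopifnot(is.numeric(x), length(probs) == 2)
  q = quantile(x, probs = probs, na.rm = TRUE)
  pmin(pmax(x, q[1]), q[2])
}

# Example: the tails of a heavy-tailed sample get clamped.
set.seed(1)
summary(winsorize(rt(1000, df = 2)))
```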

Day 21-25 - Model interpretability

- Visualize model outputs
- If the model gives us weights, we can use the bootstrap to get a confidence interval for the weight distribution
- If the model gives us something else, such as feature prominence and salience, we can also find ways to visualize it
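
For the weights case, a sketch of the bootstrap idea, with a plain linear model standing in for whatever model we end up using:

```r
# Bootstrap confidence intervals for linear-model coefficients.
set.seed(1)
n    = nrow(mtcars)
boot = replicate(2000, {
  idx = sample(n, replace = TRUE)
  coef(lm(mpg ~ wt + hp, data = mtcars[idx, ]))
})

# 95% percentile intervals per weight; these could feed the visualizations.
apply(boot, 1, quantile, probs = c(0.025, 0.975))
```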

Day 26-30 - Integrate the TStest package and potentially some more time-series tests

- Apply additional tests such as scenario analysis and stress tests to see how well the model performs under different scenarios
- The idea is that the model may have limited applications: specific factors may have stronger predictive power in some states than in others
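
I am not committing to an API for the TStest integration yet, but the state-dependence idea can be prototyped by simply splitting evaluation by regime; everything below (regime labels, column names) is hypothetical:

```r
# Hypothetical scenario analysis: out-of-sample RMSE by market regime.
set.seed(1)
df = data.frame(
  pred   = rnorm(300),
  actual = rnorm(300),
  regime = sample(c("expansion", "recession"), 300, replace = TRUE)  # placeholder labels
)

rmse_by_regime = tapply((df$pred - df$actual)^2, df$regime,
                        function(e2) sqrt(mean(e2)))
rmse_by_regime  # a large gap would suggest state-dependent predictive power
```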

Day 30-40

- Write functions for more learners that are not available in the MLR3 framework (if there are specific models we want to implement/test based on existing literature)
- Test pipelines with various data/tasks to evaluate performance and identify areas for improvement
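
If we do add learners outside MLR3, the extension route is to subclass LearnerRegr/LearnerClassif; a minimal sketch wrapping plain lm, purely for illustration:

```r
library(mlr3)
library(R6)
library(paradox)

# Minimal custom regression learner wrapping stats::lm.
LearnerRegrCustomLm = R6Class("LearnerRegrCustomLm",
  inherit = LearnerRegr,
  public = list(
    initialize = function() {
      super$initialize(
        id            = "regr.custom_lm",
        feature_types = c("numeric", "integer"),
        predict_types = "response",
        param_set     = ps()  # no hyperparameters in this toy example
      )
    }
  ),
  private = list(
    .train = function(task) {
      # Fit on the task's data; the return value becomes self$model.
      lm(task$formula(), data = task$data())
    },
    .predict = function(task) {
      newdata = task$data(cols = task$feature_names)
      list(response = predict(self$model, newdata = newdata))
    }
  )
)

lrn_custom = LearnerRegrCustomLm$new()
lrn_custom$train(tsk("mtcars"))
```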

The end of this tentative plan puts us at August 9.

Meeting with mentors

6/17/2024

6/18/2024

6/20/2024

6/21/2024

6/23/2024

6/24/2024

6/25/2024

6/27/2024


