sandbox/final_report_harry.md

Final Report for Expected Return

Contributor: Ziyu (Harry) He
Mentors: Justin Shea, Brian Peterson, Erol Biceroglu, Bryan Rodriguez
Organization: R Project for Statistical Computing

Introduction

Expected Return is an R package that applies machine learning (ML) methods to quantitative finance. This package aims to aid practitioners and researchers in using machine learning for portfolio construction, backtesting, and risk analysis.

We decided to create this package to provide core functionality that has appeared in the academic literature but has no, or only a limited, functional equivalent in R. We drew key inspiration from Machine Learning for Factor Investing (2020) and Advances in Financial Machine Learning (2018).

About the Package

R does not lack packages and functions that provide machine learning frameworks and pipelines for data analysis. Current approaches, however, often lack robust functions for analyzing financial data. As many theorists and practitioners have discussed at length, conventional machine learning procedures, from feature engineering to cross-validation, often fail when applied to time-series data. Ultimately, we hope this package will provide a robust machine learning framework for quantitative finance. At the current stage, we aim to offer a viable pipeline with core functions for empirical applications.
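One reason conventional cross-validation fails on time series is that randomly shuffled folds let a model train on observations that come after the ones it is tested on. The sketch below illustrates a walk-forward (expanding window) split in base R, in which every training index precedes every test index. The function `walk_forward_splits` and its arguments are hypothetical names chosen for illustration; they are not taken from the package.

```r
# Hypothetical helper (not part of the package): expanding-window splits.
# Each fold trains on all observations up to a cutoff and tests on the
# block that immediately follows it.
walk_forward_splits <- function(n_obs, n_folds, min_train) {
  stopifnot(n_folds >= 1, min_train >= 1, n_obs - min_train >= n_folds)
  # Integer cutoffs: the first is min_train, the last is n_obs
  breaks <- min_train + ((n_obs - min_train) * (0:n_folds)) %/% n_folds
  lapply(seq_len(n_folds), function(i) {
    list(
      train = seq_len(breaks[i]),
      test  = (breaks[i] + 1):breaks[i + 1]
    )
  })
}

splits <- walk_forward_splits(n_obs = 100, n_folds = 4, min_train = 60)

# In every fold, the last training index comes before the first test index
no_lookahead <- all(vapply(
  splits,
  function(s) max(s$train) < min(s$test),
  logical(1)
))
```

This sketch only prevents look-ahead in the ordering of observations. It does not handle labels that span several periods, which is the case purging and embargoing are meant to address.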

Contributions

Data Preprocessing

Backtesting

Contributions also include constructing a class object for faster and more efficient application, test functions to ensure that the mlr3 wrapper works properly with various ML algorithms, and evaluation and visualization functions. More detailed progress is recorded in a developer log.

Future Steps

  1. Fully integrate various components of the pipeline
  2. Expand the package to include more core functionality for hyper-parameter tuning, sample-weight construction, model evaluation, feature selection, and feature engineering
  3. Explore more cross-validation approaches. Many existing approaches face various types of overfitting problems, including data leakage and the single scenario problem. The combinatorial purged cross-validation approach suggested by Lopez de Prado generates multiple synthetic paths but has a limited range of applicable cases. We plan to examine the robustness of various established cross-validation methods and discuss new ways to avoid these overfitting pitfalls.
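
To make the combinatorial purged cross-validation idea in item 3 concrete, here is a minimal base R sketch. It splits the sample into contiguous groups, uses every combination of groups as a test set, and drops training observations that lie close to any test observation. The function name `cpcv_splits`, its arguments, and the fixed-width `purge` window are assumptions made for illustration: the method as described by Lopez de Prado purges based on overlapping label horizons and adds an embargo, both of which are omitted here, and this is not the package's implementation.

```r
# Hypothetical helper (not part of the package): enumerate combinatorial
# purged cross-validation splits with a simplified fixed-width purge.
cpcv_splits <- function(n_obs, n_groups, n_test_groups, purge = 0L) {
  stopifnot(n_groups <= n_obs, n_test_groups < n_groups, purge >= 0)
  # Assign each observation to one of n_groups contiguous blocks
  group_id <- ((seq_len(n_obs) - 1L) * n_groups) %/% n_obs + 1L
  # Every combination of n_test_groups blocks forms one test set
  combos <- combn(n_groups, n_test_groups, simplify = FALSE)
  lapply(combos, function(test_groups) {
    test <- which(group_id %in% test_groups)
    # Simplified purge: remove training observations within `purge`
    # positions of any test observation
    near_test <- unique(unlist(lapply(-purge:purge, function(k) test + k)))
    train <- setdiff(seq_len(n_obs), near_test)
    list(test_groups = test_groups, train = train, test = test)
  })
}

n_groups <- 6
n_test_groups <- 2
cv <- cpcv_splits(n_obs = 60, n_groups = n_groups,
                  n_test_groups = n_test_groups, purge = 2)

n_splits <- length(cv)                           # choose(6, 2) splits
n_paths  <- n_test_groups * n_splits / n_groups  # backtest paths
```

With 6 groups and 2 test groups per split, this yields `choose(6, 2)` train/test splits, and each group appears in the test set `n_paths` times, which is what allows several backtest paths to be assembled instead of a single one.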


JustinMShea/ExpectedReturns documentation built on Aug. 26, 2024, 1:47 a.m.