Benny Salo 2019-02-15
The notebooks in this folder (/data_raw) contain the analyses that produce the published data.
The folder is organized according to sequential tasks in the analyses. Broadly, tasks 1-4 are setup, task 5 is training, and tasks 6 and 7 collect results.
Not all four training steps in task 5 are needed for every algorithm type; we perform all four only for the elastic net models. For elastic net the steps are as follows.

1. Run a quick training script with 2 repeats of 4 folds to find qualifying combinations of the tuning parameters alpha and lambda. Combinations with poorer logLoss values than the best model in all 8 folds are disqualified from further analyses. This speeds up the later training steps by excluding combinations that are unlikely to produce the best trained models.

2. Run a training script with 25 repeats of 4 folds (100 folds in total) to find the tuning parameters with the best logLoss values.

3. Rerun the model chosen in (2) on the same 100 folds as in (2) to extract calibration statistics. (The summary function that extracts calibration statistics is slow and demands a lot of memory, so we run it only on the models with the tuning parameters that maximize discrimination.)

4. Rerun the model chosen in (2) on 250 repeats of 4 folds to get more reliable values that allow calculating percentile confidence intervals for discrimination statistics. (Calibration statistics are not calculated here.)
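The disqualification rule in step 1 can be sketched as follows. This is a hypothetical Python illustration with simulated fold-wise logLoss values, not the original training code; the variable names and the 20-combination grid are assumptions made for the example.

```python
# Illustrative sketch of the step-1 screening rule (simulated data, not the
# original training code): a tuning-parameter combination stays qualified
# only if it beats the best model's logLoss in at least one of the
# 2 repeats x 4 folds = 8 folds.
import numpy as np

rng = np.random.default_rng(0)

# Rows: candidate (alpha, lambda) combinations; columns: the 8 CV folds.
n_combos, n_folds = 20, 8
fold_logloss = rng.uniform(0.4, 0.7, size=(n_combos, n_folds))

# "Best model" = the combination with the lowest mean logLoss across folds.
best = fold_logloss.mean(axis=1).argmin()

# Disqualify combinations that are worse than the best model in ALL 8 folds;
# these are unlikely to win in the larger 25 x 4 CV run and can be skipped.
worse_everywhere = (fold_logloss > fold_logloss[best]).all(axis=1)
qualified = np.flatnonzero(~worse_everywhere)
print(len(qualified), "of", n_combos, "combinations qualify")
```

The best combination itself can never be disqualified (it is never worse than itself in any fold), so the qualifying set is always non-empty.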
For logistic regression, only steps 3 and 4 are run; for random forest, only steps 2-4.
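The percentile confidence intervals in step 4 can be computed directly from the fold-wise estimates. A minimal sketch, assuming a discrimination statistic such as AUC and using simulated values in place of the real fold results:

```python
# Illustrative sketch of step 4 (simulated data): percentile confidence
# intervals for a discrimination statistic from 250 repeats x 4 folds
# = 1000 fold-wise estimates.
import numpy as np

rng = np.random.default_rng(1)
# Stand-in for the 1000 fold-wise AUC estimates from the real models.
fold_auc = rng.normal(loc=0.75, scale=0.03, size=250 * 4)

# 95% percentile interval: the 2.5th and 97.5th percentiles of the estimates.
lower, upper = np.percentile(fold_auc, [2.5, 97.5])
print(f"AUC {fold_auc.mean():.3f}, 95% CI [{lower:.3f}, {upper:.3f}]")
```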