threshold_CV: Cross-validation function for tuning threshold in logistic...

View source: R/useful_functions.R

Description

Performs k-fold cross-validation to tune the threshold parameter in logistic regression. Fits a weighted logistic regression on randomised training/validation splits, then finds the threshold that maximises the approximate median significance (AMS).
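
This page does not define the AMS. Assuming it follows the definition used in the Higgs Boson Machine Learning Challenge (an assumption suggested by the package name, not confirmed by this page), the quantity computed for a threshold \theta would be:

```latex
\mathrm{AMS}(\theta) = \sqrt{2\left( \left(s_\theta + b_\theta + b_{\mathrm{reg}}\right)
  \ln\!\left(1 + \frac{s_\theta}{b_\theta + b_{\mathrm{reg}}}\right) - s_\theta \right)}
```

Here s_\theta is the summed weight of true signal points (label 1) whose predicted probability exceeds \theta, b_\theta is the summed weight of background points (label 0) that also exceed \theta, and b_reg is a regularisation constant (10 in the challenge).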

Usage

threshold_CV(df, label, weights, theta_0, theta_1, k = 5, n = 200)

Arguments

df

Data-frame to perform cross-validation on.

label

Binary labels of data points in 'df' (0/1).

weights

Weights of data points in 'df'.

theta_0

Lower bound of threshold parameter.

theta_1

Upper bound of threshold parameter.

k

Number of cross-validation folds (default 5).

n

Number of threshold values to check between 'theta_0' and 'theta_1' (default 200).

Value

'max_theta' is the threshold value that maximises the average AMS across cross-validation sets, and 'max_AMS' is that maximum average AMS.
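
The procedure described above can be sketched in base R as follows. This is a hypothetical re-implementation for illustration, not the package source: the names `ams` and `threshold_cv_sketch`, the evenly spaced threshold grid, the Higgs-challenge AMS definition with a regularisation term of 10, the strict "probability > threshold" rule, and the list return type are all assumptions.

```r
# Hypothetical sketch; NOT the code from R/useful_functions.R.

# AMS as defined in the Higgs Boson ML Challenge (assumed), b_reg = 10.
ams <- function(s, b, b_reg = 10) {
  sqrt(2 * ((s + b + b_reg) * log1p(s / (b + b_reg)) - s))
}

threshold_cv_sketch <- function(df, label, weights, theta_0, theta_1,
                                k = 5, n = 200) {
  # Candidate thresholds (an evenly spaced grid is an assumption).
  thetas <- seq(theta_0, theta_1, length.out = n)
  # Randomly assign each row to one of k folds.
  folds <- sample(rep_len(seq_len(k), nrow(df)))
  ams_mat <- matrix(NA_real_, nrow = k, ncol = n)

  for (i in seq_len(k)) {
    val <- folds == i
    train <- data.frame(.y = label[!val], df[!val, , drop = FALSE])
    w_train <- weights[!val]

    # Weighted logistic regression on the training folds. quasibinomial()
    # gives the same coefficients as binomial() but does not warn about
    # non-integer weights.
    fit <- glm(.y ~ ., data = train, family = quasibinomial(),
               weights = w_train)

    # Predicted signal probabilities on the held-out fold.
    p <- predict(fit, newdata = df[val, , drop = FALSE], type = "response")
    y_val <- label[val]
    w_val <- weights[val]  # no rescaling of fold weights in this sketch

    # AMS on the held-out fold for every candidate threshold.
    ams_mat[i, ] <- vapply(thetas, function(th) {
      sel <- p > th
      ams(sum(w_val[sel & y_val == 1]), sum(w_val[sel & y_val == 0]))
    }, numeric(1))
  }

  # Average over folds, then pick the best threshold.
  mean_ams <- colMeans(ams_mat)
  best <- which.max(mean_ams)
  list(max_theta = thetas[best], max_AMS = mean_ams[best])
}

# Usage on simulated data (the toy data set is made up for this example).
set.seed(1)
toy <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
y <- rbinom(200, 1, plogis(1.5 * toy$x1))
w <- runif(200, 0.5, 1.5)
res <- threshold_cv_sketch(toy, y, w, theta_0 = 0.1, theta_1 = 0.9,
                           k = 5, n = 50)
```

Averaging the AMS over folds before taking the maximum matches the description of 'max_AMS' above; taking the maximum within each fold and averaging afterwards would generally give a different threshold.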


hsansford1/higgsboson documentation built on Jan. 22, 2022, 4:34 a.m.