fit_metrics: Compute Fit Metrics for Binary Classification
In rettopnivek/binclass: Personalized functions for binary classification.

Given the output of a fitted model (e.g., a logistic regression), computes an assortment of fit metrics for the training and test subsets of the data.

fit_metrics(fit, dat, algorithm = "glm", offset_info = NULL,
  lambda_val = NULL, cutoff = NULL)

is.fit_metrics(x)

## S3 method for class 'fit_metrics'
subset(x, train = F, metric = "AUC")

## S3 method for class 'fit_metrics'
print(x, digits = 2)

## S3 method for class 'fit_metrics'
residuals(x, train = F)

`fit`	Model fit output (e.g., output from `glm`).
`dat`	An R object of class 'train_test'.
`algorithm`	The type of fitting algorithm. Options include `glm` and `glmnet`.
`offset_info`	An optional named list with a vector of offset values associated with each row of the matrix of predictors (for 'train' and 'test', respectively), used when fitting data with `glm`.
`lambda_val`	An optional value indicating the best-fitting penalty term found when fitting data using `glmnet`.
`cutoff`	An optional numeric code specifying how the probabilities returned by the `predict` function for `glm` output are converted into binary values: 0 = cut-off of 0.5; 1 = cut-off set to the mean proportion in the data; 0 < cutoff < 1 = cut-off set to the specified value; 2 = binary values generated via a binomial distribution.
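
The `cutoff` coding can be illustrated with a minimal base-R sketch. The helper `apply_cutoff` is hypothetical and not part of the package; it assumes that the "mean proportion in the data" is the mean of the observed outcomes and that a probability strictly greater than the cut-off counts as a positive prediction.

```r
# Hypothetical helper, not part of binclass: converts predicted
# probabilities into binary predictions following the 'cutoff' coding.
apply_cutoff <- function(theta, y, cutoff = 0) {
  if (cutoff == 2) {
    # Code 2: draw each binary value from a binomial distribution
    return(rbinom(length(theta), size = 1, prob = theta))
  }
  threshold <- if (cutoff == 0) {
    0.5                 # Code 0: fixed cut-off of 0.5
  } else if (cutoff == 1) {
    mean(y)             # Code 1: assumed to be the observed base rate
  } else if (cutoff > 0 && cutoff < 1) {
    cutoff              # Any value in (0, 1) is used directly
  } else {
    stop("Unsupported cutoff code")
  }
  # Assumption: strictly greater than the threshold means 'positive'
  as.integer(theta > threshold)
}

theta <- c(0.1, 0.4, 0.6, 0.9)  # predicted probabilities
y     <- c(0, 0, 0, 1)          # observed outcomes

apply_cutoff(theta, y, 0)    # 0 0 1 1
apply_cutoff(theta, y, 1)    # 0 1 1 1 (threshold = mean(y) = 0.25)
apply_cutoff(theta, y, 0.7)  # 0 0 0 1
```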

The function computes several metrics:

TPR - the true positive rate, the ratio of hits against the number of positive trials;
FPR - the false positive rate, the ratio of false alarms against the number of negative trials;
AUC - the area under the curve, estimated by numerical integration after tracing out the curve by computing the FPR versus TPR associated with each predicted probability;
d_prime - a measure of discriminability;
criterion - a measure of bias, where positive values denote greater bias against selecting a positive trial;
CE - The mean cross-entropy;
R - Pearson's R computed from the confusion matrix;
Accuracy - Predictive accuracy, the proportion of cases the model predicted correctly;
CM - The confusion matrix, the frequencies of predicted positive/negative instances against the observed frequencies;
AUC_curve - a data frame with the cut-offs based on the unique predicted probabilities and the associated true and false positive rates, which allows the curve underlying the AUC to be plotted;
theta - the predicted probability of a positive outcome for each observation in the subset of data;
residuals - the difference between the observed outcome and the predicted probability;
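
As a rough illustration of the metrics listed above, the following base-R sketch computes them from a small made-up vector of outcomes and predicted probabilities. It is not the package's implementation: the formulas for d_prime and criterion are the standard equal-variance signal detection ones, R is computed as the phi coefficient of the confusion matrix, the AUC uses the trapezoid rule, and the 0.5 cut-off with a strict inequality is an assumption.

```r
# Toy data (made up for illustration)
y     <- c(0, 0, 1, 0, 1, 1, 0, 1)                    # observed outcomes
theta <- c(0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9)   # predicted probabilities
pred  <- as.integer(theta > 0.5)                      # assumed cut-off of 0.5

# Confusion matrix and its four cells
CM <- table(Predicted = pred, Observed = y)
TP <- sum(pred == 1 & y == 1); FP <- sum(pred == 1 & y == 0)
FN <- sum(pred == 0 & y == 1); TN <- sum(pred == 0 & y == 0)

TPR <- TP / (TP + FN)            # hits / positive trials          -> 0.75
FPR <- FP / (FP + TN)            # false alarms / negative trials  -> 0.25
Accuracy <- (TP + TN) / length(y)                               #  -> 0.75

# Signal detection measures (standard equal-variance formulas)
d_prime   <- qnorm(TPR) - qnorm(FPR)
criterion <- -0.5 * (qnorm(TPR) + qnorm(FPR))  # positive = bias against 'positive'

# Mean cross-entropy
CE <- -mean(y * log(theta) + (1 - y) * log(1 - theta))

# Pearson's R from the confusion matrix (phi coefficient)        -> 0.5
R <- (TP * TN - FP * FN) /
  sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))

# Trace the curve over the unique predicted probabilities, then
# integrate TPR over FPR with the trapezoid rule
cuts <- c(sort(unique(theta), decreasing = TRUE), -Inf)
tpr  <- sapply(cuts, function(k) mean(theta[y == 1] > k))
fpr  <- sapply(cuts, function(k) mean(theta[y == 0] > k))
AUC  <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)    #  -> 0.75

# Residuals: observed outcome minus predicted probability
res <- y - theta
```

Plotting `fpr` against `tpr` from this sketch gives the kind of curve that the AUC_curve element is described as supporting.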

The subset method extracts a specified metric for either the training (`train = TRUE`) or test (`train = FALSE`, the default) subset. The residuals method extracts the residuals.

An object of class 'fit_metrics', a list of lists, each providing the set of metrics over the training and test data, respectively.
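
To make the list-of-lists layout and the extraction methods concrete, here is a minimal mock in base R. Everything in it is an assumption for illustration: the class name `demo_metrics`, the element names `train` and `test`, and the numeric values are all made up and do not reflect the package's internals.

```r
# Mock object mimicking a list of lists of metrics (placeholder values)
fm_demo <- structure(
  list(
    train = list(AUC = 0.81, Accuracy = 0.74, residuals = c(0.2, -0.4, 0.1)),
    test  = list(AUC = 0.77, Accuracy = 0.70, residuals = c(-0.3, 0.5))
  ),
  class = "demo_metrics"  # hypothetical class, to avoid masking 'fit_metrics'
)

# S3 method: extract one metric from the training or test subset
subset.demo_metrics <- function(x, train = FALSE, metric = "AUC", ...) {
  part <- if (train) "train" else "test"
  x[[part]][[metric]]
}

# S3 method: extract the residuals via the subset method
residuals.demo_metrics <- function(object, train = FALSE, ...) {
  subset(object, train = train, metric = "residuals")
}

subset(fm_demo)                                     # test AUC: 0.77
subset(fm_demo, train = TRUE, metric = "Accuracy")  # training accuracy: 0.74
residuals(fm_demo, train = TRUE)                    # training residuals
```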

# Simulate data
sim = bc_simulate( 300, 4, 2 )
# Create 'train_test' object
index = cv_index( 3, 300 )
dat = train_test( 3, index, sim$y, sim$X )
# Extract training data as data frame
train = as.data.frame( dat, train = TRUE )
# Fit data
fit = glm( y ~ P1 + P2 + P3 + P4, family = 'binomial', data = train )
# Compute metrics
fm = fit_metrics( fit, dat )
fm