Description
This function returns a list of typical machine learning statistics for a binary classifier (predictor) based on data from an xts matrix by comparing the predictions vs. actuals. The matrix must contain the asset equity curve and a binary predictor class (1 = predict up market, 0 = predict down market). This is especially useful for analyzing the performance of market timers.
Usage

predictor_stats(data, ec_col = "ec", timer_col = "GC")
Arguments

data:
The xts matrix containing the equity curve of the asset or portfolio, and the classifier's prediction column (e.g. a market timer).

ec_col:
The column name or column number of the equity curve under test. Default is "ec".

timer_col:
The column name or column number of the prediction (or timer) column. Default is "GC".
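For example, a minimal usage sketch: the equity curve and timer signal below are made-up toy data, and the package that provides predictor_stats() is assumed to be already loaded.

  library(xts)

  # Toy data (hypothetical): a 12-month equity curve and a 0/1 timer signal
  set.seed(42)
  dates <- seq(as.Date("2023-01-01"), by = "month", length.out = 12)
  ec    <- cumprod(1 + rnorm(12, mean = 0.005, sd = 0.03))  # equity curve
  GC    <- sample(c(0, 1), 12, replace = TRUE)              # 1 = up call, 0 = down call
  prices <- xts(cbind(ec = ec, GC = GC), order.by = dates)

  stats <- predictor_stats(prices, ec_col = "ec", timer_col = "GC")
  stats$confusion_matrix
  stats$accuracy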
Value

Returns a list of several objects. Each item is defined as follows:
$predictions:
An xts matrix containing the dates when predictions happen. Columns are the equity curve, the prediction (from the timer), the actual result (indicating whether the market actually went up over the ensuing cycle), and four columns indicating whether the prediction ended up being a TP, FP, TN or FN (see $predictor_stats below for details).
$N_predictions:
Number of predictions (transitions) observed, including the latest one where the actual is unknown.
$predictor_stats:
The predictor statistics expressed as a vector of length 4 as c(TP, FP, FN, TN), where TP = True Positives, FP = False Positives, FN = False Negatives and TN = True Negatives.
$confusion_matrix:
The transition statistics expressed as a standard confusion matrix.
$confusion_ext:
The extended confusion matrix, with an added row summing the total predicted positives and negatives, and an added column summing the total actual positives and negatives. The bottom right entry is the sum of all predictions, which is one less than $N_predictions because this matrix includes only those predictions that have an associated actual value (and thus excludes the last prediction).
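A hedged sketch of how the four counts map onto the two matrices, using arbitrary toy counts; the orientation shown here (rows = actual, columns = predicted) is an assumption chosen to match the description above and may differ from the package's actual output.

  # Toy counts (hypothetical, not produced by predictor_stats)
  TP <- 8; FP <- 2; FN <- 3; TN <- 7

  # 2x2 confusion matrix: rows = actual, columns = predicted
  cm <- matrix(c(TP, FN,
                 FP, TN),
               nrow = 2, byrow = TRUE,
               dimnames = list(actual = c("pos", "neg"),
                               predicted = c("pos", "neg")))

  # Extended matrix: add a column of actual totals and a row of predicted
  # totals; the bottom-right entry is the number of analyzed predictions
  cm_ext <- rbind(cbind(cm, total = rowSums(cm)),
                  total = c(colSums(cm), sum(cm)))
  cm_ext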
$total:
The total number of predictions analyzed. To be analyzed, a prediction must have an associated actual, so this excludes the latest prediction.
$actual_pos:
The number of actual positives, as defined by the equity curve showing a positive return over the period between two predictions.
$actual_neg:
The number of actual negatives, defined as the equity curve showing a negative return over the period between two predictions.
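As an illustration of these definitions only (not the package's implementation): the actual counts follow from the sign of the equity-curve return between consecutive prediction dates. The values below are made up, and the handling of an exactly zero return is not specified by this page.

  # Hypothetical equity-curve values sampled at the prediction (transition) dates
  ec_at_predictions <- c(100, 104, 101, 103, 99, 105)

  # Return over each ensuing cycle; the latest prediction has no actual yet
  cycle_returns <- diff(ec_at_predictions) / head(ec_at_predictions, -1)

  actual_pos <- sum(cycle_returns > 0)   # cycles with a positive return
  actual_neg <- sum(cycle_returns < 0)   # cycles with a negative return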
$accuracy:
Answers the question: "Overall, how often is the classifier correct?" Defined as follows:
accuracy = (TP + TN) / total
Note that accuracy is not an appropriate measure for unbalanced classes. Precision and recall should be used instead.
$misclass_rate:
Answers the question: "Overall, how often is it wrong?" The misclassification rate, aka "Error Rate", is defined as follows:
misclass_rate = 1 - accuracy = (FP + FN) / total
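A worked toy example covering $accuracy and $misclass_rate; the counts are arbitrary.

  # Arbitrary toy counts
  TP <- 8; FP <- 2; FN <- 3; TN <- 7
  total <- TP + FP + FN + TN            # 20

  accuracy      <- (TP + TN) / total    # 15/20 = 0.75
  misclass_rate <- (FP + FN) / total    #  5/20 = 0.25 = 1 - accuracy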
$TP_rate:
Answers the question: "When actual is positive, how often does it predict positive?" The True Positive rate, aka "Sensitivity" or "Recall", is defined as follows:
TP_rate = TP / actual positives
$FP_rate:
Answers the question: "When actual is negative, how often does it predict positive?" The False Positive rate is defined as follows:
FP_rate = FP / actual negatives
$TN_rate:
Answers the question: "When actual is negative, how often does it predict negative?" The True Negative rate, aka specificity, is defined as follows:
TN_rate = TN / actual negatives
$FN_rate:
Answers the question: "When actual is positive, how often does it predict negative?" The False Negative rate is defined as follows:
FN_rate = FN / actual positives
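A worked toy example for the four rates, using the same arbitrary counts as above.

  TP <- 8; FP <- 2; FN <- 3; TN <- 7
  actual_pos <- TP + FN                 # 11
  actual_neg <- FP + TN                 # 9

  TP_rate <- TP / actual_pos            # sensitivity / recall: 8/11 ~ 0.73
  FN_rate <- FN / actual_pos            # 3/11 ~ 0.27 (TP_rate + FN_rate = 1)
  TN_rate <- TN / actual_neg            # specificity: 7/9 ~ 0.78
  FP_rate <- FP / actual_neg            # 2/9 ~ 0.22 (TN_rate + FP_rate = 1)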
$specificity:
Answers the question: "When actual is negative, how often does it predict negative?" Specificity is equivalent to True Negative rate and is defined as:
specificity = TN / actual negatives = TN_rate
$precision:
Answers the question: "What fraction of predicted positives are actually true positives?" Stated differently, it is how often a positive prediction turns out to be correct. It is therefore a measure of confirmation. Precision is defined as:
precision = TP / predicted positives = TP / (TP + FP)
$recall:
Recall is the same as the True Positive rate. It is the fraction of the actual (true) positives that the classifier detects. It is therefore a measure of utility, i.e. how much of what is actually there the classifier finds. See TP_rate above.
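A worked toy example for $precision and $recall, again with arbitrary counts.

  TP <- 8; FP <- 2; FN <- 3

  precision <- TP / (TP + FP)   # 8/10 = 0.80: a positive call is right 80% of the time
  recall    <- TP / (TP + FN)   # 8/11 ~ 0.73: 73% of the actual up-moves are caught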
$prevalence:
Answers the question: "How often does the positive condition happen in the data set?" Prevalence is defined as:
prevalence = actual positives / total
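For example, with the same arbitrary toy counts:

  TP <- 8; FP <- 2; FN <- 3; TN <- 7
  prevalence <- (TP + FN) / (TP + FP + FN + TN)   # 11/20 = 0.55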
$pos_pred_value:
The Positive Predictive Value. Computed from the observed counts it is identical to precision:
pos_pred_value = TP / (TP + FP)
PPV and precision differ only when PPV is estimated from sensitivity, specificity and an assumed prevalence rather than from the observed confusion-matrix counts.
$null_error_rate:
Answers the question: "How often would the classifier be wrong if it always predicted the majority class?" This is a useful baseline error rate against which to compare the misclassification rate.
$F1_score:
The F1 score combines precision and recall into one figure. If either precision or recall is very small, then the F1 score will also be small. The F1 score is defined as:
F1_score = 2 * precision * recall / (precision + recall)
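A worked toy example, reusing the precision and recall values from the sketches above.

  precision <- 0.80
  recall    <- 8 / 11

  F1_score <- 2 * precision * recall / (precision + recall)   # ~ 0.76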