factory_model_performance_r: Evaluate the performance of a fitted pipeline


View source: R/factory_model_performance_r.R

Description

Calculates performance metrics for a fitted pipeline on the test set.

Usage

factory_model_performance_r(pipe, x_train, y_train, x_test, y_test, metric)

Arguments

pipe

The fitted pipeline, as returned by pxtextmineR::factory_pipeline_r.

x_train

Data frame. Training data (predictor).

y_train

Vector. Training data (response).

x_test

Data frame. Test data (predictor).

y_test

Vector. Test data (response).

metric

String. The scorer that was used in pipeline tuning (one of "accuracy_score", "balanced_accuracy_score", "matthews_corrcoef" or "class_balance_accuracy_score").

Value

A list of length 5:

pipe: the fitted Scikit-learn pipeline, re-fitted on the whole dataset (train + test; see Note).

tuning_results: data frame. All (hyper)parameter values tried during pipeline fitting, along with performance metrics (see Note).

pred: vector. Predictions on the test set.

accuracy_per_class: data frame. Accuracy per class on the test set.

p_compare_models_bar: bar plot comparing the performance of the tuned learners.

Note

Returned object tuning_results lists all (hyper)parameter values tried during pipeline fitting, along with the corresponding performance metrics. It is derived, with some modifications, from the Scikit-learn attribute cv_results_ that is produced when the pipeline is fitted. In R, cv_results_ can be accessed after fitting a pipeline with pxtextmineR::factory_pipeline_r or after calling pxtextmineR::factory_model_performance_r. Say that the fitted pipeline is assigned to an object called pipe and the pipeline performance to an object called pipe_performance; then cv_results_ can be accessed with pipe$cv_results_ or pipe_performance$cv_results_.
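
For instance (a minimal sketch, assuming pipe and pipe_performance exist as in the Examples below):

cv_raw <- pipe$cv_results_               # raw Scikit-learn tuning output
cv_raw <- pipe_performance$cv_results_   # equivalently, via the performance object
str(cv_raw)                              # (hyper)parameters tried and CV scores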

NOTE: After calculating performance metrics on the test set, pxtextmineR::factory_model_performance_r re-fits the pipeline on the whole dataset (train + test). Hence, do not be surprised that the pipeline's score() method now returns a dramatically improved score on the test set: the re-fitted pipeline has "seen" the test dataset (see Examples). The benefit is that a pipeline re-fitted on all available data should perform better on fresh data than one fitted on x_train and y_train only.
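
In code, using pipe and the x_test/y_test arguments from above (a hedged illustration; see the Examples for the full walk-through):

pipe$score(x_test, y_test) # inflated: the re-fitted pipeline has already "seen" x_test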

References

Pedregosa F., Varoquaux G., Gramfort A., Michel V., Thirion B., Grisel O., Blondel M., Prettenhofer P., Weiss R., Dubourg V., Vanderplas J., Passos A., Cournapeau D., Brucher M., Perrot M. & Duchesnay E. (2011). Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 12:2825–2830.

Examples

# Prepare training and test sets
data_splits <- pxtextmineR::factory_data_load_and_split_r(
  filename = pxtextmineR::text_data,
  target = "label",
  predictor = "feedback",
  test_size = 0.90) # Make a small training set for a faster run in this example

# Let's take a look at the returned list
str(data_splits)
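# data_splits is a list with the four objects used throughout this example:
# x_train, y_train, x_test and y_test.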

# Fit the pipeline
pipe <- pxtextmineR::factory_pipeline_r(
  x = data_splits$x_train,
  y = data_splits$y_train,
  tknz = "spacy",
  ordinal = FALSE,
  metric = "accuracy_score",
  cv = 2, n_iter = 10, n_jobs = 1, verbose = 3,
  learners = c("SGDClassifier", "MultinomialNB")
)
# (SGDClassifier represents both logistic regression and linear SVM, depending
# on the value of the "loss" hyperparameter ("log" or "hinge"), which is set
# internally by factory_pipeline_r.)

# Assess model performance
pipe_performance <- pxtextmineR::factory_model_performance_r(
  pipe = pipe,
  x_train = data_splits$x_train,
  y_train = data_splits$y_train,
  x_test = data_splits$x_test,
  y_test = data_splits$y_test,
  metric = "accuracy_score")

names(pipe_performance)
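# The five elements (see Value): "pipe", "tuning_results", "pred",
# "accuracy_per_class" and "p_compare_models_bar".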

# Let's compare pipeline performance across the different tunings with a range
# of metrics, averaged over the cross-validation folds.
pipe_performance$
  tuning_results %>%
  dplyr::select(learner, dplyr::contains("mean_test"))

# A glance at the (hyper)parameters and their tuned values
pipe_performance$
  tuning_results %>%
  dplyr::select(learner, dplyr::contains("param_")) %>%
  str()

# Accuracy per class
pipe_performance$accuracy_per_class

# Learner performance barplot
pipe_performance$p_compare_models_bar
# Remember that we tried three models: Logistic regression (SGDClassifier with
# "log" loss), linear SVM (SGDClassifier with "hinge" loss) and MultinomialNB.
# Do not be surprised if one of these models does not show on the plot.
# There are numerous candidate values for the different (hyper)parameters
# (recall, most of which are set internally) and only `n_iter = 10` iterations
# in this example. Because, as in `factory_pipeline_r`, the choice of which
# (hyper)parameter values to try is random, one or more classifiers may never
# be sampled. Increasing `n_iter` would avoid this, at the expense of longer
# fitting times (but with a possibly more accurate pipeline).

# Predictions on test set
preds <- pipe_performance$pred
head(preds)

################################################################################
# NOTE!!! #
################################################################################
# After calculating performance metrics on the test set,
# pxtextmineR::factory_model_performance_r fits the pipeline on the WHOLE
# dataset (train + test). Hence, do not be surprised that the pipeline's
# score() method now returns a dramatically improved score on the test
# set: the re-fitted pipeline has now "seen" the test dataset.
pipe_performance$pipe$score(data_splits$x_test, data_splits$y_test)
pipe$score(data_splits$x_test, data_splits$y_test)

# We can confirm this score by having the re-fitted pipeline predict x_test
# again. The predictions will be better and the new accuracy score will be
# the inflated one.
preds_refitted <- pipe$predict(data_splits$x_test)

score_refitted <- data_splits$y_test %>%
  data.frame() %>%
  dplyr::rename(true = '.') %>%
  dplyr::mutate(
    pred = preds_refitted,
    check = true == preds_refitted,
    check = sum(check) / nrow(.)
  ) %>%
  dplyr::pull(check) %>%
  unique()

score_refitted

# Compare this to the ACTUAL performance on the test dataset
preds_actual <- pipe_performance$pred

score_actual <- data_splits$y_test %>%
  data.frame() %>%
  dplyr::rename(true = '.') %>%
  dplyr::mutate(
    pred = preds_actual,
    check = true == preds_actual,
    check = sum(check) / nrow(.)
  ) %>%
  dplyr::pull(check) %>%
  unique()

score_actual

score_refitted - score_actual
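# The difference should be positive: the re-fitted score is inflated because
# it was computed on data the pipeline had already seen.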
