| rsq | R Documentation |
Calculate the coefficient of determination using correlation. For the
traditional measure of R squared, see rsq_trad().
rsq(data, ...)
## S3 method for class 'data.frame'
rsq(data, truth, estimate, na_rm = TRUE, case_weights = NULL, ...)
rsq_vec(truth, estimate, na_rm = TRUE, case_weights = NULL, ...)
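| data | A data.frame containing the columns specified by the truth and estimate arguments. |
| ... | Not currently used. |
| truth | The column identifier for the true results (that is, numeric). This should be an unquoted column name. For rsq_vec(), a numeric vector. |
| estimate | The column identifier for the predicted results (that is, also numeric). This should be an unquoted column name. For rsq_vec(), a numeric vector. |
| na_rm | A logical value indicating whether NA values should be stripped before the computation proceeds. |
| case_weights | The optional column identifier for case weights. This should be an unquoted column name that evaluates to a numeric column in data. |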
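The base-R sketch below illustrates how na_rm and case_weights might interact. It drops incomplete pairs, which is the effect of na_rm = TRUE, and then squares a weighted Pearson correlation computed with stats::cov.wt(). It assumes that case weights enter rsq() as observation weights in a weighted correlation. weighted_rsq_sketch() is a hypothetical helper and is not part of the package.

```r
# Hypothetical helper, not package code. Assumption: case weights act as
# observation weights in a weighted Pearson correlation.
weighted_rsq_sketch <- function(truth, estimate, case_weights = NULL) {
  if (is.null(case_weights)) {
    case_weights <- rep(1, length(truth))  # unweighted: every row counts once
  }
  # Mimic na_rm = TRUE: keep only rows where every input is present
  keep <- stats::complete.cases(truth, estimate, case_weights)
  xy <- cbind(truth[keep], estimate[keep])
  # cov.wt() rescales the weights to sum to 1; with cor = TRUE it also
  # returns the weighted correlation matrix
  r <- stats::cov.wt(xy, wt = case_weights[keep], cor = TRUE)$cor[1, 2]
  r^2
}

truth    <- c(1.2, 2.3, NA, 4.1, 5.0, 6.2)
estimate <- c(1.0, 2.8, 3.1, 3.9, 5.5, NA)
weights  <- c(1, 2, 1, 1, 3, 1)

weighted_rsq_sketch(truth, estimate)           # rows 3 and 6 are dropped
weighted_rsq_sketch(truth, estimate, weights)  # remaining rows weighted 1, 2, 1, 3
```

With integer weights, the weighted value matches the unweighted squared correlation of the data with each row repeated "weight" times. That property gives one way to sanity-check the assumption.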
The two estimates for the
coefficient of determination, rsq() and rsq_trad(), differ by
their formula. The former guarantees a value on [0, 1], while the
latter can produce negative values when the model is
non-informative (see the examples). Both are measures of
consistency/correlation, not of accuracy.
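The base-R sketch below makes the difference in formulas concrete. It assumes that rsq() is the squared Pearson correlation and that rsq_trad() uses the usual 1 - SSE/SST form. The helper names are hypothetical. The predictions are a random shuffle of the outcome, which stands in for a non-informative model.

```r
# Hypothetical helpers illustrating the two formulas (not package code)
rsq_cor_sketch <- function(truth, estimate) {
  cor(truth, estimate)^2                 # squared correlation: always in [0, 1]
}
rsq_trad_sketch <- function(truth, estimate) {
  sse <- sum((truth - estimate)^2)       # residual sum of squares
  sst <- sum((truth - mean(truth))^2)    # total sum of squares around the mean
  1 - sse / sst                          # can fall below 0
}

set.seed(1)
truth    <- rnorm(50)
estimate <- sample(truth)  # shuffled outcome: no real predictive signal

rsq_cor_sketch(truth, estimate)
rsq_trad_sketch(truth, estimate)
```

The shuffled predictions have the same mean and variance as truth, so SSE works out to 2 * SST * (1 - r), where r is their correlation. The traditional value therefore equals 2r - 1, which is negative whenever r is below 0.5. The squared correlation stays small but non-negative.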
rsq() is simply the squared correlation between truth and estimate.
Because rsq() internally computes a correlation, if either truth or
estimate is constant it can result in a divide by zero error. In these
cases, a warning is thrown and NA is returned. This can occur when a model
predicts a single value for all samples. For example, a regularized model
that eliminates all predictors except for the intercept would do this.
Another example would be a CART model that contains no splits.
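The divide-by-zero case follows from the correlation formula, whose denominator is the product of the standard deviations of truth and estimate. The hypothetical helper below mirrors the documented behavior (warn, then return NA) when either input is constant. Its warning text is invented for this sketch and is not the package's message.

```r
# Hypothetical helper, not package code: squared Pearson correlation that
# warns and returns NA when either input is constant.
rsq_guarded_sketch <- function(truth, estimate) {
  # cor() divides by sd(truth) * sd(estimate); a constant vector has sd 0
  if (sd(truth) == 0 || sd(estimate) == 0) {
    warning("`truth` or `estimate` is constant; the correlation is undefined.")
    return(NA_real_)
  }
  cor(truth, estimate)^2
}

rsq_guarded_sketch(c(1, 2, 3), c(1.1, 1.9, 3.2))  # ordinary case
rsq_guarded_sketch(c(1, 2), c(1, 1))              # single predicted value: warning + NA
```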
A tibble with columns .metric, .estimator,
and .estimate and 1 row of values.
For grouped data frames, the number of rows returned will be the same as the number of groups.
For rsq_vec(), a single numeric value (or NA).
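With grouped input, the result has one row per group. The base-R sketch below is a rough equivalent that assumes the metric is simply recomputed within each group. It splits the data by group and squares the correlation within each one. The simulated data are illustrative only, and the column layout merely mimics the documented .metric, .estimator and .estimate columns.

```r
# Simulated data with three groups (illustrative only)
set.seed(42)
df <- data.frame(group = rep(c("a", "b", "c"), each = 20), truth = rnorm(60))
df$estimate <- df$truth + rnorm(60, sd = 0.5)  # informative but noisy predictions

# Recompute the squared correlation separately within each group
per_group <- vapply(
  split(df, df$group),
  function(d) cor(d$truth, d$estimate)^2,
  numeric(1)
)

# Arrange the results like the documented output: one row per group
result <- data.frame(
  group      = names(per_group),
  .metric    = "rsq",
  .estimator = "standard",
  .estimate  = unname(per_group)
)
result
```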
Max Kuhn
Kvalseth. Cautionary note about R^2.
American Statistician (1985) vol. 39 (4) pp. 279-285.
Other numeric metrics:
ccc(),
huber_loss(),
huber_loss_pseudo(),
iic(),
mae(),
mape(),
mase(),
mpe(),
msd(),
poisson_log_loss(),
rmse(),
rpd(),
rpiq(),
rsq_trad(),
smape()
Other consistency metrics:
ccc(),
rpd(),
rpiq(),
rsq_trad()
library(yardstick)
# Supply truth and predictions as bare column names
rsq(solubility_test, solubility, prediction)
library(dplyr)
set.seed(1234)
size <- 100
times <- 10
# create 10 resamples
solubility_resampled <- bind_rows(
replicate(
n = times,
expr = sample_n(solubility_test, size, replace = TRUE),
simplify = FALSE
),
.id = "resample"
)
# Compute the metric by group
metric_results <- solubility_resampled %>%
group_by(resample) %>%
rsq(solubility, prediction)
metric_results
# Resampled mean estimate
metric_results %>%
summarise(avg_estimate = mean(.estimate))
# With uninformative data, the traditional version of R^2 can return
# negative values.
set.seed(2291)
solubility_test$randomized <- sample(solubility_test$prediction)
rsq(solubility_test, solubility, randomized)
rsq_trad(solubility_test, solubility, randomized)
# A constant `truth` or `estimate` vector results in a warning from
# a divide by zero error in the correlation calculation.
# `NA` will be returned in these cases.
truth <- c(1, 2)
estimate <- c(1, 1)
rsq_vec(truth, estimate)