x2y | R Documentation |
The relative reduction in error when going from a baseline model
(the average for continuous features, the most frequent value for
categorical features) to a predictive model measures the strength of the
relationship between two features. In other words, x2y measures the
ability of x to predict y. We use CART (Classification And Regression
Trees) models because they 1) allow comparing numerical and non-numerical
features, 2) detect non-linear relationships, and 3) are easy and quick
to train.
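The idea described above can be sketched in a few lines of R. This is only an illustration, not the lares implementation: x2y_sketch is a hypothetical helper, and it assumes mean absolute error for numeric targets, misclassification rate for categorical targets, and default rpart settings.

```r
library(rpart) # CART implementation shipped with standard R distributions

# Hypothetical helper (not part of lares): percent error reduction of a
# CART model over a naive baseline when predicting y from x.
x2y_sketch <- function(x, y) {
  keep <- complete.cases(x, y)
  d <- data.frame(x = x[keep], y = y[keep])
  if (!is.numeric(d$x)) d$x <- factor(d$x)
  if (is.numeric(d$y)) {
    # Regression: the baseline predicts the mean of y (assumed error: MAE)
    fit <- rpart(y ~ x, data = d, method = "anova")
    model_err <- mean(abs(d$y - predict(fit, d)))
    base_err <- mean(abs(d$y - mean(d$y)))
  } else {
    # Classification: the baseline predicts the most frequent class
    # (assumed error: misclassification rate)
    d$y <- factor(d$y)
    fit <- rpart(y ~ x, data = d, method = "class")
    pred <- predict(fit, d, type = "class")
    model_err <- mean(as.character(pred) != as.character(d$y))
    base_err <- 1 - max(table(d$y)) / nrow(d)
  }
  if (base_err == 0) return(0)
  # Relative reduction in error, floored at 0 and expressed in percent
  100 * max(0, (base_err - model_err) / base_err)
}

# Noisy semicircle: x says a lot about y, y says little about x
set.seed(42)
x <- seq(-1, 1, 0.01)
y <- sqrt(1 - x^2) + rnorm(length(x), mean = 0, sd = 0.05)
fwd <- x2y_sketch(x, y)
rev <- x2y_sketch(y, x)
```

Because the errors are measured on the same data the tree was fit to, the sketch is optimistic; it is meant to show the baseline-versus-model comparison, not to reproduce the package's numbers.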
x2y(
df,
target = NULL,
symmetric = FALSE,
target_x = FALSE,
target_y = FALSE,
plot = FALSE,
top = 20,
quiet = "auto",
ohse = FALSE,
corr = FALSE,
...
)
x2y_metric(x, y, confidence = FALSE, bootstraps = 20, max_cat = 20)
## S3 method for class 'x2y_preds'
plot(x, corr = FALSE, ...)
## S3 method for class 'x2y'
plot(x, type = 1, ...)
x2y_preds(x, y, max_cat = 10)
df |
data.frame. Note that variables with no variance will be ignored. |
target |
Character vector. If you are only interested in the |
symmetric |
Boolean. |
target_x , target_y |
Boolean. Force target features to be part of
|
plot |
Boolean. Return a plot? If not, only a data.frame with calculated results will be returned. |
top |
Integer. Show/plot only top N predictive cross-features. Set
to |
quiet |
Boolean. Keep quiet? If not, show progress bar. |
ohse |
Boolean. Use |
corr |
Boolean. Add correlation and pvalue data to compare with? For
more custom studies, use |
... |
Additional parameters passed to |
x , y |
Vectors. Categorical or numerical vectors of same length. |
confidence |
Boolean. Calculate 95% confidence intervals estimated
with N |
bootstraps |
Integer. If |
max_cat |
Integer. Maximum number of unique |
type |
Integer. Plot type: |
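As a rough illustration of how the confidence and bootstraps arguments above could work together, the sketch below computes a percentile bootstrap interval. It is an assumption about the general technique, not the lares code: mae_reduction and bootstrap_ci are hypothetical names, only numeric targets are handled, and the 2.5% and 97.5% quantiles are an assumed choice for a 95% interval.

```r
library(rpart)

# Hypothetical stand-in for the metric (numeric y only): percent reduction
# in mean absolute error of a CART fit over always predicting the mean.
mae_reduction <- function(x, y) {
  d <- data.frame(x = x, y = y)
  fit <- rpart(y ~ x, data = d, method = "anova")
  base_err <- mean(abs(d$y - mean(d$y)))
  if (base_err == 0) return(0)
  model_err <- mean(abs(d$y - predict(fit, d)))
  100 * max(0, (base_err - model_err) / base_err)
}

# Percentile bootstrap: resample rows with replacement, recompute the
# metric on each resample, and take the 2.5% and 97.5% quantiles.
bootstrap_ci <- function(x, y, bootstraps = 20) {
  draws <- vapply(seq_len(bootstraps), function(i) {
    idx <- sample(length(x), replace = TRUE)
    mae_reduction(x[idx], y[idx])
  }, numeric(1))
  quantile(draws, probs = c(0.025, 0.975), names = FALSE)
}

set.seed(42)
x <- seq(-1, 1, 0.01)
y <- sqrt(1 - x^2) + rnorm(length(x), mean = 0, sd = 0.05)
ci <- bootstrap_ci(x, y, bootstraps = 20)
```

With few bootstrap iterations the interval is cheap to compute but noisy; raising bootstraps trades run time for a more stable estimate.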
This x2y metric is based on Rama Ramakrishnan's post: "An Alternative to
the Correlation Coefficient That Works For Numeric and Categorical
Variables". This analysis complements our lares::corr_cross() output.
Depending on the plot input, a plot or a data.frame with x2y results.
data(dft) # Titanic dataset
x2y_results <- x2y(dft, quiet = TRUE, max_cat = 10, top = NULL)
head(x2y_results, 10)
plot(x2y_results, type = 2)
# Confidence intervals with 10 bootstrap iterations
x2y(dft,
target = c("Survived", "Age"),
confidence = TRUE, bootstraps = 10, top = 8
)
# Compare with mean absolute correlations
x2y(dft, "Fare", corr = TRUE, top = 6, target_x = TRUE)
# Plot (symmetric) results
symm <- x2y(dft, target = "Survived", symmetric = TRUE)
plot(symm, type = 1)
# Symmetry: x2y vs y2x
set.seed(42) # fix the seed so the simulated noise is reproducible
x <- seq(-1, 1, 0.01)
y <- sqrt(1 - x^2) + rnorm(length(x), mean = 0, sd = 0.05)
# Knowing x reduces the uncertainty about the value of y a lot more than
# knowing y reduces the uncertainty about the value of x. Note correlation.
plot(x2y_preds(x, y), corr = TRUE)
plot(x2y_preds(y, x), corr = TRUE)