View source: R/dig_correlations.R
dig_correlations | R Documentation |
Conditional correlations are patterns that identify strong relationships between pairs of numeric variables under specific conditions.
xvar ~ yvar | C
xvar and yvar are highly correlated in data that satisfy the condition C.
study_time ~ test_score | hard_exam
For hard exams, the amount of study time is highly correlated with the obtained test score.
The function computes correlations between all combinations of the xvars and yvars columns of x in multiple sub-data corresponding to conditions generated from the condition columns.
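To make the idea of a correlation "in sub-data" concrete, here is a minimal base R sketch of a single conditional correlation, written as Sepal.Length ~ Petal.Length | setosa. It does not call dig_correlations() and is not the package's implementation; the choice of the iris columns and of the setosa condition is an assumption made only for illustration.

```r
# Base R illustration of one conditional correlation:
#   Sepal.Length ~ Petal.Length | setosa
# This is a sketch of the concept, not how dig_correlations() works internally.
cond <- iris$Species == "setosa"   # logical condition C
sub <- iris[cond, ]                # sub-data: rows that satisfy C

# Correlation test restricted to the rows satisfying the condition
conditional <- stats::cor.test(sub$Sepal.Length, sub$Petal.Length,
                               method = "pearson",
                               alternative = "two.sided")

# The same test on the whole dataset (the empty condition)
overall <- stats::cor.test(iris$Sepal.Length, iris$Petal.Length,
                           method = "pearson",
                           alternative = "two.sided")

# Compare the estimate under the condition with the estimate on all rows
estimates <- c(conditional = unname(conditional$estimate),
               overall = unname(overall$estimate))
print(estimates)
```

The two estimates can differ substantially, which is why a pattern is reported together with the condition under which it holds.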
dig_correlations(
  x,
  condition = where(is.logical),
  xvars = where(is.numeric),
  yvars = where(is.numeric),
  disjoint = var_names(colnames(x)),
  method = "pearson",
  alternative = "two.sided",
  exact = NULL,
  min_length = 0L,
  max_length = Inf,
  min_support = 0,
  max_support = 1,
  max_results = Inf,
  verbose = FALSE,
  threads = 1
)
x | a matrix or data frame with data to search in.
condition | a tidyselect expression (see tidyselect syntax) specifying the columns to use as condition predicates
xvars | a tidyselect expression (see tidyselect syntax) specifying the columns to use for computation of correlations
yvars | a tidyselect expression (see tidyselect syntax) specifying the columns to use for computation of correlations
disjoint | an atomic vector of size equal to the number of columns of
method | a character string indicating which correlation coefficient is to be used for the test. One of
alternative | indicates the alternative hypothesis and must be one of
exact | a logical indicating whether an exact p-value should be computed. Used for Kendall's tau and Spearman's rho. See
min_length | the minimum size (the minimum number of predicates) of the condition to be generated (must be greater than or equal to 0). If 0, the empty condition is generated first.
max_length | the maximum size (the maximum number of predicates) of the condition to be generated. If equal to Inf, the maximum length of conditions is limited only by the number of available predicates.
min_support | the minimum support of a condition to trigger the callback function for it. The support of the condition is the relative frequency of the condition in the dataset
max_support | the maximum support of a condition to trigger the callback function for it. See argument
max_results | the maximum number of generated conditions to execute the callback function on. If the number of found conditions exceeds
verbose | a logical scalar indicating whether to print progress messages.
threads | the number of threads to use for parallel computation.
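The support used by min_support and max_support is described above as the relative frequency of a condition in the dataset. A rough base R sketch of that quantity follows. It assumes that a condition with several predicates holds for a row only when all of its predicates are TRUE; the predicates is_setosa and long_sepal are hypothetical and exist only in this example.

```r
# Sketch: support of a condition as the share of rows satisfying it.
# Assumption: a multi-predicate condition is the conjunction of its predicates.
# `is_setosa` and `long_sepal` are hypothetical predicates for illustration.
is_setosa <- iris$Species == "setosa"
long_sepal <- iris$Sepal.Length > 5

# Condition of length 1: a single predicate
support_single <- mean(is_setosa)

# Condition of length 2: both predicates must hold in a row
support_pair <- mean(is_setosa & long_sepal)

# A condition would be considered only if its support lies within the bounds
min_support <- 0.1
max_support <- 1
keep_pair <- support_pair >= min_support && support_pair <= max_support

print(c(single = support_single, pair = support_pair))
```

Adding a predicate to a condition can never increase its support, so raising min_support is a direct way to limit how specific the reported conditions may become.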
A tibble with found patterns.
Michal Burda
dig(), stats::cor.test()
# convert iris$Species into dummy logical variables
d <- partition(iris, Species)
# find conditional correlations between all pairs of numeric variables
dig_correlations(d,
                 condition = where(is.logical),
                 xvars = Sepal.Length:Petal.Width,
                 yvars = Sepal.Length:Petal.Width)
# With `condition = NULL`, dig_correlations() computes correlations between
# all pairs of numeric variables on the whole dataset only, which is an
# alternative way of computing the correlation matrix
dig_correlations(iris,
                 condition = NULL,
                 xvars = Sepal.Length:Petal.Width,
                 yvars = Sepal.Length:Petal.Width)
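For comparison with the condition = NULL case, the following base R sketch computes the ordinary correlation matrix of the same four iris columns with stats::cor() and one pairwise test with stats::cor.test(). It is offered only as a reference point; how the output of dig_correlations() is laid out is not assumed here.

```r
# Base R reference point: whole-dataset correlations for the numeric iris columns
num <- iris[, c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width")]

# Full correlation matrix (coefficients only, no p-values)
cm <- stats::cor(num, method = "pearson")

# A single pairwise test, which also provides a p-value
ct <- stats::cor.test(num$Sepal.Length, num$Petal.Width,
                      method = "pearson",
                      alternative = "two.sided")

print(round(cm, 3))
print(ct$p.value)
```

stats::cor() returns only the coefficients, whereas stats::cor.test() handles one pair at a time and adds the test result.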