dig_paired_baseline_contrasts | R Documentation |
Paired baseline contrast patterns identify conditions under which there is a significant difference in some statistical feature between two paired numeric variables.
(xvar - yvar) != 0 | C
There is a statistically significant difference between paired variables xvar and yvar under the condition C.
(daily_ice_cream_income - daily_tea_income) > 0 | sunny
Under the condition of sunny weather, the paired test shows that
daily ice-cream income is significantly higher than the
daily tea income.
The paired baseline contrast is computed using a paired version of a statistical test, which is specified by the method argument. The function computes the paired contrast between all pairs of variables, where the first variable is specified by the xvars argument and the second variable is specified by the yvars argument. Paired baseline contrasts are computed in sub-data corresponding to conditions generated from the condition columns. Function dig_paired_baseline_contrasts() supports crisp conditions only, i.e., the condition columns in x must be logical.
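The idea of a single paired contrast under a single crisp condition can be sketched in base R. This is only an illustration of the concept, assuming a paired t test as the method; it is not the package's internal implementation, and the variable names d, cond, sub, and res are made up for the sketch.

```r
# Base-R sketch of one paired contrast under one crisp condition.
# Assumption: this mirrors the idea only, not the package's internals.
d <- iris
d$Sepal.Ratio <- d$Sepal.Length / d$Sepal.Width
d$Petal.Ratio <- d$Petal.Length / d$Petal.Width

# A crisp (logical) condition, here "Species is setosa"
cond <- d$Species == "setosa"

# Restrict the data to the rows where the condition holds
sub <- d[cond, ]

# Paired t test of (Sepal.Ratio - Petal.Ratio) against 0 in that sub-data
res <- t.test(sub$Sepal.Ratio, sub$Petal.Ratio,
              paired = TRUE, alternative = "two.sided",
              mu = 0, conf.level = 0.95)

res$estimate  # mean of the paired differences
res$p.value   # p-value of the paired test
```

The search described above would repeat this kind of test for every generated condition and every (xvar, yvar) pair.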
dig_paired_baseline_contrasts(
  x,
  condition = where(is.logical),
  xvars = where(is.numeric),
  yvars = where(is.numeric),
  disjoint = var_names(colnames(x)),
  min_length = 0L,
  max_length = Inf,
  min_support = 0,
  max_support = 1,
  method = "t",
  alternative = "two.sided",
  h0 = 0,
  conf_level = 0.95,
  max_p_value = 1,
  t_var_equal = FALSE,
  wilcox_exact = FALSE,
  wilcox_correct = TRUE,
  wilcox_tol_root = 1e-04,
  wilcox_digits_rank = Inf,
  max_results = Inf,
  verbose = FALSE,
  threads = 1
)
x |
a matrix or data frame with data to search the patterns in. |
condition |
a tidyselect expression (see tidyselect syntax) specifying the columns to use as condition predicates |
xvars |
a tidyselect expression (see tidyselect syntax) specifying the columns to use for computation of contrasts |
yvars |
a tidyselect expression (see tidyselect syntax) specifying the columns to use for computation of contrasts |
disjoint |
an atomic vector of size equal to the number of columns of |
min_length |
the minimum size (the minimum number of predicates) of the condition to be generated (must be greater than or equal to 0). If 0, the empty condition is generated first. |
max_length |
the maximum size (the maximum number of predicates) of the condition to be generated. If equal to Inf, the maximum length of conditions is limited only by the number of available predicates. |
min_support |
the minimum support of a condition to trigger the callback
function for it. The support of the condition is the relative frequency
of the condition in the dataset |
max_support |
the maximum support of a condition to trigger the callback
function for it. See argument |
method |
a character string indicating which contrast to compute.
One of |
alternative |
indicates the alternative hypothesis and must be one of
|
h0 |
a numeric value specifying the null hypothesis for the test. For
the |
conf_level |
a numeric value specifying the level of the confidence interval. The default value is 0.95. |
max_p_value |
the maximum p-value of a test for the pattern to be considered
significant. If the p-value of the test is greater than |
t_var_equal |
(used for the |
wilcox_exact |
(used for the |
wilcox_correct |
(used for the |
wilcox_tol_root |
(used for the |
wilcox_digits_rank |
(used for the |
max_results |
the maximum number of generated conditions to execute the
callback function on. If the number of found conditions exceeds
|
verbose |
a logical scalar indicating whether to print progress messages. |
threads |
the number of threads to use for parallel computation. |
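The support thresholds can be illustrated with a small base-R sketch. The toy data frame and its column names are hypothetical, and the sketch assumes that a condition made of several predicates holds on a row only when all of its predicates are TRUE there.

```r
# Support of a condition = relative frequency of rows where it holds.
# Hypothetical toy data; the column names are for illustration only.
toy <- data.frame(
  sunny   = c(TRUE, TRUE, FALSE, TRUE, FALSE),
  weekend = c(TRUE, FALSE, FALSE, TRUE, TRUE)
)

# Support of a single-predicate condition
support_sunny <- mean(toy$sunny)

# Support of a two-predicate condition (assumed to be a conjunction)
support_both <- mean(toy$sunny & toy$weekend)

# A condition is evaluated only if its support lies within the thresholds
min_support <- 0.5
max_support <- 1
keep_sunny <- support_sunny >= min_support && support_sunny <= max_support
keep_both  <- support_both  >= min_support && support_both  <= max_support
```

With these toy values, the single-predicate condition passes the min_support threshold while the two-predicate condition does not.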
A tibble with found patterns in rows. The following columns are always present:
condition |
the condition of the pattern as a character string
in the form |
support |
the support of the condition, i.e., the relative
frequency of the condition in the dataset |
xvar |
the name of the first variable in the contrast. |
yvar |
the name of the second variable in the contrast. |
estimate |
the estimated difference of variable |
statistic |
the statistic of the selected test. |
p_value |
the p-value of the underlying test. |
n |
the number of rows in the sub-data corresponding to the condition. |
conf_int_lo |
the lower bound of the confidence interval of the estimate. |
conf_int_hi |
the upper bound of the confidence interval of the estimate. |
alternative |
a character string indicating the alternative
hypothesis. The value must be one of |
method |
a character string indicating the method used for the test. |
comment |
a character string with additional information about the test (mainly error messages on failure). |
For the "t" method, the following additional columns are also present (see also t.test()):
df |
the degrees of freedom of the t test. |
stderr |
the standard error of the mean difference. |
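As a rough guide to reading these columns, the sketch below builds a one-row data frame from a base-R paired t test on the built-in sleep dataset. The mapping from t.test() result fields to column names is an assumption made for illustration; the exact values the package stores (for example in the method column) may differ.

```r
# Sketch: how the result columns plausibly relate to a stats::t.test() result.
# Assumption: the field-to-column mapping below is illustrative only.
x <- sleep$extra[sleep$group == 1]
y <- sleep$extra[sleep$group == 2]

res <- t.test(x, y, paired = TRUE, conf.level = 0.95)

row <- data.frame(
  estimate    = unname(res$estimate),   # mean of the paired differences
  statistic   = unname(res$statistic),  # t statistic
  p_value     = res$p.value,
  n           = length(x),              # number of paired rows tested
  conf_int_lo = res$conf.int[1],
  conf_int_hi = res$conf.int[2],
  alternative = res$alternative,
  method      = res$method,             # label as reported by t.test()
  df          = unname(res$parameter),  # degrees of freedom
  stderr      = res$stderr              # standard error of the mean difference
)
row
```

For a paired t test with a null value of 0, the statistic equals the estimate divided by the standard error, which is a quick way to sanity-check a result row.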
Michal Burda
dig_baseline_contrasts(), dig_complement_contrasts(), dig(), dig_grid(), stats::t.test(), stats::wilcox.test()
# Compute the length-to-width ratios of sepals and petals in the iris dataset
crispIris <- iris
crispIris$Sepal.Ratio <- iris$Sepal.Length / iris$Sepal.Width
crispIris$Petal.Ratio <- iris$Petal.Length / iris$Petal.Width

# Create logical predicate columns from the Species column
crispIris <- partition(crispIris, Species)

# Compute paired contrasts between the sepal and petal ratios
dig_paired_baseline_contrasts(crispIris,
                              condition = where(is.logical),
                              xvars = Sepal.Ratio,
                              yvars = Petal.Ratio,
                              method = "t",
                              min_support = 0.1)