| skipped_corr | R Documentation |
Computes all pairwise skipped correlation coefficients for the numeric columns of a matrix or data frame using a high-performance 'C++' backend.
Skipped correlation detects bivariate outliers using a projection rule and then computes Pearson or Spearman correlation on the retained observations. It is designed for situations where marginally robust methods can still be distorted by unusual points in the joint data cloud.
skipped_corr(
data,
method = c("pearson", "spearman"),
na_method = c("error", "pairwise"),
ci = FALSE,
p_value = FALSE,
conf_level = 0.95,
n_threads = getOption("matrixCorr.threads", 1L),
return_masks = FALSE,
stand = TRUE,
outlier_rule = c("idealf", "mad"),
cutoff = sqrt(stats::qchisq(0.975, df = 2)),
n_boot = 2000L,
p_adjust = c("none", "hochberg", "ecp"),
fwe_level = 0.05,
n_mc = 1000L,
seed = NULL,
output = c("matrix", "sparse", "edge_list"),
threshold = 0,
diag = TRUE
)
skipped_corr_masks(x, var1 = NULL, var2 = NULL)
## S3 method for class 'skipped_corr'
print(
x,
digits = 4,
n = NULL,
topn = NULL,
max_vars = NULL,
width = NULL,
ci_digits = 4,
show_ci = NULL,
show_p = c("auto", "yes", "no"),
...
)
## S3 method for class 'skipped_corr'
plot(
x,
title = "Skipped correlation heatmap",
low_color = "indianred1",
high_color = "steelblue1",
mid_color = "white",
value_text_size = 4,
ci_text_size = 3,
show_value = TRUE,
...
)
## S3 method for class 'skipped_corr'
summary(
object,
n = NULL,
topn = NULL,
max_vars = NULL,
width = NULL,
show_ci = NULL,
...
)
## S3 method for class 'summary.skipped_corr'
print(
x,
digits = NULL,
n = NULL,
topn = NULL,
max_vars = NULL,
width = NULL,
show_ci = NULL,
...
)
data |
A numeric matrix or a data frame with at least two numeric columns. All non-numeric columns will be excluded. |
method |
Correlation computed after removing projected outliers. One of
|
na_method |
One of |
ci |
Logical; if |
p_value |
Logical; if |
conf_level |
Confidence level used when |
n_threads |
Integer |
return_masks |
Logical; if |
stand |
Logical; if |
outlier_rule |
One of |
cutoff |
Positive numeric constant multiplying the projected spread in
the outlier rule
|
n_boot |
Integer |
p_adjust |
One of |
fwe_level |
Familywise-error level used when
|
n_mc |
Integer |
seed |
Optional positive integer used to seed the bootstrap resampling
when |
output |
Output representation for the computed estimates.
|
threshold |
Non-negative absolute-value filter for non-matrix outputs:
keep entries with |
diag |
Logical; whether to include diagonal entries in
|
x |
An object of class |
var1, var2 |
Optional column names or 1-based column indices used by
|
digits |
Integer; number of digits to print. |
n |
Optional row threshold for compact preview output. |
topn |
Optional number of leading/trailing rows to show when truncated. |
max_vars |
Optional maximum number of visible columns; |
width |
Optional display width; defaults to |
ci_digits |
Integer; digits for skipped-correlation confidence limits. |
show_ci |
One of |
show_p |
One of |
... |
Additional arguments passed to the underlying print or plot helper. |
title |
Character; plot title. |
low_color, high_color, mid_color |
Colors used in the heatmap. |
value_text_size |
Numeric text size for overlaid cell values. |
ci_text_size |
Text size for confidence intervals in the heatmap. |
show_value |
Logical; if |
object |
An object of class |
Let X \in \mathbb{R}^{n \times p} be a numeric matrix with rows as
observations and columns as variables. For a given pair of columns
(x, y), write the observed bivariate points as
u_i = (x_i, y_i)^\top, i=1,\ldots,n. If stand = TRUE,
each margin is first centred by its median and divided by a robust scale
estimate before outlier detection; otherwise the original pair is used. The
robust scale is the MAD when positive, with fallback to
\mathrm{IQR}/1.34898 and then the usual sample standard deviation if
needed. Let \tilde u_i denote the resulting points and let c be
the componentwise median center of the detection cloud.
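The fallback chain for the robust scale can be sketched in base R. This is an illustration only: `robust_scale()` and `standardise_margin()` are hypothetical helper names, not functions exported by the package, and the normal-consistent scaling of the MAD (the default of `stats::mad()`) is an assumption, since the text above does not say whether the MAD is rescaled.

```r
# Hypothetical helpers for illustration; these are not matrixCorr functions.

# Robust scale with the fallbacks described above: MAD, then IQR / 1.34898,
# then the sample standard deviation.
robust_scale <- function(v) {
  # Assumption: normal-consistent MAD (stats::mad() default constant 1.4826).
  s <- stats::mad(v)
  if (is.finite(s) && s > 0) return(s)
  s <- stats::IQR(v) / 1.34898
  if (is.finite(s) && s > 0) return(s)
  stats::sd(v)
}

# Centre a margin by its median and divide by the robust scale.
standardise_margin <- function(v) (v - stats::median(v)) / robust_scale(v)

set.seed(1)
x <- rnorm(50)
z <- standardise_margin(x)
median(z)  # zero up to floating-point error

# More than half of the values are tied, so the MAD is 0 and the
# IQR-based fallback is used instead.
w <- c(rep(0, 6), 1, 2, 3, 4)
robust_scale(w)
```

A margin whose MAD, IQR, and standard deviation are all zero is constant; this sketch does not guard against that case.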
For each observation i, define the direction vector
b_i = \tilde u_i - c. When \|b_i\| > 0, all observations are
projected onto the line through c in direction b_i. The
projected distances are
d_{ij} \;=\; \frac{|(\tilde u_j - c)^\top b_i|}{\|b_i\|}, \qquad j=1,\ldots,n.
For each direction i, observation j is flagged as an outlier if
d_{ij} \;>\; \mathrm{med}(d_{i\cdot}) + g\, s(d_{i\cdot}), \qquad g = \texttt{cutoff},
where s(\cdot) is either the ideal-fourths interquartile width
(outlier_rule = "idealf") or the median absolute deviation
(outlier_rule = "mad"). An observation is removed if it is flagged
for at least one projection direction. The skipped correlation is then the
ordinary Pearson or Spearman correlation computed from the retained
observations:
r_{\mathrm{skip}}(x,y) \;=\; \mathrm{cor}\!\left(x_{\mathcal{K}}, y_{\mathcal{K}}\right),
where \mathcal{K} is the index set of observations not flagged as
outliers.
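A minimal base-R sketch of the projection rule and the resulting skipped correlation for a single pair follows. It is a plain, unoptimised illustration rather than the package's 'C++' code: `idealf_width()`, `projection_outliers()`, and `skipped_cor_pair()` are hypothetical names, the standardisation step is omitted (as with stand = FALSE), the ideal fourths follow the usual interpolation formula, and the scaled MAD returned by `stats::mad()` is an assumption.

```r
# Hypothetical helpers for illustration; these are not matrixCorr functions.

# Ideal-fourths interquartile width of a numeric vector (needs n >= 3).
idealf_width <- function(d) {
  d <- sort(d)
  n <- length(d)
  j <- floor(n / 4 + 5 / 12)
  h <- n / 4 + 5 / 12 - j
  lower <- (1 - h) * d[j] + h * d[j + 1]
  k <- n - j + 1
  upper <- (1 - h) * d[k] + h * d[k - 1]
  upper - lower
}

# Flag rows of an n x 2 matrix u that are outlying in at least one direction.
projection_outliers <- function(u, rule = c("idealf", "mad"),
                                cutoff = sqrt(stats::qchisq(0.975, df = 2))) {
  rule <- match.arg(rule)
  spread <- if (rule == "idealf") idealf_width else stats::mad
  centre <- apply(u, 2, stats::median)   # componentwise median c
  dev <- sweep(u, 2, centre)             # row j holds u_j - c
  flagged <- logical(nrow(u))
  for (i in seq_len(nrow(u))) {
    b <- dev[i, ]
    len <- sqrt(sum(b^2))
    if (len == 0) next                   # no direction defined for this row
    d <- abs(drop(dev %*% b)) / len      # d_ij for j = 1, ..., n
    flagged <- flagged | (d > stats::median(d) + cutoff * spread(d))
  }
  flagged
}

# Correlation computed on the rows that were not flagged.
skipped_cor_pair <- function(x, y, method = "pearson", ...) {
  keep <- !projection_outliers(cbind(x, y), ...)
  stats::cor(x[keep], y[keep], method = method)
}

set.seed(12)
x <- rnorm(100)
y <- 0.5 * x + rnorm(100)
x[1] <- 9
y[1] <- -8
flag <- projection_outliers(cbind(x, y))
flag[1]  # TRUE: the planted point is flagged
c(plain = stats::cor(x, y), skipped = skipped_cor_pair(x, y))
```

The loop costs on the order of n^2 operations per pair, which is why the pairwise search dominates the running time for wide data.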
Unlike marginally robust methods such as pbcor(), wincor(),
or bicor(), skipped correlation is explicitly pairwise because
outlier detection depends on the joint geometry of each variable pair. As a
result, the reported matrix need not be positive semidefinite, even with
complete data.
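Whether a given matrix of pairwise estimates is positive semidefinite can be checked from its smallest eigenvalue. The sketch below uses a hand-made matrix rather than package output, and `min_eigenvalue()` is a hypothetical helper name; it only illustrates that entries which are each valid correlations need not form a valid correlation matrix jointly.

```r
# Hypothetical helper: smallest eigenvalue of a symmetric matrix.
min_eigenvalue <- function(R) {
  min(eigen(R, symmetric = TRUE, only.values = TRUE)$values)
}

# Hand-made example (not an estimate): every off-diagonal entry lies in
# [-1, 1], yet the three values are mutually incompatible.
R <- matrix(c( 1.0,  0.9,  0.9,
               0.9,  1.0, -0.9,
               0.9, -0.9,  1.0), nrow = 3, byrow = TRUE)

min_eigenvalue(R) < 0  # TRUE: R is not positive semidefinite
```

Downstream methods that require a positive semidefinite input should therefore check the returned matrix before using it.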
Computational notes. In the complete-data path, each column pair
requires a full bivariate projection search, so the dominant cost is higher
than for marginal robust methods. The implementation evaluates pairs in
'C++'; where available, pairs are processed with 'OpenMP' parallelism. With
na_method = "pairwise", each pair is recomputed on its overlap of
non-missing rows.
Bootstrap inference. When ci = TRUE or p_value = TRUE,
the implementation uses the percentile-bootstrap strategy studied by Wilcox
(2015). Each bootstrap replicate resamples whole observation pairs with
replacement, reruns the skipped-correlation outlier detection on the
resampled data, and recomputes the skipped correlation on the retained
observations. This corresponds to Wilcox's B2 method and avoids the
statistically unsatisfactory shortcut of removing outliers only once before
bootstrapping. Bootstrap inference currently requires complete data
(na_method = "error"). When p_adjust = "hochberg", the
bootstrap p-values are processed with Hochberg's step-up procedure (method H
in Wilcox, Rousselet, and Pernet, 2018). When p_adjust = "ecp", the
package follows their ECP method and simulates n_mc null data sets
from a p-variate normal distribution with identity covariance,
recomputes the pairwise bootstrap p-values for each null data set, stores the
minimum p-value from each run, and estimates the fwe_level quantile of
that null distribution using the Harrell-Davis estimator. Hypotheses are then
rejected when their observed bootstrap p-values are less than or equal to the
estimated critical p-value. The calibrated H1 procedure from Wilcox,
Rousselet, and Pernet (2018) is not currently implemented.
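The resampling scheme can be sketched for one pair in base R. This is an illustration under stated assumptions, not the package's implementation: `flag_outliers()`, `skipped_cor()`, and `boot_skipped()` are hypothetical names, only the MAD variant of the outlier rule is shown, and the two-sided p-value uses a common percentile-bootstrap convention that the text above does not spell out.

```r
# Hypothetical helpers for illustration; these are not matrixCorr functions.

# Compact projection outlier rule (MAD variant) for an n x 2 matrix.
flag_outliers <- function(u, cutoff = sqrt(stats::qchisq(0.975, df = 2))) {
  dev <- sweep(u, 2, apply(u, 2, stats::median))
  flagged <- logical(nrow(u))
  for (i in seq_len(nrow(u))) {
    b <- dev[i, ]
    len <- sqrt(sum(b^2))
    if (len == 0) next
    d <- abs(drop(dev %*% b)) / len
    flagged <- flagged | (d > stats::median(d) + cutoff * stats::mad(d))
  }
  flagged
}

# Pearson correlation on the rows that were not flagged.
skipped_cor <- function(x, y) {
  keep <- !flag_outliers(cbind(x, y))
  stats::cor(x[keep], y[keep])
}

boot_skipped <- function(x, y, n_boot = 200L, conf_level = 0.95, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  n <- length(x)
  boot <- replicate(n_boot, {
    idx <- sample.int(n, n, replace = TRUE)  # resample whole (x, y) pairs
    skipped_cor(x[idx], y[idx])              # outlier detection is rerun here
  })
  boot <- boot[is.finite(boot)]              # drop degenerate replicates
  alpha <- 1 - conf_level
  ci <- stats::quantile(boot, c(alpha / 2, 1 - alpha / 2), names = FALSE)
  # Assumed convention: two-sided percentile p-value for H0: correlation = 0.
  p_below <- mean(boot < 0) + 0.5 * mean(boot == 0)
  list(estimate = skipped_cor(x, y), ci = ci,
       p_value = 2 * min(p_below, 1 - p_below))
}

set.seed(3)
x <- rnorm(60)
y <- 0.6 * x + rnorm(60)
res <- boot_skipped(x, y, n_boot = 200L, seed = 1)
res$ci

# Hochberg's step-up adjustment of a set of p-values is available in base R.
p_adj <- stats::p.adjust(c(0.01, 0.04, 0.03), method = "hochberg")
p_adj
```

Because the outlier search is repeated inside every replicate, the cost grows with n_boot times the cost of one skipped correlation; a small n_boot is used here only to keep the sketch fast.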
A symmetric correlation matrix with class skipped_corr and
attributes method = "skipped_correlation", description, and
package = "matrixCorr". When return_masks = TRUE, the matrix
also carries a skipped_masks attribute containing compact pairwise
skipped-row indices. The diagnostics attribute stores per-pair
complete-case counts and skipped-row counts/proportions. When
ci = TRUE or p_value = TRUE, bootstrap inference matrices are
attached via attributes.
Thiago de Paula Oliveira
Wilcox, R. R. (2004). Inferences based on a skipped correlation coefficient. Journal of Applied Statistics, 31(2), 131-143. doi:10.1080/0266476032000148821
Wilcox, R. R. (2015). Inferences about the skipped correlation coefficient: Dealing with heteroscedasticity and non-normality. Journal of Modern Applied Statistical Methods, 14(1), 172-188. doi:10.22237/jmasm/1430453580
Wilcox, R. R., Rousselet, G. A., & Pernet, C. R. (2018). Improved methods for making inferences about multiple skipped correlations. Journal of Statistical Computation and Simulation, 88(16), 3116-3131. doi:10.1080/00949655.2018.1501051
pbcor(), wincor(), bicor()
# Simulated data with one planted bivariate outlier in columns 1 and 2
set.seed(12)
X <- matrix(rnorm(160 * 4), ncol = 4)
X[1, 1] <- 9
X[1, 2] <- -8
R <- skipped_corr(X, method = "pearson")
print(R, digits = 2)
summary(R)
plot(R)
# Keep the skipped-row masks and inspect them for the pair (1, 2)
Rm <- skipped_corr(X, method = "pearson", return_masks = TRUE)
skipped_corr_masks(Rm, 1, 2)
# Example 1: Spearman-based skipped correlation on four mtcars variables
Xm <- as.matrix(datasets::mtcars[, c("mpg", "disp", "hp", "wt")])
Rm2 <- skipped_corr(Xm, method = "spearman")
print(Rm2, digits = 2)
# Example 2: bootstrap confidence intervals (small n_boot keeps the example fast)
Ri <- skipped_corr(Xm, method = "pearson", ci = TRUE, n_boot = 40, seed = 1)
Ri$ci
# Interactive viewing (requires shiny)
if (interactive() && requireNamespace("shiny", quietly = TRUE)) {
view_corr_shiny(R)
}