| polyserial | R Documentation |
Computes polyserial correlations between continuous variables in data
and ordinal variables in y. Both pairwise vector mode and rectangular
matrix/data-frame mode are supported.
polyserial(data, y, na_method = c("error", "pairwise"), ci = FALSE, p_value = FALSE,
conf_level = 0.95, ...)
## S3 method for class 'polyserial_corr'
print(
x,
digits = 4,
n = NULL,
topn = NULL,
max_vars = NULL,
width = NULL,
show_ci = NULL,
...
)
## S3 method for class 'polyserial_corr'
plot(
x,
title = "Polyserial correlation heatmap",
low_color = "indianred1",
high_color = "steelblue1",
mid_color = "white",
value_text_size = 4,
show_value = TRUE,
...
)
## S3 method for class 'polyserial_corr'
summary(
object,
n = NULL,
topn = NULL,
max_vars = NULL,
width = NULL,
ci_digits = 3,
p_digits = 4,
show_ci = NULL,
...
)
## S3 method for class 'summary.polyserial_corr'
print(
x,
digits = NULL,
n = NULL,
topn = NULL,
max_vars = NULL,
width = NULL,
show_ci = NULL,
...
)
| Argument | Description |
|---|---|
| data | A numeric vector, matrix, or data frame containing continuous variables. |
| y | An ordinal vector, matrix, or data frame containing ordinal variables. Supported columns are factors, ordered factors, logical values, or integer-like numerics. |
| na_method | Character scalar controlling missing-data handling; one of "error" (the default) or "pairwise". |
| ci | Logical (default FALSE). If TRUE, Wald confidence intervals are computed. |
| p_value | Logical (default FALSE). If TRUE, large-sample Wald z-test p-values for H_0: \rho = 0 are computed. |
| conf_level | Confidence level used when ci = TRUE (default 0.95). |
| ... | Additional arguments passed to |
| x | An object of class polyserial_corr. |
| digits | Integer; number of decimal places to print. |
| n | Optional row threshold for compact preview output. |
| topn | Optional number of leading/trailing rows to show when truncated. |
| max_vars | Optional maximum number of visible columns; |
| width | Optional display width; defaults to |
| show_ci | One of |
| title | Plot title. Default is "Polyserial correlation heatmap". |
| low_color | Color for the minimum correlation (default "indianred1"). |
| high_color | Color for the maximum correlation (default "steelblue1"). |
| mid_color | Color for zero correlation (default "white"). |
| value_text_size | Font size used in tile labels. |
| show_value | Logical; if TRUE (the default), correlation values are printed in the tiles. |
| object | An object of class polyserial_corr. |
| ci_digits | Integer; digits for confidence limits in the pairwise summary. |
| p_digits | Integer; digits for p-values in the pairwise summary. |
The polyserial correlation assumes a latent bivariate normal model between a
continuous variable and an unobserved continuous propensity underlying an
ordinal variable. Let (X, Z)^\top \sim N_2(0, \Sigma), where \Sigma has unit variances and \mathrm{corr}(X, Z) = \rho, and suppose the observed ordinal response Y is formed by cut-points -\infty = \beta_0 < \beta_1 < \cdots < \beta_K = \infty:
Y = k \iff \beta_{k-1} < Z \le \beta_k.
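The latent model can be simulated directly in base R, which makes the cut-point rule concrete. This is a minimal sketch and not part of matrixCorr: the helper name simulate_polyserial() and the chosen \rho and cut-points are illustrative assumptions. Z is drawn as \rho X plus independent noise so that corr(X, Z) = \rho, and findInterval(..., left.open = TRUE) implements the half-open intervals (\beta_{k-1}, \beta_k].

```r
# Hypothetical helper (not a matrixCorr function): simulate (X, Y) from the
# latent bivariate normal model with correlation rho.
simulate_polyserial <- function(n, rho, cuts) {
  x <- rnorm(n)
  # Z | X = x ~ N(rho * x, 1 - rho^2), so (X, Z) is standard bivariate
  # normal with correlation rho.
  z <- rho * x + sqrt(1 - rho^2) * rnorm(n)
  # 'cuts' holds the finite cut-points beta_1 < ... < beta_{K-1}.
  # With left.open = TRUE, findInterval() returns j when cuts[j] < z <= cuts[j + 1]
  # (and 0 when z <= cuts[1]), so adding 1 gives Y = k iff beta_{k-1} < Z <= beta_k.
  y <- findInterval(z, cuts, left.open = TRUE) + 1L
  list(x = x, z = z, y = y)
}

set.seed(1)
sim <- simulate_polyserial(n = 2000, rho = 0.5, cuts = c(-0.5, 0.7))
table(sim$y)        # counts in categories 1, 2, 3
cor(sim$x, sim$z)   # close to 0.5; Z itself is never observed in practice
```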
After standardising the observed continuous variable X, the thresholds
are estimated from the marginal proportions of Y. Conditional on an
observed x_i, the category probability is
\Pr(Y_i = k \mid X_i = x_i, \rho) = \Phi\!\left(\frac{\beta_k - \rho x_i}{\sqrt{1-\rho^2}}\right) - \Phi\!\left(\frac{\beta_{k-1} - \rho x_i}{\sqrt{1-\rho^2}}\right).
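Both ingredients of this conditional model can be written in a few lines of base R. In this sketch the helper names estimate_thresholds() and category_prob() are hypothetical, not matrixCorr functions, and the thresholds are taken as \hat\beta_k = \Phi^{-1}(\hat\pi_1 + \cdots + \hat\pi_k), the usual inverse-normal transform of the cumulative marginal proportions; that this is exactly how the package uses the proportions is an assumption. The probability function evaluates the formula above for every category, so each row of its result should sum to one.

```r
# Hypothetical helper: thresholds from cumulative marginal proportions.
# 'y' holds integer categories 1..K; returns c(-Inf, beta_1, ..., beta_{K-1}, Inf).
estimate_thresholds <- function(y, K = max(y)) {
  props <- tabulate(y, nbins = K) / length(y)
  c(-Inf, qnorm(cumsum(props)[-K]), Inf)
}

# Hypothetical helper: Pr(Y = k | X = x, rho) for every category k.
# Returns a length(x) x K matrix; entry (i, k) is
# Phi((beta_k - rho x_i) / s) - Phi((beta_{k-1} - rho x_i) / s), s = sqrt(1 - rho^2).
category_prob <- function(x, rho, beta) {
  s <- sqrt(1 - rho^2)
  K <- length(beta) - 1
  upper <- pnorm(outer(-rho * x, beta[-1], "+") / s)        # beta_1 .. beta_K
  lower <- pnorm(outer(-rho * x, beta[-(K + 1)], "+") / s)  # beta_0 .. beta_{K-1}
  upper - lower
}

y <- c(1, 1, 2, 2, 2, 3, 3, 3, 3, 3)
beta <- estimate_thresholds(y)   # c(-Inf, qnorm(0.2), qnorm(0.5), Inf)
p <- category_prob(x = c(-1, 0, 1), rho = 0.4, beta = beta)
rowSums(p)                       # each row sums to 1
```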
The returned estimate maximises the log-likelihood
\ell(\rho) = \sum_{i=1}^{n}\log \Pr(Y_i = y_i \mid X_i = x_i, \rho)
over \rho \in (-1,1) via a one-dimensional Brent search in C++.
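The two-step estimator described above can be sketched end to end in R. This is an illustration under stated assumptions, not the package's C++ code: the name polyserial_sketch() is hypothetical, x is standardised with the sample mean and standard deviation, and the search runs over (-0.999, 0.999) rather than the open interval (-1, 1) to keep \sqrt{1-\rho^2} away from zero. The maximisation uses stats::optimize(), which implements Brent's combination of golden-section search and parabolic interpolation.

```r
# Hypothetical two-step polyserial estimator (illustrative sketch only).
polyserial_sketch <- function(x, y) {
  # Standardise the continuous variable (sample mean and SD; an assumption)
  x <- (x - mean(x)) / sd(x)
  # Map ordered categories (factor, logical or integer-like) to codes 1..K
  y <- as.integer(factor(y))
  K <- max(y)
  # Step 1: thresholds from cumulative marginal proportions
  props <- tabulate(y, nbins = K) / length(y)
  beta <- c(-Inf, qnorm(cumsum(props)[-K]), Inf)
  # Step 2: conditional log-likelihood l(rho) with thresholds held fixed;
  # beta[y + 1] is the upper cut-point beta_k, beta[y] the lower beta_{k-1}
  loglik <- function(rho) {
    s <- sqrt(1 - rho^2)
    p <- pnorm((beta[y + 1] - rho * x) / s) - pnorm((beta[y] - rho * x) / s)
    sum(log(pmax(p, .Machine$double.xmin)))  # guard against log(0)
  }
  # Brent-type one-dimensional maximisation
  optimize(loglik, interval = c(-0.999, 0.999), maximum = TRUE)$maximum
}

set.seed(42)
n <- 5000
x <- rnorm(n)
z <- 0.6 * x + sqrt(1 - 0.6^2) * rnorm(n)
y <- cut(z, breaks = c(-Inf, -1, 0, 1, Inf), labels = FALSE)  # integer codes 1..4
est <- polyserial_sketch(x, y)
est   # close to the latent correlation 0.6
```

Because the log-likelihood is symmetric under x -> -x and \rho -> -\rho, reversing the sign of x should simply flip the sign of the estimate, which is a cheap sanity check on any implementation.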
Assumptions. The coefficient is appropriate when the ordinal variable is viewed as the discretised version of a latent normal variable that is jointly normal with the observed continuous variable. The optional p-values and confidence intervals adopt this latent-normal interpretation and use the same likelihood that defines the polyserial estimate. These inferential quantities are therefore model-based and should not be interpreted as distribution-free summaries.
Inference. When ci = TRUE or p_value = TRUE, the
function refits the pairwise polyserial model by maximum likelihood and
obtains the observed information matrix numerically in C++. The reported
confidence interval is a Wald interval
\hat\rho \pm z_{1-\alpha/2}\operatorname{SE}(\hat\rho), and the
reported p-value is from the large-sample Wald z-test for
H_0:\rho = 0. These inferential quantities are only computed when
explicitly requested.
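The Wald calculations can be illustrated with a simplified one-parameter version. This sketch assumes the thresholds stay fixed at their two-step estimates so that \rho is the only free parameter; the documented function instead refits the pairwise model by maximum likelihood, so its standard errors may differ. The helper name wald_polyserial_sketch() and the finite-difference step h are hypothetical. The observed information is approximated by -\ell''(\hat\rho) using central differences.

```r
# Hypothetical Wald inference for a fixed-threshold polyserial fit (sketch only).
wald_polyserial_sketch <- function(x, y, conf_level = 0.95, h = 1e-4) {
  x <- (x - mean(x)) / sd(x)
  y <- as.integer(factor(y))
  K <- max(y)
  props <- tabulate(y, nbins = K) / length(y)
  beta <- c(-Inf, qnorm(cumsum(props)[-K]), Inf)   # thresholds held fixed
  loglik <- function(rho) {
    s <- sqrt(1 - rho^2)
    p <- pnorm((beta[y + 1] - rho * x) / s) - pnorm((beta[y] - rho * x) / s)
    sum(log(pmax(p, .Machine$double.xmin)))
  }
  rho_hat <- optimize(loglik, c(-0.999, 0.999), maximum = TRUE)$maximum
  # Observed information: minus the second derivative, by central differences
  info <- -(loglik(rho_hat + h) - 2 * loglik(rho_hat) + loglik(rho_hat - h)) / h^2
  se <- 1 / sqrt(info)
  z_crit <- qnorm(1 - (1 - conf_level) / 2)   # z_{1 - alpha/2}
  z_stat <- rho_hat / se                      # Wald statistic for H0: rho = 0
  list(estimate = rho_hat, se = se,
       lwr = rho_hat - z_crit * se, upr = rho_hat + z_crit * se,
       statistic = z_stat, p_value = 2 * pnorm(-abs(z_stat)))
}

set.seed(7)
n <- 2000
x <- rnorm(n)
z <- 0.3 * x + sqrt(1 - 0.3^2) * rnorm(n)
y <- cut(z, breaks = c(-Inf, -0.5, 0.7, Inf), labels = FALSE)
res <- wald_polyserial_sketch(x, y)
unlist(res)
```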
In vector mode a single estimate is returned. In matrix/data-frame mode,
every numeric column of data is paired with every ordinal column of
y, producing a rectangular matrix of continuous-by-ordinal
polyserial correlations.
Computational complexity. If data has p_x continuous
columns and y has p_y ordinal columns, the matrix path computes
p_x p_y separate one-parameter likelihood optimisations.
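The rectangular path amounts to a double loop over column pairs, which is where the p_x p_y count comes from. In this sketch polyserial_matrix_sketch() and ps_pair() are hypothetical names; ps_pair() repeats the two-step recipe so the block runs on its own, and any per-pair estimator can be passed in its place. The counting estimator at the end confirms that one optimisation is triggered per cell.

```r
# Hypothetical per-pair estimator: two-step polyserial fit (sketch only).
ps_pair <- function(x, y) {
  x <- (x - mean(x)) / sd(x)
  y <- as.integer(factor(y))
  K <- max(y)
  beta <- c(-Inf, qnorm(cumsum(tabulate(y, nbins = K) / length(y))[-K]), Inf)
  loglik <- function(rho) {
    s <- sqrt(1 - rho^2)
    p <- pnorm((beta[y + 1] - rho * x) / s) - pnorm((beta[y] - rho * x) / s)
    sum(log(pmax(p, .Machine$double.xmin)))
  }
  optimize(loglik, c(-0.999, 0.999), maximum = TRUE)$maximum
}

# Hypothetical matrix driver: rows = continuous columns, cols = ordinal columns.
polyserial_matrix_sketch <- function(data, y, estimator = ps_pair) {
  data <- as.data.frame(data)
  y <- as.data.frame(y)
  out <- matrix(NA_real_, ncol(data), ncol(y),
                dimnames = list(names(data), names(y)))
  for (i in seq_len(ncol(data))) {
    for (j in seq_len(ncol(y))) {
      out[i, j] <- estimator(data[[i]], y[[j]])   # one 1-D optimisation per cell
    }
  }
  out
}

set.seed(3)
n <- 1500
X <- data.frame(a = rnorm(n), b = rnorm(n))
Y <- data.frame(
  u = ordered(cut(0.7 * X$a + sqrt(1 - 0.49) * rnorm(n), c(-Inf, 0, Inf))),
  v = ordered(cut(rnorm(n), c(-Inf, -1, 1, Inf))),
  w = ordered(cut(-0.5 * X$b + sqrt(0.75) * rnorm(n), c(-Inf, -0.5, 0.5, Inf)))
)
m <- polyserial_matrix_sketch(X, Y)
round(m, 3)

# Count estimator calls: p_x * p_y = 2 * 3 = 6
calls <- 0
counting_estimator <- function(x, y) { calls <<- calls + 1; 0 }
invisible(polyserial_matrix_sketch(X, Y, estimator = counting_estimator))
calls
```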
If both data and y are vectors, a numeric scalar. Otherwise a
numeric matrix of class polyserial_corr with rows corresponding to
the continuous variables in data and columns to the ordinal variables
in y. Matrix outputs carry attributes method,
description, and package = "matrixCorr". When
p_value = TRUE, the returned object also carries an inference
attribute with elements estimate, statistic, parameter,
p_value, and n_obs. When ci = TRUE, it also carries a
ci attribute with elements est, lwr.ci,
upr.ci, conf.level, and ci.method, plus
attr(x, "conf.level"). Scalar outputs keep the same point estimate
and gain the same metadata only when inference is requested.
Thiago de Paula Oliveira
Olsson, U., Drasgow, F., & Dorans, N. J. (1982). The polyserial correlation coefficient. Psychometrika, 47(3), 337-347.
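library(matrixCorr)  # provides polyserial()
set.seed(125)
n <- 1000
# Latent 4 x 4 correlation matrix: columns 1-2 stay continuous,
# columns 3-4 are discretised into ordinal variables below
Sigma <- matrix(c(
  1.00, 0.30, 0.55, 0.20,
  0.30, 1.00, 0.25, 0.50,
  0.55, 0.25, 1.00, 0.40,
  0.20, 0.50, 0.40, 1.00
), 4, 4, byrow = TRUE)
Z <- mnormt::rmnorm(n = n, mean = rep(0, 4), varcov = Sigma)  # requires mnormt
X <- data.frame(x1 = Z[, 1], x2 = Z[, 2])
Y <- data.frame(
  # three ordered categories cut from latent column 3
  y1 = ordered(cut(
    Z[, 3],
    breaks = c(-Inf, -0.5, 0.7, Inf),
    labels = c("low", "mid", "high")
  )),
  # four ordered categories cut from latent column 4
  y2 = ordered(cut(
    Z[, 4],
    breaks = c(-Inf, -1.0, 0.0, 1.0, Inf),
    labels = c("1", "2", "3", "4")
  ))
)
# Matrix mode: 2 continuous x 2 ordinal columns give a 2 x 2 matrix
ps <- polyserial(X, Y)
print(ps, digits = 3)
summary(ps)
plot(ps)
# Vector mode returns a single numeric estimate
polyserial(X$x1, Y$y1)
# Wald confidence intervals and p-values are computed only on request
ps_inf <- polyserial(X, Y, ci = TRUE, p_value = TRUE)
summary(ps_inf)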