polyserial: Polyserial Correlation Between Continuous and Ordinal...
In matrixCorr: Collection of Correlation and Association Estimators

polyserial

R Documentation

Polyserial Correlation Between Continuous and Ordinal Variables

Description

Computes polyserial correlations between continuous variables in data and ordinal variables in y. Both pairwise vector mode and rectangular matrix/data-frame mode are supported.

Usage

polyserial(data, y, na_method = c("error", "pairwise"), ci = FALSE, p_value = FALSE,
  conf_level = 0.95, ...)

## S3 method for class 'polyserial_corr'
print(
  x,
  digits = 4,
  n = NULL,
  topn = NULL,
  max_vars = NULL,
  width = NULL,
  show_ci = NULL,
  ...
)

## S3 method for class 'polyserial_corr'
plot(
  x,
  title = "Polyserial correlation heatmap",
  low_color = "indianred1",
  high_color = "steelblue1",
  mid_color = "white",
  value_text_size = 4,
  show_value = TRUE,
  ...
)

## S3 method for class 'polyserial_corr'
summary(
  object,
  n = NULL,
  topn = NULL,
  max_vars = NULL,
  width = NULL,
  ci_digits = 3,
  p_digits = 4,
  show_ci = NULL,
  ...
)

## S3 method for class 'summary.polyserial_corr'
print(
  x,
  digits = NULL,
  n = NULL,
  topn = NULL,
  max_vars = NULL,
  width = NULL,
  show_ci = NULL,
  ...
)

Arguments

`data`	A numeric vector, matrix, or data frame containing continuous variables.
`y`	An ordinal vector, matrix, or data frame containing ordinal variables. Supported columns are factors, ordered factors, logical values, or integer-like numerics.
`na_method`	Character scalar controlling missing-data handling. `"error"` rejects missing values. `"pairwise"` uses pairwise complete cases.
`ci`	Logical (default `FALSE`). If `TRUE`, attach model-based large-sample Wald confidence intervals derived from the observed information matrix of the latent-variable likelihood.
`p_value`	Logical (default `FALSE`). If `TRUE`, attach model-based large-sample Wald p-values and test statistics for each estimated latent correlation.
`conf_level`	Confidence level used when `ci = TRUE`. Default is `0.95`.
`...`	Additional arguments passed to `print()`.
`x`	An object of class `polyserial_corr` (for the `print()` and `plot()` methods) or `summary.polyserial_corr` (for the summary print method).
`digits`	Integer; number of decimal places to print.
`n`	Optional row threshold for compact preview output.
`topn`	Optional number of leading/trailing rows to show when truncated.
`max_vars`	Optional maximum number of visible columns; `NULL` derives this from console width.
`width`	Optional display width; defaults to `getOption("width")`.
`show_ci`	Optional; one of `"yes"` or `"no"`, controlling whether confidence intervals are displayed. Default is `NULL`.
`title`	Plot title. Default is `"Polyserial correlation heatmap"`.
`low_color`	Color for the minimum correlation.
`high_color`	Color for the maximum correlation.
`mid_color`	Color for zero correlation.
`value_text_size`	Font size used in tile labels.
`show_value`	Logical; if `TRUE` (default), overlay numeric values on the heatmap tiles.
`object`	An object of class `polyserial_corr`.
`ci_digits`	Integer; digits for confidence limits in the pairwise summary.
`p_digits`	Integer; digits for p-values in the pairwise summary.
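
The `"pairwise"` option of `na_method` can be pictured with a small base-R sketch. This shows only the general idea of pairwise complete cases, not the package's internal code; the toy vectors `x1`, `y1`, and `y2` are hypothetical.

```r
# Toy data with missing values (illustrative only)
x1 <- c(0.2, NA, 1.1, -0.4, 0.9)
y1 <- c(1, 2, NA, 1, 3)
y2 <- c(2, NA, 1, 2, 3)

# Each continuous-by-ordinal pair keeps only the rows where both are observed
keep_11 <- complete.cases(x1, y1)   # rows 1, 4, 5
keep_12 <- complete.cases(x1, y2)   # rows 1, 3, 4, 5

c(n_x1_y1 = sum(keep_11), n_x1_y2 = sum(keep_12))
```

Under this scheme the effective sample size can differ from one pair to the next, which is why a per-pair count of observations is useful alongside the estimates.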

Details

The polyserial correlation assumes a latent bivariate normal model between a continuous variable and an unobserved continuous propensity underlying an ordinal variable. Let (X, Z)^\top \sim N_2(0, \Sigma) with \mathrm{corr}(X,Z)=\rho, and suppose the observed ordinal response Y is formed by cut-points -\infty = \beta_0 < \beta_1 < \cdots < \beta_K = \infty:

Y = k \iff \beta_{k-1} < Z \le \beta_k.
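
The data-generating model above can be simulated in a few lines of base R. This is a minimal sketch under assumed values: the sample size, the latent correlation `rho`, and the cut-points in `beta` are chosen purely for illustration.

```r
set.seed(1)
n    <- 500
rho  <- 0.6                                    # assumed latent correlation
x    <- rnorm(n)                               # observed continuous variable X
z    <- rho * x + sqrt(1 - rho^2) * rnorm(n)   # latent propensity Z, corr(X, Z) = rho
beta <- c(-Inf, -0.5, 0.7, Inf)                # cut-points beta_0 < ... < beta_K, K = 3

# Y = k  <=>  beta_{k-1} < Z <= beta_k; right = TRUE gives intervals (a, b]
y <- cut(z, breaks = beta, labels = FALSE, right = TRUE)
table(y)
```

Only `x` and `y` would be observed in practice; `z` and `beta` are the unobserved quantities that the latent-variable model assumes.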

After standardising the observed continuous variable X, the thresholds are estimated from the marginal proportions of Y. Conditional on an observed x_i, the category probability is

\Pr(Y_i = k \mid X_i = x_i, \rho) = \Phi\!\left(\frac{\beta_k - \rho x_i}{\sqrt{1-\rho^2}}\right) - \Phi\!\left(\frac{\beta_{k-1} - \rho x_i}{\sqrt{1-\rho^2}}\right).
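
The threshold step and the category probability can be sketched in base R as follows. The helper names `thresholds_from_y()` and `cat_prob()` are hypothetical and are not part of matrixCorr; the sketch assumes ordinal codes 1, ..., K with every category observed at least once.

```r
# Thresholds from cumulative marginal proportions of Y (hypothetical helper)
thresholds_from_y <- function(y) {
  p <- cumsum(tabulate(y)) / length(y)
  c(-Inf, qnorm(p[-length(p)]), Inf)       # beta_0, ..., beta_K
}

# Pr(Y = k | X = x, rho) for a standardised x (hypothetical helper)
cat_prob <- function(k, x, rho, beta) {
  s <- sqrt(1 - rho^2)
  # beta[k + 1] holds beta_k and beta[k] holds beta_{k-1}
  pnorm((beta[k + 1] - rho * x) / s) - pnorm((beta[k] - rho * x) / s)
}

y     <- c(1, 1, 2, 2, 2, 3, 3, 3, 3, 3)   # toy ordinal codes
beta  <- thresholds_from_y(y)
probs <- sapply(1:3, function(k) cat_prob(k, x = 0.4, rho = 0.5, beta = beta))
sum(probs)                                 # the K category probabilities sum to 1
```

Because the outer cut-points are infinite, the differences telescope and the probabilities over k = 1, ..., K always add up to one for any x and any \rho in (-1, 1).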

The returned estimate maximises the log-likelihood

\ell(\rho) = \sum_{i=1}^{n}\log \Pr(Y_i = y_i \mid X_i = x_i, \rho)

over \rho \in (-1,1) via a one-dimensional Brent search in C++.
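
The estimation step can be approximated in base R with `optimize()`, which performs a one-dimensional search combining golden-section steps with parabolic interpolation. This is a sketch of the technique on simulated data, not the package's C++ routine; the search interval, the use of `sd()` for standardising, and the guard against `log(0)` are assumptions made for the illustration.

```r
set.seed(42)
n        <- 2000
rho_true <- 0.5                                       # assumed value for the simulation
x <- rnorm(n)
z <- rho_true * x + sqrt(1 - rho_true^2) * rnorm(n)
y <- cut(z, breaks = c(-Inf, -0.5, 0.7, Inf), labels = FALSE)

xs   <- (x - mean(x)) / sd(x)                         # standardise X
p    <- cumsum(tabulate(y)) / n                       # cumulative marginal proportions
beta <- c(-Inf, qnorm(p[-length(p)]), Inf)            # thresholds beta_0, ..., beta_K

# Log-likelihood in rho with the thresholds held fixed
loglik <- function(rho) {
  s  <- sqrt(1 - rho^2)
  pr <- pnorm((beta[y + 1] - rho * xs) / s) - pnorm((beta[y] - rho * xs) / s)
  sum(log(pmax(pr, .Machine$double.xmin)))            # guard against log(0)
}

fit <- optimize(loglik, interval = c(-0.999, 0.999), maximum = TRUE)
fit$maximum                                           # estimate of rho
```

With a sample of this size the estimate should land close to the value used in the simulation, although the exact figure depends on the random draw.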

Assumptions. The coefficient is appropriate when the ordinal variable is viewed as the discretised version of a latent normal variable that is jointly normal with the observed continuous variable. The optional p-values and confidence intervals adopt this latent-normal interpretation and use the same likelihood that defines the polyserial estimate. These inferential quantities are therefore model-based and should not be interpreted as distribution-free summaries.

Inference. When ci = TRUE or p_value = TRUE, the function refits the pairwise polyserial model by maximum likelihood and obtains the observed information matrix numerically in C++. The reported confidence interval is a Wald interval \hat\rho \pm z_{1-\alpha/2}\operatorname{SE}(\hat\rho) with \alpha = 1 - conf_level, and the reported p-value is from the large-sample Wald z-test for H_0:\rho = 0. These inferential quantities are computed only when explicitly requested.
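
The Wald quantities can be illustrated with a simplified base-R sketch. It holds the thresholds fixed and uses a central finite difference for the curvature in \rho alone, which is an assumption of this example; the package is described as working with an observed information matrix, so its standard errors need not match this scalar approximation exactly. The step size `h` and the simulated data are also illustrative.

```r
set.seed(7)
n <- 1500
x <- rnorm(n)
z <- 0.4 * x + sqrt(1 - 0.4^2) * rnorm(n)             # latent correlation 0.4 (assumed)
y <- cut(z, breaks = c(-Inf, 0, 1, Inf), labels = FALSE)

xs   <- (x - mean(x)) / sd(x)
p    <- cumsum(tabulate(y)) / n
beta <- c(-Inf, qnorm(p[-length(p)]), Inf)

loglik <- function(rho) {
  s  <- sqrt(1 - rho^2)
  pr <- pnorm((beta[y + 1] - rho * xs) / s) - pnorm((beta[y] - rho * xs) / s)
  sum(log(pmax(pr, .Machine$double.xmin)))
}
rho_hat <- optimize(loglik, c(-0.999, 0.999), maximum = TRUE, tol = 1e-8)$maximum

# Observed information for rho: minus the second derivative at the maximum
h    <- 1e-4
info <- -(loglik(rho_hat + h) - 2 * loglik(rho_hat) + loglik(rho_hat - h)) / h^2
se   <- 1 / sqrt(info)

conf_level <- 0.95
zcrit <- qnorm(1 - (1 - conf_level) / 2)               # z_{1 - alpha/2}
ci    <- c(lwr = rho_hat - zcrit * se, upr = rho_hat + zcrit * se)
zstat <- rho_hat / se                                  # Wald z for H0: rho = 0
pval  <- 2 * pnorm(-abs(zstat))
```

The interval is symmetric around the estimate by construction, so for estimates near the boundary it is not guaranteed to stay inside (-1, 1).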

In vector mode a single estimate is returned. In matrix/data-frame mode, every numeric column of data is paired with every ordinal column of y, producing a rectangular matrix of continuous-by-ordinal polyserial correlations.

Computational complexity. If data has p_x continuous columns and y has p_y ordinal columns, the matrix path performs p_x p_y separate one-parameter likelihood optimisations, one per continuous-by-ordinal pair.
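
The pairing structure of the matrix path can be sketched in base R with a nested loop over columns. The helper `polyserial_pair()` is hypothetical: a compact stand-in that estimates thresholds from marginal proportions and then searches over \rho with `optimize()`; it is not the package's implementation, and the simulated data are for illustration only.

```r
# Hypothetical pairwise estimator: one one-parameter optimisation per call
polyserial_pair <- function(x, y) {
  y    <- as.integer(y)                               # ordinal codes 1..K
  xs   <- (x - mean(x)) / sd(x)
  p    <- cumsum(tabulate(y)) / length(y)
  beta <- c(-Inf, qnorm(p[-length(p)]), Inf)
  loglik <- function(rho) {
    s  <- sqrt(1 - rho^2)
    pr <- pnorm((beta[y + 1] - rho * xs) / s) - pnorm((beta[y] - rho * xs) / s)
    sum(log(pmax(pr, .Machine$double.xmin)))
  }
  optimize(loglik, c(-0.999, 0.999), maximum = TRUE)$maximum
}

set.seed(3)
n <- 800
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
Y <- data.frame(
  y1 = cut(0.5 * X$x1 + rnorm(n), c(-Inf, -0.5, 0.5, Inf), labels = FALSE),
  y2 = cut(-0.3 * X$x2 + rnorm(n), c(-Inf, 0, Inf), labels = FALSE)
)

# p_x * p_y = 3 * 2 = 6 separate fits, arranged with continuous variables in rows
R <- sapply(Y, function(y) sapply(X, function(x) polyserial_pair(x, y)))
dim(R)
```

Each cell is fitted independently of the others, so the total work scales with the product of the two column counts.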

Value

If both data and y are vectors, a numeric scalar is returned. Otherwise the result is a numeric matrix of class polyserial_corr, with rows corresponding to the continuous variables in data and columns to the ordinal variables in y. Matrix outputs carry the attributes method, description, and package = "matrixCorr". When p_value = TRUE, the returned object also carries an inference attribute with elements estimate, statistic, parameter, p_value, and n_obs. When ci = TRUE, it also carries a ci attribute with elements est, lwr.ci, upr.ci, conf.level, and ci.method, plus attr(x, "conf.level"). Scalar outputs return the same point estimate and carry this metadata only when inference is requested.

Author(s)

Thiago de Paula Oliveira

References

Olsson, U., Drasgow, F., & Dorans, N. J. (1982). The polyserial correlation coefficient. Psychometrika, 47(3), 337-347.

Examples


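library(matrixCorr)

# Simulate four correlated standard-normal variables
# (the mnormt package must be installed for rmnorm())
set.seed(125)
n <- 1000
Sigma <- matrix(c(
  1.00, 0.30, 0.55, 0.20,
  0.30, 1.00, 0.25, 0.50,
  0.55, 0.25, 1.00, 0.40,
  0.20, 0.50, 0.40, 1.00
), 4, 4, byrow = TRUE)

Z <- mnormt::rmnorm(n = n, mean = rep(0, 4), varcov = Sigma)

# Columns 1 and 2 are kept as the observed continuous variables
X <- data.frame(x1 = Z[, 1], x2 = Z[, 2])

# Columns 3 and 4 are discretised into ordered factors with 3 and 4 levels
Y <- data.frame(
  y1 = ordered(cut(
    Z[, 3],
    breaks = c(-Inf, -0.5, 0.7, Inf),
    labels = c("low", "mid", "high")
  )),
  y2 = ordered(cut(
    Z[, 4],
    breaks = c(-Inf, -1.0, 0.0, 1.0, Inf),
    labels = c("1", "2", "3", "4")
  ))
)

# Matrix mode: a 2 x 2 matrix of continuous-by-ordinal polyserial correlations
ps <- polyserial(X, Y)
print(ps, digits = 3)
summary(ps)
plot(ps)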

matrixCorr documentation built on April 18, 2026, 5:06 p.m.