View source: R/latent_corr.R

polyserial R Documentation

Polyserial Correlation Between Continuous and Ordinal Variables

Description

Computes polyserial correlations between continuous variables in data and ordinal variables in y. Both pairwise vector mode and rectangular matrix/data-frame mode are supported.

Usage

polyserial(data, y, na_method = c("error", "pairwise"), ci = FALSE, p_value = FALSE,
  conf_level = 0.95, ...)

## S3 method for class 'polyserial_corr'
print(
  x,
  digits = 4,
  n = NULL,
  topn = NULL,
  max_vars = NULL,
  width = NULL,
  show_ci = NULL,
  ...
)

## S3 method for class 'polyserial_corr'
plot(
  x,
  title = "Polyserial correlation heatmap",
  low_color = "indianred1",
  high_color = "steelblue1",
  mid_color = "white",
  value_text_size = 4,
  show_value = TRUE,
  ...
)

## S3 method for class 'polyserial_corr'
summary(
  object,
  n = NULL,
  topn = NULL,
  max_vars = NULL,
  width = NULL,
  ci_digits = 3,
  p_digits = 4,
  show_ci = NULL,
  ...
)

## S3 method for class 'summary.polyserial_corr'
print(
  x,
  digits = NULL,
  n = NULL,
  topn = NULL,
  max_vars = NULL,
  width = NULL,
  show_ci = NULL,
  ...
)

Arguments

data

A numeric vector, matrix, or data frame containing continuous variables.

y

An ordinal vector, matrix, or data frame containing ordinal variables. Supported columns are factors, ordered factors, logical values, or integer-like numerics.

na_method

Character scalar controlling missing-data handling. "error" rejects missing values. "pairwise" uses pairwise complete cases.

ci

Logical (default FALSE). If TRUE, attach model-based large-sample Wald confidence intervals derived from the observed information matrix of the latent-variable likelihood.

p_value

Logical (default FALSE). If TRUE, attach model-based large-sample Wald p-values and test statistics for each estimated latent correlation.

conf_level

Confidence level used when ci = TRUE. Default is 0.95.

...

Additional arguments passed to print().

x

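An object of class polyserial_corr (for the print() and plot() methods) or of class summary.polyserial_corr (for the print() method of summaries).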

digits

Integer; number of decimal places to print.

n

Optional row threshold for compact preview output.

topn

Optional number of leading/trailing rows to show when truncated.

max_vars

Optional maximum number of visible columns; NULL derives this from console width.

width

Optional display width; defaults to getOption("width").

show_ci

One of "yes" or "no", indicating whether confidence intervals are shown in the printed output. The default is NULL.

title

Plot title. Default is "Polyserial correlation heatmap".

low_color

Color for the minimum correlation.

high_color

Color for the maximum correlation.

mid_color

Color for zero correlation.

value_text_size

Font size used in tile labels.

show_value

Logical; if TRUE (default), overlay numeric values on the heatmap tiles.

object

An object of class polyserial_corr.

ci_digits

Integer; digits for confidence limits in the pairwise summary.

p_digits

Integer; digits for p-values in the pairwise summary.
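The "pairwise" setting of na_method described above can be pictured with a small base R sketch. It only illustrates the idea of pairwise complete cases for one (continuous, ordinal) pair; the vectors and the names keep, x_used, and y_used are made up for the illustration, and the package's internal handling may differ.

```r
# Toy data with missing values in both variables (illustrative only)
x <- c(0.3, NA, -1.2, 0.8, 1.5, -0.4)
y <- ordered(c("low", "mid", NA, "high", "high", "low"),
             levels = c("low", "mid", "high"))

# A pair (x_i, y_i) is usable only when both entries are observed
keep <- !is.na(x) & !is.na(y)

x_used <- x[keep]
y_used <- y[keep]

# Number of observations that would enter the likelihood for this pair
sum(keep)
```

In matrix/data-frame mode the same filter would be applied separately to each pair of columns, so different pairs can be based on different numbers of observations.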

Details

The polyserial correlation assumes a latent bivariate normal model between a continuous variable and an unobserved continuous propensity underlying an ordinal variable. Let (X, Z)^\top \sim N_2(0, \Sigma) with \mathrm{corr}(X,Z)=\rho, and suppose the observed ordinal response Y is formed by cut-points -\infty = \beta_0 < \beta_1 < \cdots < \beta_K = \infty:

Y = k \iff \beta_{k-1} < Z \le \beta_k.
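As a rough illustration of this data-generating model, the base R sketch below simulates a standard bivariate normal pair and discretises the latent variable at fixed cut-points. The values of rho and beta are arbitrary choices for the illustration, not package defaults.

```r
set.seed(1)
n   <- 500
rho <- 0.6                      # assumed latent correlation (illustrative)

# Draw (X, Z) from a standard bivariate normal with corr(X, Z) = rho
x <- rnorm(n)
z <- rho * x + sqrt(1 - rho^2) * rnorm(n)

# Hypothetical interior cut-points beta_1 < beta_2, giving K = 3 categories
beta <- c(-0.5, 0.7)

# Y = k  iff  beta_{k-1} < Z <= beta_k; cut() uses right-closed intervals
# by default, which matches the definition above
y <- cut(z, breaks = c(-Inf, beta, Inf), labels = FALSE)

table(y)
```

Only x and y would be observed in practice; z stays latent, and the polyserial correlation targets rho.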

After standardising the observed continuous variable X, the thresholds are estimated from the marginal proportions of Y. Conditional on an observed x_i, the category probability is

\Pr(Y_i = k \mid X_i = x_i, \rho) = \Phi\!\left(\frac{\beta_k - \rho x_i}{\sqrt{1-\rho^2}}\right) - \Phi\!\left(\frac{\beta_{k-1} - \rho x_i}{\sqrt{1-\rho^2}}\right).
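The two steps above can be sketched in base R. The sketch assumes the usual two-step convention in which each threshold is the standard normal quantile of the cumulative marginal proportion of Y, and it standardises with sd() (divisor n - 1); the package's C++ code may differ in such details. The helper cat_prob is hypothetical and written only for this illustration.

```r
set.seed(2)
n     <- 400
x_raw <- rnorm(n, mean = 10, sd = 3)
y     <- sample(1:3, n, replace = TRUE, prob = c(0.3, 0.5, 0.2))  # codes 1..K

# Standardise the observed continuous variable
x <- (x_raw - mean(x_raw)) / sd(x_raw)

# Thresholds from marginal proportions (assumed convention):
# beta_k = qnorm(cumulative proportion of categories 1..k), k = 1, ..., K - 1
cum_prop <- cumsum(tabulate(y, nbins = 3)) / n
beta_hat <- qnorm(cum_prop[1:2])

# Pr(Y = k | X = x, rho) for interior thresholds `beta` (hypothetical helper)
cat_prob <- function(k, x, rho, beta) {
  b <- c(-Inf, beta, Inf)       # b[k + 1] is beta_k, b[k] is beta_{k-1}
  s <- sqrt(1 - rho^2)
  pnorm((b[k + 1] - rho * x) / s) - pnorm((b[k] - rho * x) / s)
}

# Category probabilities for the first observation at rho = 0.4
probs <- sapply(1:3, cat_prob, x = x[1], rho = 0.4, beta = beta_hat)
probs
sum(probs)   # the K probabilities sum to 1 up to floating-point error
```

At rho = 0 the conditional probabilities no longer depend on x and reduce to the marginal category proportions implied by the thresholds.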

The returned estimate maximises the log-likelihood

\ell(\rho) = \sum_{i=1}^{n}\log \Pr(Y_i = y_i \mid X_i = x_i, \rho)

over \rho \in (-1,1) via a one-dimensional Brent search in C++.
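A minimal base R sketch of this maximisation is given below. It is not the package's C++ routine: it uses stats::optimize(), which combines golden-section search with successive parabolic interpolation in the style of Brent's method, and it fixes the thresholds at normal quantiles of the cumulative proportions (an assumed convention). The search interval (-0.999, 0.999) and the floor on the probabilities are illustrative safeguards.

```r
set.seed(3)
n        <- 2000
rho_true <- 0.5

# Simulate from the latent model and discretise into K = 3 categories
x <- rnorm(n)
z <- rho_true * x + sqrt(1 - rho_true^2) * rnorm(n)
y <- cut(z, breaks = c(-Inf, -0.5, 0.7, Inf), labels = FALSE)

# Standardise x and fix the thresholds from the marginal proportions of y
x <- (x - mean(x)) / sd(x)
cum_prop <- cumsum(tabulate(y, nbins = 3)) / n
b <- c(-Inf, qnorm(cum_prop[1:2]), Inf)   # b[k + 1] = beta_k, b[k] = beta_{k-1}

# Log-likelihood in rho with thresholds held fixed
loglik <- function(rho) {
  s <- sqrt(1 - rho^2)
  p <- pnorm((b[y + 1] - rho * x) / s) - pnorm((b[y] - rho * x) / s)
  sum(log(pmax(p, .Machine$double.xmin)))  # guard against log(0)
}

# One-dimensional search over rho
fit     <- optimize(loglik, interval = c(-0.999, 0.999), maximum = TRUE)
rho_hat <- fit$maximum
rho_hat
```

With a sample of this size the estimate should land close to the simulated value of 0.5.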

Assumptions. The coefficient is appropriate when the ordinal variable is viewed as the discretised version of a latent normal variable that is jointly normal with the observed continuous variable. The optional p-values and confidence intervals adopt this latent-normal interpretation and use the same likelihood that defines the polyserial estimate. These inferential quantities are therefore model-based and should not be interpreted as distribution-free summaries.

Inference. When ci = TRUE or p_value = TRUE, the function refits the pairwise polyserial model by maximum likelihood and obtains the observed information matrix numerically in C++. The reported confidence interval is a Wald interval \hat\rho \pm z_{1-\alpha/2}\operatorname{SE}(\hat\rho), and the reported p-value is from the large-sample Wald z-test for H_0:\rho = 0. These inferential quantities are only computed when explicitly requested.
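The Wald quantities can be illustrated with a simplified base R sketch. It treats the thresholds as fixed, so the information reduces to a single second derivative in rho, approximated here by a central finite difference; the package computes an observed information matrix in C++, and its standard errors may therefore differ from this simplification. The step size h and all object names are assumptions made for the illustration.

```r
set.seed(4)
n <- 2000
x <- rnorm(n)
z <- 0.5 * x + sqrt(1 - 0.5^2) * rnorm(n)
y <- cut(z, breaks = c(-Inf, -0.5, 0.7, Inf), labels = FALSE)

x <- (x - mean(x)) / sd(x)
b <- c(-Inf, qnorm(cumsum(tabulate(y, nbins = 3))[1:2] / n), Inf)

loglik <- function(rho) {
  s <- sqrt(1 - rho^2)
  p <- pnorm((b[y + 1] - rho * x) / s) - pnorm((b[y] - rho * x) / s)
  sum(log(pmax(p, .Machine$double.xmin)))
}

rho_hat <- optimize(loglik, c(-0.999, 0.999), maximum = TRUE,
                    tol = 1e-8)$maximum

# Observed information for rho: minus the second derivative at rho_hat,
# approximated by a central finite difference
h    <- 1e-4
info <- -(loglik(rho_hat + h) - 2 * loglik(rho_hat) + loglik(rho_hat - h)) / h^2
se   <- sqrt(1 / info)

# Wald interval and z-test of H0: rho = 0
conf_level <- 0.95
z_crit <- qnorm(1 - (1 - conf_level) / 2)
ci     <- rho_hat + c(-1, 1) * z_crit * se
z_stat <- rho_hat / se
p_val  <- 2 * pnorm(-abs(z_stat))

c(estimate = rho_hat, se = se, lwr = ci[1], upr = ci[2], p_value = p_val)
```

Because the interval is symmetric on the rho scale, it is not guaranteed to stay inside (-1, 1) when the estimate is near a boundary and the sample is small.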

In vector mode a single estimate is returned. In matrix/data-frame mode, every numeric column of data is paired with every ordinal column of y, producing a rectangular matrix of continuous-by-ordinal polyserial correlations.

Computational complexity. If data has p_x continuous columns and y has p_y ordinal columns, the matrix path performs p_x \times p_y separate one-parameter likelihood optimisations, one for each (continuous, ordinal) pair.
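The rectangular layout and the number of fits can be sketched in base R as a double loop over column pairs. The helper ps_one is a hypothetical stand-in for the per-pair estimator (fixed thresholds, one-dimensional search with optimize()); it is not the package's implementation, and the counter n_fits is included only to make the number of optimisations visible.

```r
set.seed(5)
n <- 1000
Sigma <- matrix(c(
  1.00, 0.30, 0.55, 0.20,
  0.30, 1.00, 0.25, 0.50,
  0.55, 0.25, 1.00, 0.40,
  0.20, 0.50, 0.40, 1.00
), 4, 4, byrow = TRUE)

# Rows of Z have covariance Sigma (chol() returns R with t(R) %*% R = Sigma)
Z <- matrix(rnorm(n * 4), n, 4) %*% chol(Sigma)
X <- data.frame(x1 = Z[, 1], x2 = Z[, 2])
Y <- data.frame(
  y1 = ordered(cut(Z[, 3], breaks = c(-Inf, -0.5, 0.7, Inf))),
  y2 = ordered(cut(Z[, 4], breaks = c(-Inf, -1, 0, 1, Inf)))
)

# Hypothetical per-pair estimator; y holds integer codes 1..K
ps_one <- function(x, y) {
  x <- (x - mean(x)) / sd(x)
  K <- max(y)
  b <- c(-Inf, qnorm(cumsum(tabulate(y, nbins = K))[-K] / length(y)), Inf)
  ll <- function(rho) {
    s <- sqrt(1 - rho^2)
    p <- pnorm((b[y + 1] - rho * x) / s) - pnorm((b[y] - rho * x) / s)
    sum(log(pmax(p, .Machine$double.xmin)))
  }
  optimize(ll, c(-0.999, 0.999), maximum = TRUE)$maximum
}

# One optimisation per (continuous, ordinal) pair
out <- matrix(NA_real_, ncol(X), ncol(Y),
              dimnames = list(names(X), names(Y)))
n_fits <- 0
for (i in seq_len(ncol(X))) {
  for (j in seq_len(ncol(Y))) {
    out[i, j] <- ps_one(X[[i]], as.integer(Y[[j]]))
    n_fits <- n_fits + 1
  }
}

round(out, 3)
n_fits   # equals ncol(X) * ncol(Y)
```

Since the pairs are fitted independently, the cost grows linearly in the number of pairs, and each fit is itself linear in the number of observations per likelihood evaluation.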

Value

If both data and y are vectors, a numeric scalar. Otherwise a numeric matrix of class polyserial_corr with rows corresponding to the continuous variables in data and columns to the ordinal variables in y. Matrix outputs carry attributes method, description, and package = "matrixCorr". When p_value = TRUE, the returned object also carries an inference attribute with elements estimate, statistic, parameter, p_value, and n_obs. When ci = TRUE, it also carries a ci attribute with elements est, lwr.ci, upr.ci, conf.level, and ci.method, plus attr(x, "conf.level"). Scalar outputs keep the same point estimate and gain the same metadata only when inference is requested.
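The attributes listed above could be inspected along the following lines. This is a hedged sketch: it assumes the matrixCorr package is installed and exports polyserial(), it runs the call only when that is the case, and it uses str() rather than assuming a particular internal layout of the ci and inference attributes. The simulated data and the choice conf_level = 0.90 are illustrative.

```r
set.seed(6)
n  <- 300
x1 <- rnorm(n)
z  <- 0.5 * x1 + sqrt(1 - 0.5^2) * rnorm(n)
X  <- data.frame(x1 = x1, x2 = rnorm(n))
Y  <- data.frame(
  y1 = ordered(cut(z, breaks = c(-Inf, -0.5, 0.7, Inf),
                   labels = c("low", "mid", "high")))
)

# Only call the package function if it is actually available
has_polyserial <- requireNamespace("matrixCorr", quietly = TRUE) &&
  "polyserial" %in% getNamespaceExports("matrixCorr")

if (has_polyserial) {
  ps <- matrixCorr::polyserial(X, Y, ci = TRUE, p_value = TRUE,
                               conf_level = 0.90)

  # Attributes named in the Value section
  str(attr(ps, "ci"))          # est, lwr.ci, upr.ci, conf.level, ci.method
  str(attr(ps, "inference"))   # estimate, statistic, parameter, p_value, n_obs
  attr(ps, "conf.level")
}
```

When neither ci nor p_value is requested, these attributes are not expected to be present, so attr() would return NULL for them.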

Author(s)

Thiago de Paula Oliveira

References

Olsson, U., Drasgow, F., & Dorans, N. J. (1982). The polyserial correlation coefficient. Psychometrika, 47(3), 337-347.

Examples


library(matrixCorr)

# Simulate four latent normal variables with a known correlation matrix
set.seed(125)
n <- 1000
Sigma <- matrix(c(
  1.00, 0.30, 0.55, 0.20,
  0.30, 1.00, 0.25, 0.50,
  0.55, 0.25, 1.00, 0.40,
  0.20, 0.50, 0.40, 1.00
), 4, 4, byrow = TRUE)

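# Draw the latent variables (requires the mnormt package)
Z <- mnormt::rmnorm(n = n, mean = rep(0, 4), varcov = Sigma)

# Columns 1-2 are observed as continuous; columns 3-4 are discretised
# into ordered factors at fixed cut-points
X <- data.frame(x1 = Z[, 1], x2 = Z[, 2])
Y <- data.frame(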
  y1 = ordered(cut(
    Z[, 3],
    breaks = c(-Inf, -0.5, 0.7, Inf),
    labels = c("low", "mid", "high")
  )),
  y2 = ordered(cut(
    Z[, 4],
    breaks = c(-Inf, -1.0, 0.0, 1.0, Inf),
    labels = c("1", "2", "3", "4")
  ))
)

# Matrix mode: 2 x 2 matrix of continuous-by-ordinal polyserial correlations
ps <- polyserial(X, Y)
print(ps, digits = 3)
summary(ps)
plot(ps)


matrixCorr documentation built on April 18, 2026, 5:06 p.m.