estimate_indirect_corr: Estimation of Indirect Correlations

View source: R/estimate_indirect_corr.R


Description

This function estimates indirect correlations starting from an incomplete correlation matrix given as input. The indirect correlations to be estimated must be marked by NA values in the input correlation matrix.

Usage

estimate_indirect_corr(
  corrMatStart,
  force_estimate = FALSE,
  widen_factor = 0.2
)

Arguments

corrMatStart

Matrix object representing a correlation matrix. Indirect correlations to be estimated must be indicated by NA. The matrix must be symmetric, so each unknown correlation appears in two symmetric positions and the matrix contains at least two NA values.
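As an illustration of these input requirements, the sketch below checks that a candidate matrix is square, has no NA on the diagonal, has a symmetric NA pattern and symmetric known entries, and returns the number of unknown correlations (half the number of NA values). The helper check_corr_input is hypothetical: it is not part of the package, and the package's own input validation may differ.

```r
# Hypothetical helper (not part of the package): validate an input matrix
# for estimate_indirect_corr() and return the number of unknown correlations.
check_corr_input <- function(corr) {
  stopifnot(is.matrix(corr), nrow(corr) == ncol(corr))
  na <- is.na(corr)
  # the diagonal of a correlation matrix is always 1, never unknown
  if (any(diag(na))) stop("the diagonal must not contain NA")
  # each unknown correlation must be marked in both symmetric positions
  if (!all(na == t(na))) stop("NA positions are not symmetric")
  # the known (non-NA) correlations must be symmetric as well
  known <- corr
  known[na] <- 0
  if (!isTRUE(all.equal(unname(known), unname(t(known))))) {
    stop("known correlations are not symmetric")
  }
  n_unknown <- sum(na) / 2
  if (n_unknown < 1) stop("no NA values: nothing to estimate")
  n_unknown
}

m <- matrix(c(1,   0.5, 0.2,
              0.5, 1,   NA,
              0.2, NA,  1), nrow = 3, byrow = TRUE)
check_corr_input(m)  # one unknown correlation, marked by two NA values
```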

force_estimate

Logical flag. When TRUE, if the estimated correlation matrix is not positive definite, it is replaced by the nearest positive definite matrix in the Frobenius norm. This approximation may alter the fixed initial correlations. When FALSE (default), the approximation step is skipped and a warning message is returned.
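A minimal sketch of the situation force_estimate addresses, assuming the Matrix package (a recommended package shipped with standard R distributions) is installed. The helper is_pos_def is hypothetical. The 3 x 3 matrix below has eigenvalues 1.9, 1.9 and -0.8, so it is not positive definite; nearPD(corr = TRUE) returns a nearby positive definite correlation matrix, and its off-diagonal entries differ from the original ones, which is why the approximation may alter the fixed correlations.

```r
library(Matrix)  # recommended package; provides nearPD()

# Hypothetical helper: TRUE when every eigenvalue of a symmetric matrix is > tol
is_pos_def <- function(m, tol = 0) {
  min(eigen(m, symmetric = TRUE, only.values = TRUE)$values) > tol
}

# Entries are individually valid correlations but jointly inconsistent:
# the eigenvalues are 1.9, 1.9 and -0.8
m <- matrix(c(1,    0.9,  0.9,
              0.9,  1,   -0.9,
              0.9, -0.9,  1), nrow = 3, byrow = TRUE)
is_pos_def(m)  # FALSE

# Nearest positive definite correlation matrix (Frobenius norm);
# as.matrix() converts the Matrix-class result to a base matrix
fixed <- as.matrix(nearPD(m, corr = TRUE)$mat)
is_pos_def(fixed)
# the off-diagonal entries have moved: the initial correlations are altered
round(fixed - m, 3)
```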

widen_factor

Number between 0 and 1. If there is a unique path between two variables, the range for that indirect correlation is computed as cost +/- widen_factor*cost, where cost is the cost of the unique existing path. Default value is 0.2.

Details

Indirect correlations are estimated by solving a constrained optimization problem. Starting from the fixed correlations, a correlation graph is built. Then, for each pair of variables whose indirect correlation is unknown (i.e. NA values), all possible paths between them are considered (without visiting a node more than once). The cost of each path is computed by multiplying the correlations along it. The maximum and minimum costs provide a reasonable range for the indirect correlation value. If there is no path between two nodes, that indirect correlation is not estimated and is automatically set to 0. If there is a unique path, the range is computed as cost +/- widen_factor*cost, where cost is the cost of that path. The default value of widen_factor is 0.2.
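The bound computation described above can be sketched in base R as follows. The helpers path_costs and indirect_bounds are hypothetical and are not the package's implementation. In particular, treating NA and zero entries as missing edges, and sorting the two endpoints of the unique-path range (so that lower <= upper when the cost is negative), are assumptions of this sketch.

```r
# Hypothetical helper: costs (products of correlations) of all simple paths
# between variables `from` and `to`. Edges are the known, non-zero
# off-diagonal entries of `corr`; NA entries are unknown and are not edges.
path_costs <- function(corr, from, to) {
  adj <- !is.na(corr) & corr != 0
  diag(adj) <- FALSE
  costs <- numeric(0)
  # depth-first search that never revisits a node on the current path
  visit <- function(node, visited, cost) {
    if (node == to) {
      costs <<- c(costs, cost)
      return(invisible(NULL))
    }
    for (nxt in which(adj[node, ] & !visited)) {
      visited_next <- visited
      visited_next[nxt] <- TRUE
      visit(nxt, visited_next, cost * corr[node, nxt])
    }
  }
  visited <- rep(FALSE, nrow(corr))
  visited[from] <- TRUE
  visit(from, visited, 1)
  costs
}

# Hypothetical helper: bounds for one indirect correlation
indirect_bounds <- function(corr, from, to, widen_factor = 0.2) {
  costs <- path_costs(corr, from, to)
  if (length(costs) == 0) {
    # no path: the correlation is not estimated and is set to 0
    return(c(lower = 0, upper = 0))
  }
  if (length(costs) == 1) {
    # unique path: cost +/- widen_factor * cost, sorted so lower <= upper
    r <- sort(c(costs - widen_factor * costs, costs + widen_factor * costs))
    return(c(lower = r[1], upper = r[2]))
  }
  c(lower = min(costs), upper = max(costs))
}

# X1-X4 is unknown; the other correlations are fixed
m <- diag(4)
m[m == 0] <- NA
m[1, 2] <- m[2, 1] <- -0.6
m[1, 3] <- m[3, 1] <- -0.75
m[2, 3] <- m[3, 2] <- 0.95
m[2, 4] <- m[4, 2] <- 0.75
m[3, 4] <- m[4, 3] <- 0.6
path_costs(m, 1, 4)       # four simple paths from X1 to X4
indirect_bounds(m, 1, 4)  # lower = -0.534375, upper = -0.342
```

Here the paths X1-X2-X4 and X1-X3-X4 both cost -0.45, X1-X2-X3-X4 costs -0.342 and X1-X3-X2-X4 costs -0.534375, so the last two set the bounds.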

Given the bounds of the indirect correlations, a constrained optimization problem is solved by minimizing the negative of the minimum eigenvalue of the correlation matrix (i.e. maximizing its minimum eigenvalue). The starting values of the indirect correlations are set to the midpoints of the computed bounds. If the estimated matrix is not positive definite, the user can force a second step in which the matrix obtained so far is approximated by its nearest positive definite matrix, using the Frobenius norm to measure the distance between matrices. Note that this step may alter the initial fixed correlations.
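A sketch of this optimization step in base R. The package uses fmincon from pracma; here stats::optim with method = "L-BFGS-B" is swapped in as a bounded optimizer, so results may differ from the package. The functions neg_min_eigen and fit_indirect are hypothetical, and bounds is assumed to have the same four columns (var1, var2, lower, upper) as the optimizationBounds value. Because the minimum eigenvalue is not differentiable where eigenvalues coincide, a gradient-based optimizer with finite differences is only a heuristic for this objective.

```r
# Objective: negative minimum eigenvalue of the matrix obtained by filling the
# unknown entries (positions in `idx` and their transposes) with `x`
neg_min_eigen <- function(x, corr, idx) {
  corr[idx] <- x
  corr[idx[, 2:1, drop = FALSE]] <- x
  -min(eigen(corr, symmetric = TRUE, only.values = TRUE)$values)
}

# Hypothetical helper: bounded maximization of the minimum eigenvalue,
# starting from the midpoints of the bounds. stats::optim (L-BFGS-B) stands in
# for pracma::fmincon, which the package itself uses.
fit_indirect <- function(corr, bounds) {
  idx <- cbind(bounds[, "var1"], bounds[, "var2"])
  start <- (bounds[, "lower"] + bounds[, "upper"]) / 2
  res <- optim(start, neg_min_eigen, corr = corr, idx = idx,
               method = "L-BFGS-B",
               lower = bounds[, "lower"], upper = bounds[, "upper"])
  corr[idx] <- res$par
  corr[idx[, 2:1, drop = FALSE]] <- res$par
  list(corrMat = corr, min_eigen = -res$value, optim = res)
}

# Toy problem: r12 = r13 = 0.9 are fixed, r23 is unknown with bounds [0.6, 0.9]
m <- matrix(c(1,   0.9, 0.9,
              0.9, 1,   NA,
              0.9, NA,  1), nrow = 3, byrow = TRUE)
bnds <- cbind(var1 = 2, var2 = 3, lower = 0.6, upper = 0.9)
fit <- fit_indirect(m, bnds)
fit$corrMat[2, 3]  # close to the upper bound 0.9
fit$min_eigen      # close to 0.1
```

In this toy problem the minimum eigenvalue increases with r23 over [0.6, 0.9], so the optimizer moves from the midpoint 0.75 to the upper bound, where the minimum eigenvalue is 0.1.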

An indirect correlation between two variables can be estimated only if they are linked by at least one path in the correlation graph. If no path exists for any of the declared indirect correlations, the function prints a warning message and plots the correlation graph to support debugging.
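Whether a path exists depends only on the connectivity of the correlation graph, so unreachable pairs can be found without enumerating paths. The sketch below labels the connected components of the graph by breadth-first search and lists the NA pairs whose variables lie in different components. In the Examples section, X9 and X10 have no fixed correlations, so every pair involving them would be reported. The helper names are hypothetical, and treating zero entries as missing edges is an assumption.

```r
# Hypothetical helper: connected-component label of each variable, where
# edges are the known, non-zero off-diagonal correlations
graph_components <- function(corr) {
  adj <- !is.na(corr) & corr != 0
  diag(adj) <- FALSE
  n <- nrow(corr)
  comp <- integer(n)
  label <- 0L
  for (s in seq_len(n)) {
    if (comp[s] != 0L) next
    # breadth-first search from an unlabelled variable
    label <- label + 1L
    comp[s] <- label
    queue <- s
    while (length(queue) > 0) {
      v <- queue[1]
      queue <- queue[-1]
      for (w in which(adj[v, ] & comp == 0L)) {
        comp[w] <- label
        queue <- c(queue, w)
      }
    }
  }
  comp
}

# Hypothetical helper: NA pairs (upper triangle) with no connecting path
unreachable_pairs <- function(corr) {
  comp <- graph_components(corr)
  na_pairs <- which(is.na(corr) & upper.tri(corr), arr.ind = TRUE)
  na_pairs[comp[na_pairs[, 1]] != comp[na_pairs[, 2]], , drop = FALSE]
}

m <- diag(4)
m[m == 0] <- NA
m[1, 2] <- m[2, 1] <- 0.5
m[2, 3] <- m[3, 2] <- 0.4
graph_components(m)   # 1 1 1 2: X4 has no fixed correlations
unreachable_pairs(m)  # the pairs (1,4), (2,4) and (3,4)
```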

Value

A list containing

corrMatFinal Matrix object containing the final correlation matrix, with the indirect correlations estimated.
optim List containing the outputs of the solver (fmincon from the pracma package) used for the constrained optimization. It is returned if the optimization step converges to a positive definite matrix, or if it fails and force_estimate is FALSE.
optim1 Same as optim. It is returned only when the constrained optimization does not converge to a positive definite correlation matrix and force_estimate is TRUE.
optim2 List containing the outputs of the function nearPD from the Matrix package, used to approximate the matrix obtained from the constrained optimization with the nearest positive definite correlation matrix. It is returned only when the constrained optimization does not converge to a positive definite correlation matrix and force_estimate is TRUE.
optimizationBounds A matrix object with N rows (N = number of indirect correlations) and 4 columns:
  • var1: numerical index of X1, the first variable of the indirect correlation pair

  • var2: numerical index of X2, the second variable of the indirect correlation pair

  • lower: lower bound of the range of the indirect correlation between var1 and var2

  • upper: upper bound of the range of the indirect correlation between var1 and var2

Author(s)

Alessandro De Carlo alessandro.decarlo01@universitadipavia.it

See Also

nearPD

fmincon

estimate_corr_bounds

Examples

# define the initial correlation structure (upper triangle only)
c_start <- diag(10)
c_start[1, 2] <- -0.6
c_start[1, 3] <- -0.75
c_start[2, 3] <- 0.95
c_start[2, 4] <- 0.75
c_start[2, 6] <- -0.6
c_start[2, 8] <- 0.75
c_start[3, 4] <- 0.6
c_start[3, 8] <- 0.75
c_start[4, 7] <- 0.6
c_start[4, 8] <- 0.75
c_start[5, 7] <- -0.95
# make the correlation structure symmetric
c_start <- c_start + t(c_start) - diag(ncol(c_start))
# assign NA to the indirect correlations to be estimated
c_start[c_start == 0] <- NA
# names of variables
colnames(c_start) <- paste0("X", 1:10)
rownames(c_start) <- paste0("X", 1:10)
# plot the initial correlation graph
plot_graph_corr(c_start, "Graph of Initial Correlation Matrix")
# estimate the indirect correlations
r <- estimate_indirect_corr(c_start)
# plot the final correlation graph
plot_graph_corr(r$corrMatFinal, "Graph of Final Correlation Matrix")



AlessandroDeCarlo27/mvlognCorrEst documentation built on March 23, 2023, 10:11 a.m.