effective_cor: Estimates the effective correlation.
In seqgendiff: RNA-Seq Generation/Modification for Simulation

effective_cor

R Documentation

Description

Returns the estimated correlation between the design matrix and the surrogate variables that is actually achieved when you assign a target correlation. The method is described in detail in Gerard (2020).

Usage

effective_cor(
  design_perm,
  sv,
  target_cor,
  calc_first = c("cor", "mean"),
  method = c("hungarian", "marriage"),
  iternum = 1000
)

Arguments

`design_perm`	A numeric design matrix whose rows are to be permuted (thus controlling the amount by which they are correlated with the surrogate variables). The rows index the samples and the columns index the variables. The intercept should not be included (though see Section "Unestimable Components").
`sv`	A matrix of surrogate variables. The rows index the samples and the columns index the surrogate variables.
`target_cor`	A numeric matrix of target correlations between the variables in `design_perm` and the surrogate variables. The rows index the observed covariates and the columns index the surrogate variables. That is, `target_cor[i, j]` specifies the target correlation between the `i`th column of `design_perm` and the `j`th surrogate variable. In this function the surrogate variables are supplied directly through `sv`; elsewhere in the package they are estimated using either factor analysis or surrogate variable analysis (see the parameter `use_sva` of `thin_diff`). The number of columns in `target_cor` specifies the number of surrogate variables. Set `target_cor` to `NULL` to indicate that `design_perm` and the surrogate variables are independent.
`calc_first`	Should we calculate the correlation of the mean `design_perm` and `sv` (`calc_first = "mean"`), or should we calculate the mean of the correlations between `design_perm` and `sv` (`calc_first = "cor"`)? This should only be changed by expert users.
`method`	Should we use the Gale-Shapley algorithm for stable marriages (`"marriage"`) (Gale and Shapley, 1962), as implemented in the matchingR package, or the Hungarian algorithm (`"hungarian"`) (Papadimitriou and Steiglitz, 1982), as implemented in the clue package (Hornik, 2005)? The Hungarian method almost always works better, so it is the default.
`iternum`	The total number of simulated correlations to consider.
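
The two `calc_first` options differ only in the order of averaging and correlating, and because correlation is not a linear operation the two orders generally give different numbers. The following base-R sketch illustrates one reading of that distinction. It is not the package's internal code: uniform row shuffles stand in for the package's permutation step, and the names `perms`, `cor_then_mean`, and `mean_then_cor` are made up for illustration.

```r
set.seed(1)
n <- 10
design_perm <- cbind(rep(c(0, 1), each = n / 2),
                     rep(c(0, 1), length.out = n))
sv <- matrix(rnorm(n))
iternum <- 200

## ASSUMPTION: uniform row shuffles replace the package's permutation step
perms <- replicate(iternum,
                   design_perm[sample(n), , drop = FALSE],
                   simplify = FALSE)

## "cor" first: correlate each permuted design with sv, then average
cor_then_mean <- Reduce(`+`, lapply(perms, function(d) cor(d, sv))) / iternum

## "mean" first: average the permuted designs, then correlate once
mean_design <- Reduce(`+`, perms) / iternum
mean_then_cor <- cor(mean_design, sv)

## Same shape, generally different values
cbind(cor_then_mean, mean_then_cor)
```

Both results have one row per column of `design_perm` and one column per surrogate variable, so they can be compared entry by entry.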

Details

This function permutes the rows of `design_perm` many times, each time calculating the Pearson correlation between the columns of `design_perm` and the columns of `sv`. It then returns the averages of these Pearson correlations. The permutation is done using `permute_design`.
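
The permute-then-average loop described above can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: `average_permuted_cor` and its `permute_fn` argument are hypothetical names, and the default `permute_fn` shuffles rows uniformly at random instead of steering toward a target correlation the way `permute_design` is documented to do. With this stand-in the averages will hover near zero rather than near any target.

```r
## Hypothetical helper, not part of seqgendiff.
## permute_fn: any function that takes a design matrix and returns a
## row-permuted copy (ASSUMPTION: uniform shuffle by default).
average_permuted_cor <- function(design_perm, sv, iternum = 1000,
                                 permute_fn = function(d) {
                                   d[sample(nrow(d)), , drop = FALSE]
                                 }) {
  stopifnot(nrow(design_perm) == nrow(sv))
  ## One slice per permutation: covariates x surrogate variables x iterations
  cor_array <- array(NA_real_,
                     dim = c(ncol(design_perm), ncol(sv), iternum))
  for (iter in seq_len(iternum)) {
    cor_array[, , iter] <- stats::cor(permute_fn(design_perm), sv)
  }
  ## Average the Pearson correlations over the iteration dimension
  apply(cor_array, c(1, 2), mean)
}

set.seed(1)
n <- 10
design_perm <- cbind(rep(c(0, 1), each = n / 2),
                     rep(c(0, 1), length.out = n))
sv <- matrix(rnorm(n))
avg_cor <- average_permuted_cor(design_perm, sv, iternum = 500)
dim(avg_cor)
```

Passing the permutation step in as a function keeps the averaging logic separate from the choice of permutation scheme, so a correlation-targeting permutation could be swapped in without touching the loop.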

Value

A matrix of correlations. The rows index the observed covariates and the columns index the surrogate variables. Element `(i, j)` is the estimated correlation between the `i`th variable in `design_perm` and the `j`th variable in `sv`.

Author(s)

David Gerard

References

Gale D, Shapley LS (1962). "College admissions and the stability of marriage." The American Mathematical Monthly, 69(1), 9-15. doi:10.1080/00029890.1962.11989827.
Gerard D (2020). "Data-based RNA-seq simulations by binomial thinning." BMC Bioinformatics, 21(1), 206. doi:10.1186/s12859-020-3450-9.
Hornik K (2005). "A CLUE for CLUster Ensembles." Journal of Statistical Software, 14(12). doi:10.18637/jss.v014.i12.
Papadimitriou C, Steiglitz K (1982). Combinatorial Optimization: Algorithms and Complexity. Englewood Cliffs: Prentice Hall.

Examples

library(seqgendiff)
set.seed(1) # make the simulated surrogate variable reproducible

## Generate the design matrices and set target correlation -----------------
n <- 10
design_perm <- cbind(rep(c(0, 1), each = n / 2),
                     rep(c(0, 1), length.out = n))
sv <- matrix(rnorm(n))
target_cor <- matrix(c(0.9, 0.1), ncol = 1)

## Get the estimated effective correlation ---------------------------------
## You should use a much larger iternum in practice
effective_cor(design_perm = design_perm,
              sv = sv,
              target_cor = target_cor,
              iternum = 10)
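
The advice to use a much larger `iternum` follows from the result being a Monte Carlo average: for independent draws, its standard error shrinks roughly like one over the square root of the number of draws. The base-R sketch below illustrates this under stated assumptions. It does not call `effective_cor`; a uniform shuffle of a single design column stands in for one permutation draw, and `one_draw`, `iternums`, and `mc_se` are illustrative names.

```r
set.seed(1)
n <- 10
x <- rep(c(0, 1), each = n / 2)  # one design column
sv <- rnorm(n)                   # one surrogate variable

## ASSUMPTION: a uniform shuffle stands in for one permutation draw
one_draw <- function() cor(sample(x), sv)

iternums <- c(10, 100, 1000)
mc_se <- sapply(iternums, function(k) {
  draws <- replicate(k, one_draw())
  sd(draws) / sqrt(k)  # Monte Carlo standard error of the average
})

## Standard error of the averaged correlation for each iternum
data.frame(iternum = iternums, mc_se = mc_se)
```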

seqgendiff documentation built on June 22, 2024, 7 p.m.