corassign: Group assignment that is correlated with latent factors.

View source: R/poisthin.R

corassignR Documentation

Group assignment that is correlated with latent factors.

Description

We extract latent factors from the log of mat using an SVD, then generate an underlying group-assignment variable from a conditional normal distribution (conditional on the latent factors). This underlying group-assignment variable is used to assign groups.

Usage

corassign(mat, nfac = NULL, corvec = NULL, return = c("group", "full"))

Arguments

mat

A matrix of count data. The rows index the individuals and the columns index the genes.

nfac

The number of latent factors. If NULL, then we will use the method of Onatski (2010) to choose the number of latent factors.

corvec

The vector of correlations. corvec[i] is the correlation between latent factor i and the underlying group-assignment variable. You can think of the correlations in corvec as a kind of "tetrachoric correlation." If NULL, then it assumes independence between factors and group assignment. Note that the correlations of the latent factors with the observed group-assignment vector (instead of the latent group-assignment vector) will be corvec * sqrt(2 / pi).

return

What should we return? Just the group assignment ("group") or a list of a bunch of things ("full").

Details

If nfac is provided, then corvec must be the same length as nfac. If nfac is not provided, then it is assumed that the first nfac elements of corvec are the underlying correlations, if nfac turns out to be smaller than the length of corvec. If nfac turns out to be larger than the length of corvec, then the factors without defined correlations are assumed to have correlation 0.

Value

A list with some or all of the following elements:

x

The vector of group assignments. 0L indicates membership to one group and 1L indicates membership to the other group.

nfac

The number of assumed latent factors.

facmat

A matrix, whose columns contain the latent factors.

groupfac

The underlying group-assignment factor.

corvec

The correlation vector. Note that this is the correlation between random variables observed in groupfac and facmat,

If return = "group", then the list only contains x.

Author(s)

David Gerard

References

  • A. Onatski (2010), Determining the number of factors from empirical distribution of eigenvalues. The Review of Economics and Statistics 92(4).

Examples

## Simulate data from given matrix of counts
## In practice, you would obtain Y from a real dataset, not simulate it.
set.seed(1)
nsamp <- 1000
ngene <- 10
Y <- matrix(stats::rpois(nsamp * ngene, lambda = 50), nrow = ngene)

## Set target correlation to be 0.9 and nfac to be 1
corvec <- 0.9
nfac   <- 1

## Group assignment
cout <- corassign(mat    = t(Y),
                  nfac   = nfac,
                  corvec = corvec,
                  return = "full")

## Correlation between facmat and groupfac should be about 0.9
cor(cout$facmat, cout$groupfac)

## Correlation between facmat and x should be about 0.9 * sqrt(2 / pi)
cor(cout$facmat, cout$x)
corvec * sqrt(2 / pi)



seqgendiff documentation built on June 22, 2024, 7 p.m.