CalculatingCOEX: Calculating the COEX matrix for genes and cells


Description

These are the functions and methods used to calculate the COEX matrices according to the COTAN model. From there, it is possible to calculate the associated p-values and the GDI (Global Differentiation Index).

The COEX matrix is defined by the following formula:

\frac{\sum_{i,j \in \{\text{Y, N}\}}{ (-1)^{\#\{i,j\}}\frac{O_{ij}-E_{ij}}{1 \vee E_{ij}}}} {\sqrt{n \sum_{i,j \in \{\text{Y, N}\}}{ \frac{1}{1 \vee E_{ij}}}}}

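where O and E are the observed and expected contingency tables, \#\{i,j\} is the number of N indices among i and j (only its parity matters), 1 \vee E_{ij} denotes \max(1, E_{ij}), and n is the relevant sample size: the number of cells when working on genes (actOnCells = FALSE) and the number of genes when working on cells (actOnCells = TRUE).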

The formula can be implemented more efficiently as:

\sqrt{\frac{1}{n}\sum_{i,j \in \{\text{Y, N}\}}{ \frac{1}{1 \vee E_{ij}}}} \, \bigl(O_\text{YY}-E_\text{YY}\bigr)

once one notices that O_{ij} - E_{ij} = (-1)^{\#\{i,j\}} \, r for some constant r for all i,j \in \{\text{Y, N}\}.

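The latter follows from the fact that the relevant marginal sums of the expected contingency tables are enforced to match those of the observed ones: the difference O - E is then a 2 \times 2 table whose rows and columns all sum to zero, which forces it to have the form r \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}.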
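As a sanity check of the two expressions above, the following base R sketch evaluates both on a single hypothetical pair of 2 x 2 tables whose marginal sums agree. The helpers coexFull() and coexFast() are illustrative names only, not COTAN functions, and the numbers are made up.

```r
# Hypothetical 2 x 2 tables for one pair; rows index i, columns index j,
# each in c("Y", "N"). Row and column sums of O and E agree.
lbl <- list(c("Y", "N"), c("Y", "N"))
O <- matrix(c(30, 20, 10, 40), nrow = 2, dimnames = lbl)  # observed
E <- matrix(c(25, 25, 15, 35), nrow = 2, dimnames = lbl)  # expected
n <- sum(O)  # total count, here 100

# Full definition: signed, weighted sum of the residuals over the four entries
coexFull <- function(O, E, n) {
  sgn <- matrix(c(1, -1, -1, 1), nrow = 2)  # (-1)^#{i,j}: + for YY and NN
  w <- 1 / pmax(E, 1)                        # weights 1 / (1 v E_ij)
  sum(sgn * (O - E) * w) / sqrt(n * sum(w))
}

# Simplified form, valid only when the marginal sums of O and E agree
coexFast <- function(O, E, n) {
  w <- 1 / pmax(E, 1)
  sqrt(sum(w) / n) * (O["Y", "Y"] - E["Y", "Y"])
}

coexFull(O, E, n)  # approximately 0.2093
coexFast(O, E, n)  # same value
```

If the expected table did not share the observed marginals, the two functions would generally disagree, which is why the simplified form relies on that constraint.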

The current implementation of calculateCoex() relies on the torch package, so it can potentially use the system GPU to run the heavy-duty calculations required by this method. However, installing the torch package can be finicky, so a short help page, Installing_torch, is tentatively provided to assist with it.

Usage

getMu(objCOTAN)

## S4 method for signature 'COTAN'
getGenesCoex(
  objCOTAN,
  genes = vector(mode = "character"),
  zeroDiagonal = TRUE,
  ignoreSync = FALSE
)

## S4 method for signature 'COTAN'
getCellsCoex(
  objCOTAN,
  cells = vector(mode = "character"),
  zeroDiagonal = TRUE,
  ignoreSync = FALSE
)

## S4 method for signature 'COTAN'
isCoexAvailable(objCOTAN, actOnCells = FALSE, ignoreSync = FALSE)

## S4 method for signature 'COTAN'
dropGenesCoex(objCOTAN)

## S4 method for signature 'COTAN'
dropCellsCoex(objCOTAN)

calculateLikelihoodOfObserved(objCOTAN)

observedContingencyTablesYY(
  objCOTAN,
  actOnCells = FALSE,
  asDspMatrices = FALSE
)

observedPartialContingencyTablesYY(
  objCOTAN,
  columnsSubset,
  zeroOne = NULL,
  actOnCells = FALSE
)

observedContingencyTables(objCOTAN, actOnCells = FALSE, asDspMatrices = FALSE)

observedPartialContingencyTables(
  objCOTAN,
  columnsSubset,
  zeroOne = NULL,
  actOnCells = FALSE
)

expectedContingencyTablesNN(
  objCOTAN,
  actOnCells = FALSE,
  asDspMatrices = FALSE,
  optimizeForSpeed = TRUE
)

expectedPartialContingencyTablesNN(
  objCOTAN,
  columnsSubset,
  probZero = NULL,
  actOnCells = FALSE,
  optimizeForSpeed = TRUE
)

expectedContingencyTables(
  objCOTAN,
  actOnCells = FALSE,
  asDspMatrices = FALSE,
  optimizeForSpeed = TRUE
)

expectedPartialContingencyTables(
  objCOTAN,
  columnsSubset,
  probZero = NULL,
  actOnCells = FALSE,
  optimizeForSpeed = TRUE
)

contingencyTables(objCOTAN, g1, g2)

## S4 method for signature 'COTAN'
calculateCoex(
  objCOTAN,
  actOnCells = FALSE,
  returnPPFract = FALSE,
  optimizeForSpeed = TRUE,
  deviceStr = "cuda"
)

calculatePartialCoex(
  objCOTAN,
  columnsSubset,
  probZero = NULL,
  zeroOne = NULL,
  actOnCells = FALSE,
  optimizeForSpeed = TRUE
)

calculateS(
  objCOTAN,
  geneSubsetCol = vector(mode = "character"),
  geneSubsetRow = vector(mode = "character")
)

calculateG(
  objCOTAN,
  geneSubsetCol = vector(mode = "character"),
  geneSubsetRow = vector(mode = "character")
)

Arguments

objCOTAN

a COTAN object

genes

The given genes' names used to select the wanted COEX columns. If missing, all columns will be returned. When not empty, a proper result is provided by calculating the partial COEX matrix on the fly.

zeroDiagonal

When TRUE sets the diagonal to zero.

ignoreSync

When TRUE ignores whether the lambda/nu/dispersion have been updated since the COEX matrix was calculated.

cells

The given cells' names used to select the wanted COEX columns. If missing, all columns will be returned. When not empty, a proper result is provided by calculating the partial COEX matrix on the fly.

actOnCells

Boolean; when TRUE the function works for the cells, otherwise for the genes

asDspMatrices

Boolean; when TRUE the function will return only packed dense symmetric matrices

columnsSubset

a sub-set of the columns of the matrices that will be returned

zeroOne

the raw count matrix projected to 0 or 1. If not given, the appropriate one will be calculated on the fly

optimizeForSpeed

Boolean; deprecated: always TRUE

probZero

the expected probability of zero for each gene/cell pair. If not given, the appropriate one will be calculated on the fly

g1

a gene

g2

another gene

returnPPFract

Boolean; when TRUE the function returns the fraction of genes/cells pairs for which at least one entry of the expected contingency table is smaller than 0.5. Default is FALSE

deviceStr

When using the torch library, selects the device on which to run the calculations. Possible values are "cpu" to use the system CPU, "cuda" to use the system GPUs, or something like "cuda:0" to restrict the calculations to a specific device

geneSubsetCol

an array of genes, placed in the columns. If left empty, the function will use all genes (genome-wide).

geneSubsetRow

an array of genes, placed in the rows. If left empty, the function will use all genes (genome-wide).

Details

getMu() calculates the matrix \mu = \lambda \times \nu^T, i.e. \mu_{gc} = \lambda_g \nu_c for each gene g and cell c
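A minimal base R illustration of \mu = \lambda \times \nu^T, assuming toy values for the per-gene \lambda and per-cell \nu (not taken from any dataset): the result is a genes-by-cells matrix whose entries are products of one gene value and one cell value.

```r
# Hypothetical per-gene lambda and per-cell nu values
lambda <- c(gene1 = 0.5, gene2 = 2, gene3 = 10)
nu <- c(cell1 = 0.8, cell2 = 1, cell3 = 1.2, cell4 = 1.5)

# Outer product: mu[g, c] = lambda[g] * nu[c], genes in rows, cells in columns
mu <- outer(lambda, nu)

# Same values via the matrix product lambda %*% t(nu) (names are not carried over)
muProd <- lambda %*% t(nu)

mu["gene3", "cell4"]  # 10 * 1.5 = 15
```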

getGenesCoex() extracts the complete genes' COEX matrix (or a partial one, after genes have been dropped) from the COTAN object.

getCellsCoex() extracts the complete cells' COEX matrix (or a partial one, after cells have been dropped) from the COTAN object.

isCoexAvailable() checks whether the relevant COEX matrix in the COTAN object is available for use

dropGenesCoex() drops the genesCoex member from the given COTAN object

dropCellsCoex() drops the cellsCoex member from the given COTAN object

calculateLikelihoodOfObserved() gives for each cell and each gene the likelihood of the observed zero/one data
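A hedged sketch of what such a likelihood looks like, assuming the expected probability of zero comes from a negative binomial with per-gene dispersion a > 0, i.e. P(0) = (1 + a\mu)^{-1/a}. This textbook formula and the helper probZeroNB() are assumptions made for illustration; COTAN's own handling of the dispersion (for example of non-positive values) may differ, and the real function returns a data.frame rather than a matrix.

```r
# Assumed zero probability of a negative binomial with dispersion a > 0
# (hypothetical helper, not a COTAN function)
probZeroNB <- function(mu, a) (1 + a * mu)^(-1 / a)

# Hypothetical expected counts mu (genes x cells) and per-gene dispersions
mu <- matrix(c(0.5, 2, 1, 4, 0.2, 3), nrow = 2,
             dimnames = list(c("g1", "g2"), c("c1", "c2", "c3")))
a <- c(g1 = 0.5, g2 = 1)  # recycled down each column, i.e. one value per row

probZero <- probZeroNB(mu, a)

# Observed zero/one data with the same layout
zeroOne <- matrix(c(0, 1, 1, 1, 0, 0), nrow = 2, dimnames = dimnames(mu))

# Likelihood of each observation: P(0) where the count is zero, 1 - P(0) otherwise
lh <- ifelse(zeroOne == 0, probZero, 1 - probZero)
lh
```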

observedContingencyTablesYY() calculates the observed Yes/Yes field of the contingency tables

observedPartialContingencyTablesYY() calculates the observed Yes/Yes field of the contingency tables, restricted to the specified column sub-set

observedContingencyTables() calculates the observed contingency tables. When the parameter asDspMatrices == TRUE, the method will effectively throw away the lower half of the returned observedYN and observedNY matrices but, since they are transposes of one another, the full information is still available.
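The observed tables can be derived from the zero/one matrix alone. The base R sketch below, on a hypothetical genes-by-cells zero/one matrix, computes the Yes/Yes counts as a cross product and recovers the other three fields from the marginals. The orientation chosen for observedYN (rows Yes, columns No) is an assumption, not necessarily COTAN's convention.

```r
# Hypothetical zero/one matrix: 3 genes (rows) x 5 cells (columns)
zeroOne <- rbind(g1 = c(1, 1, 0, 1, 0),
                 g2 = c(0, 1, 1, 1, 0),
                 g3 = c(1, 0, 1, 1, 1))
colnames(zeroOne) <- paste0("c", 1:5)
n <- ncol(zeroOne)  # genes' tables count cells

observedY  <- rowSums(zeroOne)          # cells where each gene is expressed
observedYY <- tcrossprod(zeroOne)       # cells where both genes are expressed
observedYN <- observedY - observedYY    # row gene Yes, column gene No (assumed)
observedNY <- t(observedYN)             # the transpose of observedYN
observedNN <- n - observedYY - observedYN - observedNY

observedNN["g1", "g2"]  # only cell c5 has both g1 and g2 at zero
```

By construction, the four tables add up to n in every element.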

observedPartialContingencyTables() calculates the observed contingency tables, restricted to the specified column sub-set.

expectedContingencyTablesNN() calculates the expected No/No field of the contingency tables

expectedPartialContingencyTablesNN() calculates the expected No/No field of the contingency tables, restricted to the specified column sub-set

expectedContingencyTables() calculates the expected values of the contingency tables. When the parameter asDspMatrices == TRUE, the method will effectively throw away the lower half of the returned expectedYN and expectedNY matrices but, since they are transposes of one another, the full information is still available.
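Similarly, assuming that genes are independent within each cell given the model, the expected tables follow from the matrix of expected zero probabilities p, with E_{NN}[i,j] = \sum_c p_{ic} p_{jc}. The base R sketch below uses random hypothetical probabilities; the independence assumption and the orientation of expectedNY (rows No, columns Yes) are illustrative choices, not taken from COTAN's code.

```r
# Hypothetical expected probabilities of zero: 3 genes (rows) x 4 cells (columns)
set.seed(1)
probZero <- matrix(runif(12), nrow = 3,
                   dimnames = list(paste0("g", 1:3), paste0("c", 1:4)))
n <- ncol(probZero)

# Assuming independence of genes within each cell given the model
expectedN  <- rowSums(probZero)          # expected number of zeros per gene
expectedNN <- tcrossprod(probZero)       # sum over cells of p[i, c] * p[j, c]
expectedNY <- expectedN - expectedNN     # row gene No, column gene Yes (assumed)
expectedYN <- t(expectedNY)
expectedYY <- n - expectedNN - expectedNY - expectedYN
```

Deriving three fields from the No/No one and the marginals mirrors the split into expectedContingencyTablesNN() and expectedContingencyTables() listed above.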

expectedPartialContingencyTables() calculates the expected values of contingency tables, restricted to the specified column sub-set

contingencyTables() returns the observed and expected contingency tables for a given pair of genes. The implementation runs the same algorithms used to calculate the full observed/expected contingency tables, but restricted to only the relevant genes and thus much faster and less memory intensive

calculateCoex() estimates and stores the COEX matrix in the cellsCoex or genesCoex field, depending on the given actOnCells flag. It also calculates the percentage of problematic genes/cells pairs. A pair is problematic when one or more of the expected counts are significantly smaller than 1 (< 0.5). These small expected values signal that scant information is present for such a pair.
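The fraction of problematic pairs can be sketched as follows, assuming the four expected tables are available as genes-by-genes matrices built under within-cell independence, and that only distinct unordered pairs are counted. The toy probabilities and the pair-counting convention are assumptions, not COTAN's code.

```r
# Hypothetical probabilities of zero: g3 is almost never expressed
probZero <- rbind(g1 = c(0.40, 0.50, 0.60, 0.50),
                  g2 = c(0.50, 0.60, 0.40, 0.50),
                  g3 = c(0.99, 0.95, 0.98, 0.97))

# Expected tables under within-cell independence (an assumption)
expectedNN <- tcrossprod(probZero)
expectedYY <- tcrossprod(1 - probZero)
expectedNY <- tcrossprod(probZero, 1 - probZero)
expectedYN <- t(expectedNY)

# Smallest of the four expected entries for every pair of genes
minExpected <- pmin(expectedNN, expectedNY, expectedYN, expectedYY)

# Count each unordered pair of distinct genes once (assumed convention)
pairs <- upper.tri(minExpected)
problematicFraction <- mean(minExpected[pairs] < 0.5)
problematicFraction  # 2/3: both pairs involving g3 are problematic
```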

calculatePartialCoex() estimates a sub-section of the cells' or genes' COEX matrix, depending on the given actOnCells flag. It also calculates the percentage of problematic genes/cells pairs. A pair is problematic when one or more of the expected counts are significantly smaller than 1 (< 0.5). These small expected values signal that scant information is present for such a pair.

calculateS() calculates the S statistic for the genes' contingency tables. It always has the diagonal set to zero.

calculateG() calculates the G-test statistic for the genes' contingency tables. It always has the diagonal set to zero. It is proportional to the mutual information of the genes' presence.
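For reference, here are the textbook Pearson chi-squared and G-test statistics on a single hypothetical 2 x 2 table, using the independence expectation. Treating S as a Pearson-type statistic is an assumption, and COTAN uses its model-based expected tables instead; with independence expectations the identity G = 2 n \cdot MI holds exactly, which illustrates the stated proportionality to mutual information.

```r
# Hypothetical observed 2 x 2 table for one pair of genes (rows/columns Y, N)
O <- matrix(c(30, 20, 10, 40), nrow = 2,
            dimnames = list(c("Y", "N"), c("Y", "N")))
n <- sum(O)

# Expected table under independence of the two presence indicators
E <- outer(rowSums(O), colSums(O)) / n

S <- sum((O - E)^2 / E)           # Pearson chi-squared statistic
G <- 2 * sum(O * log(O / E))      # G-test statistic (natural logarithm)

# Empirical mutual information (in nats) between the two presence indicators
P <- O / n
MI <- sum(P * log(P / outer(rowSums(P), colSums(P))))

c(S = S, G = G, twoNMI = 2 * n * MI)  # G and 2 * n * MI coincide
```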

Value

getMu() returns the mu matrix

getGenesCoex() returns the genes' COEX values

getCellsCoex() returns the cells' COEX values

isCoexAvailable() returns whether the relevant COEX matrix has been calculated and, if so, whether it is still aligned with the estimators.

dropGenesCoex() returns the updated COTAN object

dropCellsCoex() returns the updated COTAN object

calculateLikelihoodOfObserved() returns a data.frame with the likelihood of the observed zero/one

observedContingencyTablesYY() returns a list with:

  • observedYY the Yes/Yes observed contingency table as matrix

  • observedY the full Yes observed vector

observedPartialContingencyTablesYY() returns a list with:

  • observedYY the Yes/Yes observed contingency table as matrix, restricted to the selected columns

  • observedY the full Yes observed vector

observedContingencyTables() returns the observed contingency tables as named list with elements:

  • "observedNN"

  • "observedNY"

  • "observedYN"

  • "observedYY"

observedPartialContingencyTables() returns the observed contingency tables, restricted to the selected columns, as named list with elements:

  • "observedNN"

  • "observedNY"

  • "observedYN"

  • "observedYY"

expectedContingencyTablesNN() returns a list with:

  • expectedNN the No/No expected contingency table as matrix

  • expectedN the No expected vector

expectedPartialContingencyTablesNN() returns a list with:

  • expectedNN the No/No expected contingency table as matrix, restricted to the selected columns

  • expectedN the full No expected vector

expectedContingencyTables() returns the expected contingency tables as named list with elements:

  • "expectedNN"

  • "expectedNY"

  • "expectedYN"

  • "expectedYY"

expectedPartialContingencyTables() returns the expected contingency tables, restricted to the selected columns, as named list with elements:

  • "expectedNN"

  • "expectedNY"

  • "expectedYN"

  • "expectedYY"

contingencyTables() returns a list containing the observed and expected contingency tables

calculateCoex() returns the updated COTAN object

calculatePartialCoex() returns the asked section of the COEX matrix

calculateS() returns the S matrix

calculateG() returns the G matrix

Note

The element-wise sum of the four matrices returned by observedContingencyTables() has the same value in every element, and the same holds for the matrices returned by expectedContingencyTables(). This value is the number of genes when actOnCells is TRUE and the number of cells when it is FALSE.

See Also

ParametersEstimations for more details.

Installing_torch for help with installing the torch package

Examples

data("test.dataset")
objCOTAN <- COTAN(raw = test.dataset)
objCOTAN <- initializeMetaDataset(objCOTAN, GEO = "test_GEO",
                                  sequencingMethod = "distribution_sampling",
                                  sampleCondition = "reconstructed_dataset")
objCOTAN <- clean(objCOTAN)

objCOTAN <- estimateDispersionBisection(objCOTAN, cores = 6L)

## Now the `COTAN` object is ready to calculate the genes' `COEX`

## mu <- getMu(objCOTAN)
## observedY <- observedContingencyTablesYY(objCOTAN, asDspMatrices = TRUE)
obs <- observedContingencyTables(objCOTAN, asDspMatrices = TRUE)

## expectedN <- expectedContingencyTablesNN(objCOTAN, asDspMatrices = TRUE)
expected <- expectedContingencyTables(objCOTAN, asDspMatrices = TRUE)

objCOTAN <- calculateCoex(objCOTAN, actOnCells = FALSE)

stopifnot(isCoexAvailable(objCOTAN))
genesCoex <- getGenesCoex(objCOTAN)
genesSample <- sample(getNumGenes(objCOTAN), 10)
partialGenesCoex <- calculatePartialCoex(objCOTAN, genesSample,
                                         actOnCells = FALSE)

identical(partialGenesCoex,
          getGenesCoex(objCOTAN, getGenes(objCOTAN)[sort(genesSample)]))

## S <- calculateS(objCOTAN)
## G <- calculateG(objCOTAN)
## pValue <- calculatePValue(objCOTAN)
gdiDF <- calculateGDI(objCOTAN)
objCOTAN <- storeGDI(objCOTAN, genesGDI = gdiDF)

## Touching any of the lambda/nu/dispersion parameters invalidates the `COEX`
## matrix and its derivatives, so it can be dropped from the `COTAN` object
objCOTAN <- dropGenesCoex(objCOTAN)
stopifnot(!isCoexAvailable(objCOTAN))


objCOTAN <- estimateDispersionNuBisection(objCOTAN, cores = 6L)

## Now the `COTAN` object is ready to calculate the cells' `COEX`
## In case one needs to calculate both, it is more sensible to run the above
## before any `COEX` evaluation

g1 <- getGenes(objCOTAN)[sample(getNumGenes(objCOTAN), 1)]
g2 <- getGenes(objCOTAN)[sample(getNumGenes(objCOTAN), 1)]
tables <- contingencyTables(objCOTAN, g1 = g1, g2 = g2)
tables

objCOTAN <- calculateCoex(objCOTAN, actOnCells = TRUE)
stopifnot(isCoexAvailable(objCOTAN, actOnCells = TRUE, ignoreSync = TRUE))
cellsCoex <- getCellsCoex(objCOTAN)

cellsSample <- sample(getNumCells(objCOTAN), 10)
partialCellsCoex <- calculatePartialCoex(objCOTAN, cellsSample,
                                         actOnCells = TRUE)

identical(partialCellsCoex, cellsCoex[, sort(cellsSample)])

objCOTAN <- dropCellsCoex(objCOTAN)
stopifnot(!isCoexAvailable(objCOTAN, actOnCells = TRUE))

lh <- calculateLikelihoodOfObserved(objCOTAN)


seriph78/COTAN documentation built on Dec. 10, 2024, 3:30 a.m.