GenesStatistics: Calculations of genes statistics
In seriph78/COTAN: COexpression Tables ANalysis

getGDI,COTAN-method

R Documentation

Calculations of genes statistics

Description

A collection of functions returning various statistics associated with the genes, in particular the discrepancy between the expected probabilities of zero and their actual occurrences, either at the single-gene level or for pairs of genes.

To make the Global Differentiation Index (GDI) more specific, it may be desirable to restrict the set of genes against which the GDI is computed to a selected subset. The subset should include a substantial fraction of cell-identity genes, possibly focusing on markers specific to the biological question of interest (for instance, neural cortex layering markers). The resulting index is called the Local Differentiation Index (LDI) relative to the selected subset.

Usage

## S4 method for signature 'COTAN'
getGDI(objCOTAN)

## S4 method for signature 'COTAN'
storeGDI(objCOTAN, genesGDI)

genesCoexSpace(objCOTAN, primaryMarkers, numGenesPerMarker = 25L)

establishGenesClusters(
  objCOTAN,
  groupMarkers,
  numGenesPerMarker = 25L,
  kCuts = 6L,
  distance = "cosine",
  hclustMethod = "ward.D2"
)

calculateGenesCE(objCOTAN)

calculateGDIGivenCorr(corr, numDegreesOfFreedom, rowsFraction = 0.05)

calculateGDI(objCOTAN, statType = "S", rowsFraction = 0.05)

calculatePValue(
  objCOTAN,
  statType = "S",
  geneSubsetCol = vector(mode = "character"),
  geneSubsetRow = vector(mode = "character")
)

calculatePDI(
  objCOTAN,
  statType = "S",
  geneSubsetCol = vector(mode = "character"),
  geneSubsetRow = vector(mode = "character")
)

Arguments

`objCOTAN`	a `COTAN` object
`genesGDI`	the named genes' GDI `array` to store or the output `data.frame` of the function `calculateGDI()`
`primaryMarkers`	A vector of primary marker names.
`numGenesPerMarker`	the number of correlated genes to keep as other markers (default 25)
`groupMarkers`	a named `list` with one element per group, each comprising one or more marker genes
`kCuts`	the estimated number of clusters (this defines the height at which the tree is cut)
`distance`	type of distance to use. Default is `"cosine"`. Can be chosen among those supported by `parallelDist::parDist()`
`hclustMethod`	the clustering method. Default is `"ward.D2"`, but it can be any method supported by `stats::hclust()`
`corr`	a `matrix` object, possibly a subset of the columns of the full symmetric matrix
`numDegreesOfFreedom`	an `int` that determines the number of degrees of freedom to use in the `\chi^{2}` test
`rowsFraction`	The fraction of rows that will be averaged to calculate the `GDI`. Defaults to `0.05` (5%)
`statType`	Which statistic to use to compute the p-values: `"S"` (Pearson's `\chi^{2}` test, the default) or `"G"` (G-test)
`geneSubsetCol`	an array of genes to place in the columns. If left empty, the computation is genome-wide.
`geneSubsetRow`	an array of genes to place in the rows. If left empty, the computation is genome-wide.

Details

getGDI() extracts the genes' GDI array as it was stored by the method storeGDI().

storeGDI() stores an already calculated genes' GDI array in a COTAN object. It can be retrieved using the method getGDI().

genesCoexSpace() calculates gene groups based on the primary markers and uses them to prepare the genes' COEX space data.frame.

establishGenesClusters() performs the genes' clustering based on a pool of gene markers, using the genes' COEX space.
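The clustering step can be illustrated in base R with the default settings named in the Arguments (cosine distance, "ward.D2" linkage, a tree cut into a fixed number of clusters). This is only a sketch on a random toy matrix: `gSpace` is a hypothetical stand-in for the genes' COEX space, and the cosine distance is computed by hand here rather than through `parallelDist::parDist()`.

```r
set.seed(42)

# Hypothetical stand-in for a genes' COEX space:
# 12 genes (rows) described by 4 marker columns
gSpace <- matrix(rnorm(48), nrow = 12,
                 dimnames = list(sprintf("g-%06d", 1:12),
                                 paste0("m", 1:4)))

# Cosine distance between rows: 1 - cosine similarity
rowNorms <- sqrt(rowSums(gSpace^2))
cosSim   <- (gSpace %*% t(gSpace)) / (rowNorms %o% rowNorms)
cosDist  <- as.dist(1 - cosSim)

# Hierarchical clustering with Ward's method, then cut into k clusters
tree     <- stats::hclust(cosDist, method = "ward.D2")
clusters <- stats::cutree(tree, k = 3)

# Named integer vector: one cluster label per gene
table(clusters)
```

Cutting by `k` rather than by height mirrors the role of `kCuts`: the requested number of clusters determines where the tree is cut.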

calculateGenesCE() calculates, for each gene, the discrepancy between the expected probability of zero and the observed zeros across all cells as a cross-entropy: -\sum_{c} \left[ \mathbb{1}_{X_c = 0} \log(p_c) + \mathbb{1}_{X_c \neq 0} \log(1 - p_c) \right], where X_c is the observed count in cell c and p_c is the probability of zero in cell c.
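As an illustration of the cross-entropy for a single gene, the following base R sketch sums the log-probability of what was actually observed in each cell (a zero or a non-zero count). The helper `genesCESketch()` and the toy vectors are hypothetical and are not part of the package; how the probabilities of zero are estimated is outside the scope of this sketch.

```r
# Hypothetical helper: cross-entropy between the observed zero/non-zero
# pattern of one gene and the expected probability of zero in each cell
genesCESketch <- function(counts, probZero) {
  stopifnot(length(counts) == length(probZero),
            all(probZero > 0), all(probZero < 1))
  isZero <- counts == 0
  # log(p_c) where a zero was observed, log(1 - p_c) otherwise
  -sum(ifelse(isZero, log(probZero), log(1 - probZero)))
}

counts   <- c(0, 3, 0, 1, 0)            # toy observed counts X_c
probZero <- c(0.8, 0.2, 0.6, 0.3, 0.9)  # toy probabilities of zero p_c

ce <- genesCESketch(counts, probZero)
ce
```

The value grows when zeros appear in cells where they were unlikely, or when counts appear in cells where a zero was expected.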

calculateGDIGivenCorr() produces a vector with the GDI for each column of the given correlation matrix, using Pearson's \chi^{2} test.

calculateGDI() produces a data.frame with the GDI for each gene based on the COEX matrix

calculatePValue() computes the p-values for genes in the COTAN object. It can be run genome-wide or restricted to specific genes of interest. By default it computes the p-values using the S statistic (Pearson's \chi^{2}).
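The difference between the two statistics can be seen on a toy 2x2 contingency table for a pair of genes. The sketch below uses the textbook definitions of Pearson's statistic and of the G statistic; taking the expected counts from the table margins and using one degree of freedom are assumptions made for illustration, not a description of how the package derives them.

```r
# Toy 2x2 contingency table: cells expressing / not expressing two genes
observed <- matrix(c(30, 20,
                     10, 40),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(geneA = c("expr", "zero"),
                                   geneB = c("expr", "zero")))

# Expected counts under independence (assumption: taken from the margins)
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# "S": Pearson's chi-squared statistic
statS <- sum((observed - expected)^2 / expected)

# "G": G-test (log-likelihood ratio) statistic
statG <- 2 * sum(observed * log(observed / expected))

# p-values from the chi-squared distribution (assumption: 1 degree of freedom)
pValS <- stats::pchisq(statS, df = 1, lower.tail = FALSE)
pValG <- stats::pchisq(statG, df = 1, lower.tail = FALSE)

c(S = statS, G = statG, pS = pValS, pG = pValG)
```

Both statistics are asymptotically chi-squared distributed, so they usually lead to similar p-values when the expected counts are not small.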

calculatePDI() computes the p-values for genes in the COTAN object using calculatePValue() and applies the transformation \log(-\log(\cdot)) to them to obtain the genes' Pair Differential Index.
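The transformation itself is easy to try on a small matrix. The following base R sketch applies it to a made-up symmetric p-value matrix; the matrix and its values are purely illustrative.

```r
genes <- c("g1", "g2", "g3")

# Toy symmetric p-value matrix for three genes
pVal <- matrix(c(1e-12, 1e-4,  0.3,
                 1e-4,  1e-20, 0.05,
                 0.3,   0.05,  1e-8),
               nrow = 3, byrow = TRUE,
               dimnames = list(genes, genes))

# Double-log transform: smaller p-values map to larger index values
pdi <- log(-log(pVal))
pdi
```

The transform is monotonically decreasing on (0, 1), so the ordering of gene pairs by significance is preserved while the very wide range of p-values is compressed. It is not finite at the boundaries: a p-value of exactly 1 gives -Inf and a p-value of exactly 0 gives Inf.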

Value

getGDI() returns the genes' GDI array if available or NULL otherwise

storeGDI() returns the given COTAN object with updated GDI genes' information

genesCoexSpace() returns a list with:

"SecondaryMarkers" a named list that, for each secondary marker, gives the list of primary markers that selected it
"GCS" the relevant subset of COEX matrix
"rankGenes" a data.frame with the rank of each gene according to its p-value

establishGenesClusters() returns a list with:

"g.space" the genes' COEX space data.frame
"plot.eig" the eigenvalues plot
"pca_clusters" the PCA components data.frame
"tree_plot" the tree plot for the genes' COEX space

calculateGenesCE() returns a named array with the cross-entropy of each gene

calculateGDIGivenCorr() returns a vector with the GDI data for each column of the input

calculateGDI() returns a data.frame with:

"sum.raw.norm" the sum of the normalized data rows
"GDI" the GDI data
"exp.cells" the percentage of cells expressing the gene

calculatePValue() returns a p-value matrix as dspMatrix

calculatePDI() returns a Pair Differential Index matrix as dspMatrix

Examples

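library(COTAN)

# Build a COTAN object from the bundled test dataset and compute the COEX
data("test.dataset")
objCOTAN <- COTAN(raw = test.dataset)
objCOTAN <- proceedToCoex(objCOTAN, cores = 6L, saveObj = FALSE)

# Pick 10 random genes as primary markers and build the genes' COEX space
markers <- getGenes(objCOTAN)[sample(getNumGenes(objCOTAN), 10)]
GCS <- genesCoexSpace(objCOTAN, primaryMarkers = markers,
                      numGenesPerMarker = 15)

# Named list of marker groups: one element per group
groupMarkers <- list(G1 = c("g-000010", "g-000020", "g-000030"),
                     G2 = c("g-000300"),
                     G3 = c("g-000510", "g-000530", "g-000550",
                            "g-000570", "g-000590"))

# Cluster the genes starting from the marker groups
resList <- establishGenesClusters(objCOTAN, groupMarkers = groupMarkers,
                                  numGenesPerMarker = 11)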
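The remaining functions on this page can be exercised in the same way. The sketch below is untested and relies only on the signatures listed under Usage and the return values listed under Value; it assumes the COTAN package is installed, and `cores = 1L` and the choice of the first five genes as `someGenes` are arbitrary choices for illustration.

```r
library(COTAN)

data("test.dataset")
objCOTAN <- COTAN(raw = test.dataset)
objCOTAN <- proceedToCoex(objCOTAN, cores = 1L, saveObj = FALSE)

# GDI: compute it, store it in the object, then retrieve it
genesGDI  <- calculateGDI(objCOTAN, statType = "S", rowsFraction = 0.05)
objCOTAN  <- storeGDI(objCOTAN, genesGDI = genesGDI)
storedGDI <- getGDI(objCOTAN)

# Genes ranked by decreasing GDI
head(genesGDI[order(genesGDI[["GDI"]], decreasing = TRUE), ])

# Cross-entropy of each gene
genesCE <- calculateGenesCE(objCOTAN)

# p-values and PDI restricted to a few genes placed in the columns
someGenes <- getGenes(objCOTAN)[1:5]
pVal <- calculatePValue(objCOTAN, statType = "S", geneSubsetCol = someGenes)
pdi  <- calculatePDI(objCOTAN, statType = "S", geneSubsetCol = someGenes)
```

Leaving `geneSubsetCol` and `geneSubsetRow` empty would compute the p-values and the PDI genome-wide instead.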

seriph78/COTAN documentation built on June 1, 2025, 4:57 p.m.