GenesStatistics: Calculations of genes statistics

GenesStatistics    R Documentation

Calculations of genes statistics

Description

A collection of functions returning various statistics associated with the genes, in particular the discrepancy between the expected probabilities of zero and their actual occurrences, either at the single-gene level or for pairs of genes.

To make the Global Differentiation Index (GDI) more specific, it may be desirable to restrict the set of genes against which the GDI is computed to a selected subset. The recommendation is to include a substantial fraction of cell-identity genes, possibly focusing on markers specific to the biological question of interest (for instance, neural cortex layering markers). In this case the index is called the Local Differentiation Index (LDI) relative to the selected subset.

Usage

genesCoexSpace(objCOTAN, primaryMarkers, numGenesPerMarker = 25L)

establishGenesClusters(
  objCOTAN,
  groupMarkers,
  numGenesPerMarker = 25L,
  kCuts = 6L,
  distance = "cosine",
  hclustMethod = "ward.D2"
)

calculateGenesCE(objCOTAN)

calculateGDIGivenCorr(corr, numDegreesOfFreedom, rowsFraction = 0.05)

calculateGDI(objCOTAN, statType = "S", rowsFraction = 0.05)

calculatePValue(
  objCOTAN,
  statType = "S",
  geneSubsetCol = vector(mode = "character"),
  geneSubsetRow = vector(mode = "character")
)

calculatePDI(
  objCOTAN,
  statType = "S",
  geneSubsetCol = vector(mode = "character"),
  geneSubsetRow = vector(mode = "character")
)

Arguments

objCOTAN

a COTAN object

primaryMarkers

A vector of primary marker names.

numGenesPerMarker

the number of correlated genes to keep as other markers (default 25)

groupMarkers

a named list with an element for each group comprised of one or more marker genes

kCuts

the number of estimated clusters (this defines the height for the tree cut)

distance

type of distance to use. Default is "cosine". Can be chosen among those supported by parallelDist::parDist()

hclustMethod

default is "ward.D2" but can be any method defined by stats::hclust() function

corr

a matrix object, possibly a subset of the columns of the full symmetric matrix

numDegreesOfFreedom

an integer that determines the number of degrees of freedom to use in the \chi^{2} test

rowsFraction

The fraction of rows that will be averaged to calculate the GDI. Defaults to 5%

statType

Which statistic to use to compute the p-values. The default is "S" (Pearson's \chi^{2} test); the alternative is "G" (G-test)

geneSubsetCol

an array of gene names to use as the columns of the result. If left empty, the calculation is done genome-wide.

geneSubsetRow

an array of gene names to use as the rows of the result. If left empty, the calculation is done genome-wide.

Details

genesCoexSpace() calculates gene groups based on the primary markers and uses them to prepare the genes' COEX space data.frame.

establishGenesClusters() performs the genes' clustering based on a pool of gene markers, using the genes' COEX space.
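The clustering step controlled by distance, hclustMethod and kCuts can be illustrated in base R. This is only a sketch under stated assumptions: the matrix gSpace is a made-up stand-in for a genes' COEX space, the cosine distance is computed by hand rather than through parallelDist::parDist(), and the tree is cut with stats::cutree(); it is not the function's actual implementation.

```r
# Hypothetical stand-in for a genes' COEX space:
# 12 genes (rows) described by 4 marker columns.
set.seed(42)
gSpace <- matrix(rnorm(12 * 4), nrow = 12,
                 dimnames = list(sprintf("g-%06d", 1:12), paste0("m", 1:4)))

# Cosine distance between rows: 1 - cosine similarity.
rowNorms <- sqrt(rowSums(gSpace^2))
cosSim   <- tcrossprod(gSpace) / outer(rowNorms, rowNorms)
cosDist  <- stats::as.dist(1 - cosSim)

# Hierarchical clustering with the documented default method,
# then cut the tree into a fixed number of clusters (the role of kCuts).
tree     <- stats::hclust(cosDist, method = "ward.D2")
clusters <- stats::cutree(tree, k = 3L)

print(table(clusters))
```

Asking for k clusters determines the height at which the tree is cut, which is how the kCuts argument is described above.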

calculateGenesCE() calculates, for each gene, the discrepancy between the expected probability of zero and the observed zeros across all cells, expressed as the cross-entropy: -\sum_{c}{\left[\mathbb{1}_{X_c = 0} \log(p_c) + \mathbb{1}_{X_c \neq 0} \log(1 - p_c)\right]} where X_c is the observed count in cell c and p_c the probability of zero in that cell.
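As a worked illustration of this cross-entropy for a single gene, the base R sketch below uses the usual binary cross-entropy, in which both log terms enter the sum with the same sign. The vectors counts and probZero are invented for the example; in practice the probabilities of zero come from the model stored in the COTAN object.

```r
# Hypothetical data for one gene across 6 cells.
counts   <- c(0, 3, 0, 1, 0, 7)                # observed counts X_c
probZero <- c(0.8, 0.1, 0.6, 0.3, 0.9, 0.05)   # probability of zero p_c

isZero <- counts == 0

# Cells with a zero contribute log(p_c), the others log(1 - p_c);
# log1p(-p) evaluates log(1 - p) accurately for small p.
crossEntropy <- -sum(ifelse(isZero, log(probZero), log1p(-probZero)))

print(crossEntropy)
```

The value is small when zeros occur where the model expects them and grows when observed zeros and expected zeros disagree.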

calculateGDIGivenCorr() produces a vector with the GDI for each column of the given correlation matrix, using Pearson's \chi^{2} test.
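One plausible reading of how the three arguments combine is sketched below. Everything in it is an assumption made for illustration: the helper gdiSketch() is hypothetical, the squared correlations are scaled by numDegreesOfFreedom to obtain a \chi^{2}-like statistic, the largest rowsFraction of each column is averaged, and the resulting p-value (taken with one degree of freedom) is passed through the same log(-log(p)) transform used for the PDI. The package's actual computation may differ in any of these steps.

```r
# Hypothetical helper, not part of COTAN.
gdiSketch <- function(corr, numDegreesOfFreedom, rowsFraction = 0.05) {
  # Assumption: squared correlation times the degrees of freedom
  # behaves like a chi-squared statistic.
  S <- corr^2 * numDegreesOfFreedom
  # Number of top rows to average, at least one.
  topRows <- max(1L, as.integer(round(nrow(S) * rowsFraction)))
  apply(S, 2L, function(col) {
    meanTop <- mean(sort(col, decreasing = TRUE)[seq_len(topRows)])
    # Assumption: upper-tail p-value with 1 degree of freedom.
    pValue <- stats::pchisq(meanTop, df = 1L, lower.tail = FALSE)
    log(-log(pValue))
  })
}

# Made-up correlation columns for three genes against 200 other genes.
set.seed(1)
corr <- matrix(runif(200 * 3, min = -0.2, max = 0.2), nrow = 200,
               dimnames = list(NULL, c("geneA", "geneB", "geneC")))

gdi <- gdiSketch(corr, numDegreesOfFreedom = 1000)
print(gdi)
```

Under this reading, a gene scores high when even a small fraction of the other genes is strongly correlated with it, which is why only the top rows are averaged.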

calculateGDI() produces a data.frame with the GDI for each gene based on the COEX matrix

calculatePValue() computes the p-values for genes in the COTAN object. It can be used genome-wide or restricted to specific genes of interest. By default it computes the p-values using the S statistic (\chi^{2}).
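To make the difference between the two statistic types concrete, here is a textbook base R sketch on a single 2x2 contingency table. The counts are invented, and the expected values are derived from the table margins under independence; COTAN derives its expected values from its own model, so this shows only the general form of the "S" and "G" statistics, not the package's computation.

```r
# Hypothetical 2x2 table of cell counts for a pair of genes:
# rows = gene A expressed / not expressed,
# cols = gene B expressed / not expressed.
observed <- matrix(c(30, 20,
                     10, 40), nrow = 2, byrow = TRUE)

# Expected counts under independence of the two genes.
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson chi-squared statistic ("S") and likelihood-ratio statistic ("G").
S <- sum((observed - expected)^2 / expected)
G <- 2 * sum(observed * log(observed / expected))

# For a 2x2 table both are compared with a chi-squared
# distribution with 1 degree of freedom.
pS <- stats::pchisq(S, df = 1L, lower.tail = FALSE)
pG <- stats::pchisq(G, df = 1L, lower.tail = FALSE)

print(c(S = S, G = G, pS = pS, pG = pG))
```

The two statistics usually agree closely when counts are large and can diverge when some cells of the table are sparse.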

calculatePDI() computes the p-values for genes in the COTAN object using calculatePValue() and takes their \log(-\log(\cdot)) to calculate the genes' Pair Differential Index.
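The effect of the log(-log(p)) transform can be seen on a small made-up p-value matrix. The gene names and values below are hypothetical, and a plain base R matrix is used instead of the dspMatrix the function returns.

```r
# Hypothetical symmetric p-value matrix for two genes.
pValues <- matrix(c(1e-12, 0.02,
                    0.02,  0.30), nrow = 2, byrow = TRUE,
                  dimnames = list(c("geneA", "geneB"),
                                  c("geneA", "geneB")))

# Double-log transform: smaller p-values map to larger index values.
pdi <- log(-log(pValues))

print(pdi)
```

The transform compresses the huge dynamic range of p-values into a scale of a few units; p-values above exp(-1), about 0.37, map to negative values.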

Value

genesCoexSpace() returns a list with:

  • "SecondaryMarkers" a named list that, for each secondary marker, gives the list of primary markers that selected it

  • "GCS" the relevant subset of COEX matrix

  • "rankGenes" a data.frame with the rank of each gene according to its p-value

establishGenesClusters() returns a list with:

  • "g.space" the genes' COEX space data.frame

  • "plot.eig" the eigenvalues plot

  • "pca_clusters" the PCA components data.frame

  • "tree_plot" the tree plot for the genes' COEX space

calculateGenesCE() returns a named array with the cross-entropy of each gene

calculateGDIGivenCorr() returns a vector with the GDI data for each column of the input

calculateGDI() returns a data.frame with:

  • "sum.raw.norm" the sum of the normalized data rows

  • "GDI" the GDI data

  • "exp.cells" the percentage of cells expressing the gene

calculatePValue() returns a p-value matrix as dspMatrix

calculatePDI() returns a Pair Differential Index matrix as dspMatrix

Examples

data("test.dataset")
objCOTAN <- COTAN(raw = test.dataset)
objCOTAN <- proceedToCoex(objCOTAN, cores = 6L, saveObj = FALSE)

markers <- getGenes(objCOTAN)[sample(getNumGenes(objCOTAN), 10)]
GCS <- genesCoexSpace(objCOTAN, primaryMarkers = markers,
                      numGenesPerMarker = 15)

groupMarkers <- list(G1 = c("g-000010", "g-000020", "g-000030"),
                     G2 = c("g-000300", "g-000330"),
                     G3 = c("g-000510", "g-000530", "g-000550",
                            "g-000570", "g-000590"))

resList <- establishGenesClusters(objCOTAN, groupMarkers = groupMarkers,
                                  numGenesPerMarker = 11)


seriph78/COTAN documentation built on May 2, 2024, 11:17 a.m.