findClusters: Find Clusters of Epigenetically Modified Genes

findClusters

R Documentation

Find Clusters of Epigenetically Modified Genes

Description

Given a table of gene positions that has a score column, the genes are first sorted into positional order, and consecutive windows of high or low scores are then reported.

Usage

  findClusters(stats, score.col = NULL, w.size = NULL, n.med = NULL, n.consec = NULL,
               cut.samps = NULL, maxFDR = 0.05, trend = c("down", "up"), n.perm = 100,
               getFDRs = FALSE, verbose = TRUE)

Arguments

`stats`	A `data.frame` with (at least) column `chr`, and a column of scores. Genes must be sorted in positional order.
`score.col`	A number that gives the column in `stats` which contains the scores.
`w.size`	The number of consecutive genes in each window. Must be odd.
`n.med`	Minimum number of genes in a window whose rolling median score (the median of the window centred on that gene) is above the cutoff.
`n.consec`	Minimum cluster size.
`cut.samps`	A vector of score cutoffs to calculate the FDR at.
`maxFDR`	The highest FDR level still deemed to be significant.
`trend`	Whether the clusters must have all positive scores (enrichment), or all negative scores (depletion).
`n.perm`	How many random tables to generate to use in the FDR calculations.
`getFDRs`	If TRUE, will also return the table of FDRs at a variety of score cutoffs, from which the score cutoff for calling clusters was chosen.
`verbose`	Whether to print progress of computations.

Details

First, the median score is calculated in a rolling window of w.size genes and associated with the middle gene of each window. Windows are then run over the genes a second time, and the gene at the centre of a window is significant if there are also at least n.med genes with representative medians above the score cutoff in the window that surrounds it. These marker genes are extended outwards for as long as the score has the same sign. The order of the stats rows is then randomised, and this process is done for every randomisation.
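The steps described above can be illustrated on a plain score vector. This is a minimal sketch in base R, not the Repitools source: the variable names, the `cutoff` value, the handling of window edges, and the exact rule used to call a marker gene are assumptions made for illustration, and it only covers the `trend = "up"` case on unpermuted data.

```r
# Illustrative sketch only; names and edge handling are assumptions.
set.seed(1)
scores <- rnorm(50, 0, 1)
scores[21:30] <- rnorm(10, 4, 0.5)   # planted run of high scores

w.size <- 5   # odd window width
n.med  <- 2   # minimum number of high medians in a window
cutoff <- 2   # hypothetical score cutoff

# Step 1: rolling median, assigned to the middle gene of each window.
# endrule = "keep" leaves the first and last (w.size - 1) / 2 values as they are.
meds <- as.numeric(stats::runmed(scores, k = w.size, endrule = "keep"))

# Step 2: for each gene, count the medians above the cutoff in the window
# centred on it (NA where the window runs off either end).
above <- as.numeric(meds > cutoff)
n.above <- as.numeric(stats::filter(above, rep(1, w.size), sides = 2))
is.marker <- !is.na(n.above) & meds > cutoff & n.above >= n.med

# Step 3: extend each marker outwards while the score keeps the same sign,
# by labelling runs of equal sign and keeping the positive runs with a marker.
runs <- rle(sign(scores))
run.id <- rep(seq_along(runs$lengths), runs$lengths)
in.cluster <- run.id %in% run.id[is.marker] & scores > 0

which(in.cluster)
```

The sketch does not apply the minimum cluster size `n.consec`, and it does not repeat the procedure on randomised row orders.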

The procedure for calling clusters is repeated at each of the score cutoffs in cut.samps. The first score cutoff to give an FDR below maxFDR is chosen as the cutoff to use, and clusters are then called based on this cutoff.
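The cutoff selection rule can be sketched as follows. The FDR table here is hypothetical, with made-up values chosen only to show the rule; its column names are assumptions and may not match those of the table that findClusters returns.

```r
# Hypothetical table of score cutoffs and FDRs; the values are invented.
fdr.table <- data.frame(cutoff = 1:6,
                        FDR = c(0.62, 0.31, 0.12, 0.04, 0.01, 0.00))
maxFDR <- 0.05

# Take the first cutoff, in the order tried, whose FDR is below maxFDR.
passing <- which(fdr.table$FDR < maxFDR)
chosen <- if (length(passing) > 0) fdr.table$cutoff[passing[1]] else NA

chosen
```

If no cutoff reaches the required FDR, this sketch yields `NA`; how findClusters itself behaves in that case is not stated here.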

Value

If getFDRs is FALSE, only the stats table is returned, with an additional column, cluster. If getFDRs is TRUE, a list is returned with elements:

`table`	The table `stats` with the additional column `cluster`.
`FDR`	The table of score cutoffs tried, and their FDRs.

Author(s)

Dario Strbenac, Aaron Statham

References

Saul Bert, in preparation

Examples

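  library(Repitools)

  # Simulate 500 genes on five chromosomes with random start positions.
  chrs <- sample(paste("chr", 1:5, sep = ""), 500, replace = TRUE)
  starts <- sample(1:10000000, 500, replace = TRUE)
  ends <- starts + 10000
  genes <- data.frame(chr = chrs, start = starts, end = ends, strand = '+')
  # Sort the genes into positional order.
  genes <- genes[order(genes$chr, genes$start), ]
  # Background scores, with a planted run of high scores in rows 21 to 30.
  genes$t.stat <- rnorm(500, 0, 2)
  genes$t.stat[21:30] <- rnorm(10, 4, 1)
  # Scores are in column 5; w.size = 5, n.med = 2, n.consec = 3, cutoffs 1 to 10.
  findClusters(genes, 5, 5, 2, 3, seq(1, 10, 1), trend = "up", n.perm = 2)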

markrobinsonuzh/Repitools documentation built on March 20, 2024, 6:04 a.m.