
Determine de novo gene clusters, their weighted PCA lambda1 values, and the random matrix expectation.

```
pagoda.gene.clusters(varinfo, trim = 3.1/ncol(varinfo$mat),
  n.clusters = 150, n.samples = 60, cor.method = "p",
  n.internal.shuffles = 0, n.starts = 10, n.cores = detectCores(),
  verbose = 0, plot = FALSE, show.random = FALSE, n.components = 1,
  method = "ward.D", secondary.correlation = FALSE,
  n.cells = ncol(varinfo$mat), old.results = NULL)
```

`varinfo`
adjusted variance info from pagoda.varnorm() (or pagoda.subtract.aspect())

`trim`
additional Winsorization trim value to be used in determining clusters (to remove clusters that group outliers occurring in a given cell). Use higher values (5-15) if the resulting clusters group outlier patterns

`n.clusters`
number of clusters to be determined (recommended range is 100-200)

`n.samples`
number of randomly generated matrix samples to test the background distribution of lambda1 on

`cor.method`
correlation method ("pearson", "spearman") to be used as a distance measure for clustering

`n.internal.shuffles`
number of internal shuffles to perform (only if interested in set coherence, which is quite high for clusters by definition, disabled by default; set to 10-30 shuffles to estimate)

`n.starts`
number of wPCA EM algorithm starts at each iteration

`n.cores`
number of cores to use

`verbose`
verbosity level

`plot`
whether a plot showing distribution of random lambda1 values should be shown (along with the extreme value distribution fit)

`show.random`
whether the empirical random gene set values should be shown in addition to the Tracy-Widom analytical approximation

`n.components`
number of PC to calculate (can be increased if the number of clusters is small and some contain strong secondary patterns - rarely the case)

`method`
clustering method to be used in determining gene clusters

`secondary.correlation`
whether clustering should be performed on the correlation of the correlation matrix instead

`n.cells`
number of cells to use for the randomly generated cluster lambda1 model

`old.results`
optionally, pass old results just to plot the model without recalculating the stats
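
To make the roles of `trim`, `cor.method`, `method`, `n.clusters`, and `secondary.correlation` more concrete, the following is a minimal base R sketch of Winsorizing a gene-by-cell matrix and then cutting a correlation-based hierarchical clustering into a fixed number of gene clusters. It is not the scde implementation: the toy matrix, the helper `winsorize_rows`, and the rounding of `trim * ncol(m)` are assumptions made purely for illustration.

```r
# Illustrative sketch only: base R stand-ins, not the scde implementation.
set.seed(1)

# Toy expression matrix: 30 genes (rows) x 40 cells (columns).
# Genes 1-10 and 11-20 follow two shared patterns; genes 21-30 are noise.
n.cells <- 40
p1 <- rnorm(n.cells)
p2 <- rnorm(n.cells)
mat <- rbind(
  t(replicate(10, p1 + rnorm(n.cells, sd = 0.3))),
  t(replicate(10, p2 + rnorm(n.cells, sd = 0.3))),
  t(replicate(10, rnorm(n.cells)))
)
rownames(mat) <- paste0("gene", seq_len(nrow(mat)))

# Inject an outlier cell shared by two noise genes; without trimming,
# such a cell can make otherwise unrelated genes look correlated.
mat[c("gene25", "gene26"), 7] <- 50

# Hypothetical helper: clip the k most extreme values at each tail of
# every row, where k = round(trim * number of cells) (an assumption).
winsorize_rows <- function(m, trim) {
  k <- round(trim * ncol(m))
  if (k < 1) return(m)
  t(apply(m, 1, function(x) {
    s <- sort(x)
    pmin(pmax(x, s[k + 1]), s[length(s) - k])
  }))
}

wmat <- winsorize_rows(mat, trim = 3.1 / ncol(mat))

# Gene-gene correlation used as a similarity (cor.method analogue).
gene.cor <- cor(t(wmat), method = "pearson")
# A secondary.correlation analogue would cluster on cor(gene.cor) instead.

# Hierarchical clustering (method analogue) cut into k groups
# (n.clusters analogue; 3 here because the toy data are tiny).
hc <- hclust(as.dist(1 - gene.cor), method = "ward.D")
cl <- cutree(hc, k = 3)
clusters <- split(names(cl), cl)
```

Here `clusters` is a list of gene-name vectors, one per cluster, which is the same general shape as the `clusters` field described under the return value.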

A list containing the following fields:

`clusters`
a list of genes in each cluster

`xf`
extreme value distribution fit for the standardized lambda1 of a randomly generated pattern

`tci`
index of the top cluster in each random iteration

`cl.goc`
weighted PCA info for each real gene cluster

`varm`
standardized lambda1 values for each randomly generated matrix cluster

`clvlm`
a linear model describing the dependency of the cluster lambda1 on the Tracy-Widom lambda1 expectation
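
The idea of comparing a cluster's lambda1 against a random matrix background can be sketched in base R as follows. This is a simplified illustration under stated assumptions: it uses the leading eigenvalue of a plain (unweighted) gene-gene correlation matrix and a simple z-score, whereas the function itself uses weighted PCA, an extreme value distribution fit, and the Tracy-Widom approximation. The helper `lambda1` and all data here are hypothetical.

```r
# Illustrative sketch only: unweighted PCA and a z-score stand in for
# the weighted PCA and extreme value fit used by the package.
set.seed(2)
n.cells <- 50
n.genes <- 20

# Hypothetical helper: leading eigenvalue of the gene-gene correlation matrix.
lambda1 <- function(m) {
  eigen(cor(t(m)), symmetric = TRUE, only.values = TRUE)$values[1]
}

# A coherent "cluster": every gene follows one shared pattern plus noise.
shared <- rnorm(n.cells)
real <- t(replicate(n.genes, 2 * shared + rnorm(n.cells)))

# Background: lambda1 of 60 random matrices of the same size
# (60 mirrors the n.samples default).
rand <- replicate(60, lambda1(matrix(rnorm(n.genes * n.cells), n.genes, n.cells)))

lam.real <- lambda1(real)
# How far the real cluster sits above the random background.
z <- (lam.real - mean(rand)) / sd(rand)
```

A coherent gene cluster concentrates its variance in the first component, so its lambda1 lies well above the values obtained from random matrices of the same dimensions.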

```
library(scde)
data(pollen)
cd <- clean.counts(pollen)
# fit per-cell error models using k nearest neighbor cells
knn <- knn.error.models(cd, k=ncol(cd)/4, n.cores=10, min.count.threshold=2, min.nonfailed=5, max.model.plots=10)
# normalize gene expression variance
varinfo <- pagoda.varnorm(knn, counts = cd, trim = 3/ncol(cd), max.adj.var = 5, n.cores = 1, plot = FALSE)
# determine de novo gene clusters with a higher Winsorization trim
clpca <- pagoda.gene.clusters(varinfo, trim=7.1/ncol(varinfo$mat), n.clusters=150, n.cores=10, plot=FALSE)
```
