Create AGS from a raw data matrix.


This function creates a set of new altered gene sets (AGSs) from a given dataset, such as gene copy number, gene/protein expression, or gene methylation. Such a matrix M typically has size Ngenes x Nsamples, so the function returns a list of length ncol(M). The AGS for each of the Nsamples samples is created with one of the five available methods (see parameter method).


samples2ags(m0, Ntop = NA, col.mask = NA, namesFromColumn = NA,
  method = c("significant", "top", "toppos", "topnorm", "toprandom"),
  Lowercase = 1, cutoff.q = 0.05)



Input matrix, typically of size Ngenes x Nsamples.


Number of top-ranking genes to include in each sample-specific AGS. Mutually exclusive with "cutoff.q". In practice, a recommended value of Ntop is in the range 30...300; Ntop > 1000 might decrease the specificity of the analysis.


Include only columns whose IDs contain the specified mask. The mask follows regular expression syntax.
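As a rough illustration of how a regular-expression mask narrows the set of columns, the base R sketch below filters a toy matrix by column ID. The column names and the mask are hypothetical, and the use of grepl() is an assumption made for illustration; the source does not say how samples2ags applies the mask internally.

```r
# Toy matrix with hypothetical sample IDs as column names
m <- matrix(1:12, nrow = 3,
            dimnames = list(c("g1", "g2", "g3"),
                            c("A1.tumor", "A1.normal", "B2.tumor", "B2.normal")))

# Hypothetical mask: keep only columns whose ID ends with "tumor"
mask <- "tumor$"
keep <- grepl(mask, colnames(m))

# drop = FALSE keeps the result a matrix even if a single column matches
sub <- m[, keep, drop = FALSE]
colnames(sub)
```

A mask without anchors, such as "tumor", would match the substring anywhere in the ID.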


Number of the column (if any) that contains the gene/protein names. This is only necessary if the names are NOT the unique rownames of the matrix, which is sometimes needed in order to process redundant profiles (expression etc.).


Method to select sample-specific genes. One of

  • "significant" : The default. Using a one-sided z-test, selects genes whose values in the given sample i deviate from the mean over all samples, requiring the q-value (Benjamini-Hochberg FDR) to be below cutoff.q.

  • "topnorm" : similar to "significant", i.e. calculates (x[i] - mean(x)) / sd(x), but does not evaluate significance. Instead, the Ntop top-ranked genes are taken into the AGS.

  • "top" : similar to "topnorm", but x[i] - mean(x) is not divided by sd(x). This might help to prioritize genes with higher mean(x) and ignore ones with low signal. Note also that AGSs from "top" overlap with each other much more than those from "topnorm", i.e. they are less sample-specific.

  • "toppos" : similar to "top", but retrieves only genes with positive values of x[i] - mean(x). This might be useful when the gene expression values are small counts (such as in single-cell RNA sequencing), so that considering the left part of the distribution would not yield a high-quality AGS.

  • "toprandom" : generates lists of Ntop random genes for each AGS.
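
The scores described in the list above can be sketched in base R for a single sample. This is a minimal illustration on made-up data, not the package's implementation. Several details are assumptions that the text does not specify: ranking by absolute value for "top" and "topnorm", the tail used for the one-sided z-test, and applying the Benjamini-Hochberg correction within one sample. The helper top.n is hypothetical.

```r
# Toy matrix: 6 genes x 4 samples (values are made up for illustration)
set.seed(1)
m <- matrix(rnorm(24, mean = 5), nrow = 6,
            dimnames = list(paste0("gene", 1:6), paste0("s", 1:4)))

Ntop     <- 3
cutoff.q <- 0.05
i        <- "s1"                     # the sample for which the AGS is built

dev <- m - rowMeans(m)               # x[i] - mean(x), per gene
z   <- dev / apply(m, 1, sd)         # (x[i] - mean(x)) / sd(x), per gene

# Hypothetical helper: names of the n highest-scoring genes
top.n <- function(score, n) {
  names(sort(score, decreasing = TRUE))[seq_len(min(n, length(score)))]
}

# "topnorm" and "top": assumed here to rank by absolute deviation
ags.topnorm <- top.n(abs(z[, i]), Ntop)
ags.top     <- top.n(abs(dev[, i]), Ntop)

# "toppos": only genes with a positive deviation are candidates
pos        <- dev[, i]
ags.toppos <- top.n(pos[pos > 0], Ntop)

# "toprandom": Ntop genes drawn at random
ags.random <- sample(rownames(m), Ntop)

# "significant": assumed upper-tail p-value on |z|, then BH adjustment
p       <- pnorm(abs(z[, i]), lower.tail = FALSE)
q       <- p.adjust(p, method = "BH")
ags.sig <- rownames(m)[q < cutoff.q]
```

With only four samples the standardized scores cannot become very large, so ags.sig is expected to be empty on this toy data; real matrices with many samples behave differently.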


Render gene/protein IDs lower-case.


Cutoff on the q-value (Benjamini-Hochberg FDR) used by method "significant". Default 0.05. Mutually exclusive with "Ntop".


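# Load the example dataset from the NEArender package
data("fantom5.43samples", package="NEArender")
# Build one AGS per sample, keeping genes with q-value below 0.01
ags.list <- samples2ags(fantom5.43samples, cutoff.q = 0.01, method="significant")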
