Create AGS from a raw data matrix.

Description

This function creates a set of new AGSs from a given dataset, such as gene copy number, gene/protein expression, or gene methylation data. Such a matrix M typically has size Ngenes x Nsamples, so the function returns a list of length ncol(M), one AGS per sample. Each AGS is created with one of the five available methods (see parameter method).

Usage

samples2ags(m0, Ntop = NA, col.mask = NA, namesFromColumn = NA,
  method = c("significant", "top", "toppos", "topnorm", "toprandom"),
  Lowercase = 1, cutoff.q = 0.05)

Arguments

m0

Input matrix (genes in rows, samples in columns).

Ntop

Number of top-ranking genes to include in each sample-specific AGS. Mutually exclusive with "cutoff.q". In practice, a value of Ntop in the range 30...300 is recommended; Ntop>1000 might decrease the analysis specificity.

col.mask

Include only columns whose IDs match the specified mask. Follows regular expression syntax.
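As a rough illustration of how a regular-expression mask selects columns, the sketch below previews a mask with base R `grepl` on a toy matrix. The matrix, its column IDs, and the mask `"^tumor"` are made up for this example; how samples2ags applies col.mask internally is not shown here.

```r
# Toy matrix: 3 genes x 4 samples; column IDs are hypothetical.
m <- matrix(1:12, nrow = 3,
            dimnames = list(c("g1", "g2", "g3"),
                            c("tumor_A", "tumor_B", "normal_A", "normal_B")))

# Hypothetical mask: keep only columns whose ID starts with "tumor".
mask <- "^tumor"
keep <- grepl(mask, colnames(m))

# drop = FALSE keeps the result a matrix even if one column matches.
m.sub <- m[, keep, drop = FALSE]
colnames(m.sub)
```

Previewing the mask this way can help confirm that it matches the intended samples before passing it as col.mask.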

namesFromColumn

Number of the column (if any) that contains the gene/protein names. Note that it is only necessary if the latter are NOT the unique rownames of the matrix. This is sometimes needed in order to process redundant profiles (expression etc.).

method

Method to select sample-specific genes. One of

  • "significant" : The default. Using a one-sided z-test, selects genes whose values in the given sample i deviate from the mean over all samples, requiring the q-value (Benjamini-Hochberg FDR) to be below cutoff.q.

  • "topnorm" : similar to "significant", i.e. calculates (x[i] - mean(x)) / sd(x), but does not evaluate significance. Instead, the Ntop top-ranked genes are taken into the AGS.

  • "top" : similar to "topnorm", but x[i] - mean(x) is not divided by sd(x). This might help to prioritize genes with higher mean(x) and ignore ones with low signal. Consider also that AGSs from "top" overlap much more with each other than those from "topnorm", i.e. they would be less sample-specific.

  • "toppos" : similar to "top", but retrieves only genes with positive values of x[i] - mean(x). This might be useful when the gene expression values are small counts (such as in single-cell RNA sequencing), so that considering the left part of the distribution would not yield a high-quality AGS.

  • "toprandom" : generates lists of Ntop random genes for each AGS.
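The ranking-based methods above can be sketched in base R as follows. This is a minimal illustration on random toy data, not the package's implementation. It assumes that "top" and "topnorm" rank genes by the absolute value of the score (both tails), which the description of "toppos" suggests but does not state outright. All variable names are hypothetical.

```r
set.seed(1)
# Toy data: 50 genes x 6 samples (random values, for illustration only).
m <- matrix(rnorm(300, mean = 5), nrow = 50,
            dimnames = list(paste0("gene", 1:50), paste0("s", 1:6)))
Ntop <- 5
i <- 1  # sample (column) to build a gene set for

d <- m[, i] - rowMeans(m)      # x[i] - mean(x), per gene
z <- d / apply(m, 1, sd)       # (x[i] - mean(x)) / sd(x), per gene

# Assumption: "top" and "topnorm" rank by absolute deviation.
top.genes     <- head(names(sort(abs(d), decreasing = TRUE)), Ntop)
topnorm.genes <- head(names(sort(abs(z), decreasing = TRUE)), Ntop)

# "toppos": only genes with positive x[i] - mean(x).
toppos.genes  <- head(names(sort(d[d > 0], decreasing = TRUE)), Ntop)

# "toprandom": Ntop genes drawn at random.
toprandom.genes <- sample(rownames(m), Ntop)
```

Comparing `top.genes` and `topnorm.genes` across several values of `i` is one way to see how dividing by sd(x) changes which genes are selected.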

Lowercase

Render gene/protein IDs lower-case.

cutoff.q

Cutoff for the q-value (Benjamini-Hochberg FDR) used by method "significant". Default 0.05. Mutually exclusive with "Ntop".
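To make the role of cutoff.q concrete, the sketch below applies a one-sided z-test and a Benjamini-Hochberg adjustment to one sample of a toy matrix, using base R `pnorm` and `p.adjust`. It is an illustration under assumptions, not the package's code: the upper tail is assumed for the one-sided test, the data are random, and one outlier is planted by hand in gene1.

```r
set.seed(2)
# Toy data: 200 genes x 40 samples of standard normal noise.
m <- matrix(rnorm(200 * 40), nrow = 200,
            dimnames = list(paste0("gene", 1:200), paste0("s", 1:40)))
m["gene1", "s1"] <- 25   # plant one strong outlier in sample s1
cutoff.q <- 0.05

# Per-gene z-score of sample s1 against all samples.
z <- (m[, "s1"] - rowMeans(m)) / apply(m, 1, sd)

# Assumption: upper-tail one-sided test.
p <- pnorm(z, lower.tail = FALSE)

# Benjamini-Hochberg FDR adjustment across all genes.
q <- p.adjust(p, method = "BH")

# Genes passing the q-value cutoff form the sample-specific set.
sig.genes <- rownames(m)[q < cutoff.q]
```

Because the sample's own value enters mean(x) and sd(x), the attainable z-score is bounded by the number of samples, so with very few columns even a strong outlier may not pass the cutoff.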

Examples

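# Load the example dataset shipped with NEArender
data("fantom5.43samples", package="NEArender")
# Create one AGS per sample (column), keeping genes with q-value below 0.01
ags.list <- samples2ags(fantom5.43samples, cutoff.q = 0.01, method="significant")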