qSig: Helper Function to Construct a 'qSig' Object

View source: R/qSig-methods.R

Description

Builds a 'qSig' object that stores the query signature, the reference database and the name of the GESS method, for use by the GESS search functions.

Usage

qSig(query, gess_method, refdb)

Arguments

query

If 'gess_method' is 'CMAP' or 'LINCS', the query should be a list with two character vectors named upset and downset, holding the labels of the up- and down-regulated genes, respectively. The labels should be Entrez gene IDs if the reference database is a pre-built CMAP or LINCS database. If a custom database is used, the labels need to be of the same type as those in the reference database.
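The list structure described above can be sketched in base R as follows. The scores, gene IDs and the set size `n_top` are made-up values for illustration only; real queries typically use many more genes.

```r
# Toy scores keyed by Entrez gene IDs (made-up values for illustration).
scores <- c("5290" = 2.1, "207" = -1.8, "7157" = 0.3,
            "1956" = 1.4, "672" = -2.5, "2064" = -0.2)

# Sort gene labels from most up-regulated to most down-regulated.
ranked <- names(sort(scores, decreasing = TRUE))

n_top <- 2  # hypothetical set size chosen for this sketch

# A list with two character vectors named upset and downset.
query_list <- list(
  upset   = head(ranked, n_top),
  downset = tail(ranked, n_top)
)
str(query_list)
```

A list of this shape is what would be passed as the query argument together with gess_method="CMAP" or gess_method="LINCS".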

If 'gess_method' is 'gCMAP', the query is a matrix with a single column representing gene ranks from a biological state of interest. The corresponding gene labels are stored in the row name slot of the matrix. Instead of ranks one can provide scores (e.g. z-scores). In such a case, the scores will be internally transformed to ranks.
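As a minimal sketch of the matrix layout described above, the following base R code builds a single-column matrix with gene labels as row names. The z-scores are made up, and the rank conversion shown is only one plausible convention; the transformation the package applies internally may differ.

```r
# Made-up z-scores keyed by Entrez gene IDs.
zscores <- c("5290" = 2.1, "207" = -1.8, "7157" = 0.3, "1956" = 1.4)

# Single-column matrix with the gene labels in the row names.
query_mat <- matrix(zscores, ncol = 1,
                    dimnames = list(names(zscores), "query"))

# One possible score-to-rank conversion (rank 1 = highest score).
# This is an assumption for illustration, not the package's internal rule.
rank_mat <- matrix(rank(-zscores), ncol = 1,
                   dimnames = list(names(zscores), "query"))
```

Either matrix has the shape expected for the 'gCMAP' method: one numeric column, one row per gene.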

If 'gess_method' is 'Fisher', the query is expected to be a list with two character vectors named upset and downset for up- and down-regulated gene labels, respectively (the same as for the 'CMAP' and 'LINCS' methods). Internally, the up and down gene labels are combined into a single gene set when the reference database is queried with Fisher's exact test, so the query is performed with an unsigned set. The query can also be a matrix with a single numeric column and the gene labels (e.g. Entrez gene IDs) in the row name slot. The values in this matrix can be z-scores or LFCs. In this case, the actual query gene set is obtained by applying the upper and lower cutoffs that the user sets in the gess_fisher function.
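The two ways of arriving at an unsigned query set can be illustrated with base R. This is a sketch of the idea only, not the package's implementation; `upper_cutoff` and `lower_cutoff` are hypothetical variable names, and all values are made up.

```r
upset   <- c("5290", "1956")
downset <- c("207", "672")

# Unsigned query set: the direction of regulation is discarded.
unsigned_set <- union(upset, downset)

# Alternative: derive the gene set from a single-column score matrix.
score_mat <- matrix(c(2.1, -1.8, 0.3, 1.4, -2.5), ncol = 1,
                    dimnames = list(c("5290", "207", "7157", "1956", "672"),
                                    "query"))
upper_cutoff <- 1    # hypothetical cutoff for up-regulated genes
lower_cutoff <- -1   # hypothetical cutoff for down-regulated genes

# Keep genes whose score lies at or beyond either cutoff.
keep <- score_mat[, 1] >= upper_cutoff | score_mat[, 1] <= lower_cutoff
cutoff_set <- rownames(score_mat)[keep]
```

In this toy case both routes select the same four genes, while the gene with a score near zero is left out.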

If 'gess_method' is 'Cor', the query is a matrix with a single numeric column and the gene labels in the row name slot. The numeric column can contain z-scores, LFCs, (normalized) gene expression intensity values or read counts.
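To show why a plain numeric column is sufficient for a correlation-based search, the sketch below correlates a toy query with two made-up reference signatures over their shared genes. It uses base R's cor only and is an illustration of the concept, not the package's search routine; the choice of Spearman correlation here is an assumption.

```r
genes <- c("5290", "207", "7157", "1956", "672")

# Query: single numeric column, gene labels in the row names.
query_mat <- matrix(c(2.1, -1.8, 0.3, 1.4, -2.5), ncol = 1,
                    dimnames = list(genes, "query"))

# Toy reference matrix: genes in rows, two made-up signatures in columns.
ref_mat <- cbind(
  sig_similar  = c(1.9, -1.5, 0.1, 1.2, -2.0),
  sig_opposite = c(-2.0, 1.7, -0.2, -1.1, 2.3)
)
rownames(ref_mat) <- genes

# Correlate the query with each reference signature over shared genes.
shared <- intersect(rownames(query_mat), rownames(ref_mat))
cor_scores <- cor(query_mat[shared, 1], ref_mat[shared, ],
                  method = "spearman")
```

A signature that moves with the query scores close to 1, and one that moves against it scores close to -1.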

gess_method

one of 'CMAP', 'LINCS', 'gCMAP', 'Fisher' or 'Cor'

refdb

character(1), can be one of "cmap", "cmap_expr", "lincs", or "lincs_expr" when using the CMAP/LINCS databases from the affiliated signatureSearchData package. With 'cmap' the database contains signatures of LFC scores obtained from DEG analysis routines; with 'cmap_expr' normalized gene expression values; with 'lincs' z-scores obtained from the DEG analysis methods of the LINCS project; and with 'lincs_expr' normalized expression values.

To use a custom signature database, refdb should be the path to the HDF5 file generated with the build_custom_db function. The HDF5 file needs to have the .h5 extension.

When gess_method is set to 'gCMAP' or 'Fisher', refdb can also be the path to an HDF5 file that was converted from a gmt file of gene sets with the gmt2h5 function. For example, the gmt files could come from MSigDB https://www.gsea-msigdb.org/gsea/msigdb/index.jsp or GSKB http://ge-lab.org/#/data.
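The accepted forms of refdb can be summarized with a small check. `classify_refdb` is a hypothetical helper written for this sketch and is not part of signatureSearch; it only inspects the string and does not verify that the file exists or was built with build_custom_db or gmt2h5.

```r
# Hypothetical helper (not part of signatureSearch) mirroring the rules above.
classify_refdb <- function(refdb) {
  stopifnot(is.character(refdb), length(refdb) == 1)
  builtin <- c("cmap", "cmap_expr", "lincs", "lincs_expr")
  if (refdb %in% builtin) {
    "builtin"      # database from the signatureSearchData package
  } else if (grepl("\\.h5$", refdb)) {
    "custom_h5"    # path to a custom HDF5 file with the .h5 extension
  } else {
    "invalid"      # neither a known name nor a .h5 path
  }
}

classify_refdb("lincs")
classify_refdb("path/to/my_db.h5")
classify_refdb("my_db.hdf5")
```

The last call illustrates that an HDF5 file with a different extension would not satisfy the .h5 requirement.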

Value

qSig object

See Also

build_custom_db, signatureSearchData, gmt2h5, qSig-class

Examples

library(signatureSearch)
library(SummarizedExperiment)
library(HDF5Array)

## Path to the sample database shipped with signatureSearch
db_path <- system.file("extdata", "sample_db.h5",
                       package = "signatureSearch")
## Load sample_db as `SummarizedExperiment` object
sample_db <- SummarizedExperiment(HDF5Array(db_path, name="assay"))
rownames(sample_db) <- HDF5Array(db_path, name="rownames")
colnames(sample_db) <- HDF5Array(db_path, name="colnames")
## Get the "vorinostat__SKB__trt_cp" signature from the sample database
query_mat <- as.matrix(assay(sample_db[,"vorinostat__SKB__trt_cp"]))
query <- as.numeric(query_mat)
names(query) <- rownames(query_mat)
## Rank genes by decreasing score; use the top and bottom 150 as up/down sets
ranked_genes <- names(query[order(-query)])
upset <- head(ranked_genes, 150)
downset <- tail(ranked_genes, 150)
## Build qSig objects for the LINCS and gCMAP methods
qsig_lincs <- qSig(query=list(upset=upset, downset=downset),
                   gess_method="LINCS", refdb=db_path)
qsig_gcmap <- qSig(query=query_mat, gess_method="gCMAP", refdb=db_path)

signatureSearch documentation built on April 16, 2021, 6 p.m.