Generation of a C matrix

Description

This function constructs a matrix of indicator variables for category membership from either a keyword-indexed list (keyword.list) or a gene-indexed list (gene.list). Size constraints, an option to prune identical categories, and a vector of present genes can be specified to filter categories and order genes. New in version 3.0.0: annotation can be provided so that each gene, rather than each feature, has equal weight in a category.
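The core idea of a C matrix can be illustrated in base R. The sketch below uses a hypothetical helper, build_cmatrix(), which is not part of safe: it returns a dense 0/1 matrix with one row per gene and one column per category, whereas getCmatrix() returns a sparse matrix.csr by default and also applies the filtering options described under Arguments.

```r
# Hypothetical helper (not part of safe): build a dense 0/1 C matrix from a
# keyword.list, with one row per gene and one column per category.
build_cmatrix <- function(keyword.list) {
  # rows: every gene that appears in at least one category
  genes <- sort(unique(unlist(keyword.list, use.names = FALSE)))
  C <- matrix(0L, nrow = length(genes), ncol = length(keyword.list),
              dimnames = list(genes, names(keyword.list)))
  for (k in names(keyword.list)) {
    # indicator: 1 if the gene is a member of category k
    C[unique(keyword.list[[k]]), k] <- 1L
  }
  C
}

keyword.list <- list(cellcycle = c("g1", "g2", "g3"),
                     apoptosis = c("g2", "g4"))
C <- build_cmatrix(keyword.list)
print(C)
```

Each column sum is the size of the corresponding category, which is the quantity that min.size and max.size act on.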

Usage

  getCmatrix(keyword.list = NULL, gene.list = NULL,
             present.genes = NULL, min.size = 2, max.size = Inf,
             by.gene = FALSE, gene.names = NULL, prefix = "",
             prune = FALSE, as.matrix = FALSE, GO.ont = NULL, ...)

Arguments

keyword.list

A list with one character vector per keyword (category), giving the genes that are members of that category.

gene.list

A list with one character vector per gene, giving the functional categories annotated to that gene.

present.genes

An optional vector used to filter and order the genes (rows) of the C matrix. It can be given either as an unordered character vector of gene names that match names(list), or as an ordered vector of presence (1) and absence (0) calls.

min.size

Optional minimum category size; smaller categories are excluded (default 2).

max.size

Optional maximum category size; larger categories are excluded (default Inf, i.e. no upper limit).

by.gene

Optional logical; if TRUE, 'soft' categories are built at the gene level rather than the feature level, so that each gene has equal weight in a category.

gene.names

Optional character vector of gene names used to build 'soft' categories when by.gene = TRUE.

prefix

Optional character string to prepend to category names.

prune

Optional logical; if TRUE, categories with identical gene membership are collapsed into a single category.

as.matrix

Optional logical; if TRUE, an ordinary (dense) matrix is returned rather than a sparse matrix.csr object.

GO.ont

"CC", "BP", or "MF" specify which Gene Ontology.

...

Additional arguments passed to read.table when category assignments are supplied as a file.
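The effect of present.genes, min.size and max.size can be sketched on a dense 0/1 matrix. filter_cmatrix() below is a hypothetical helper, not the safe implementation. It assumes that numeric presence calls are aligned with the rows of C and that category sizes are counted after genes have been filtered; getCmatrix() may handle either point differently.

```r
# Hypothetical sketch (not the safe implementation): filter a dense 0/1
# C matrix by present.genes and by category size.
filter_cmatrix <- function(C, present.genes = NULL, min.size = 2, max.size = Inf) {
  if (!is.null(present.genes)) {
    if (is.character(present.genes)) {
      # unordered gene names: keep matching rows, in the row order of C
      keep <- rownames(C)[rownames(C) %in% present.genes]
    } else {
      # assumed: presence (1) / absence (0) calls aligned with the rows of C
      keep <- rownames(C)[present.genes == 1]
    }
    C <- C[keep, , drop = FALSE]
  }
  # category size = number of member genes remaining after filtering
  size <- colSums(C)
  C[, size >= min.size & size <= max.size, drop = FALSE]
}

C <- matrix(c(1, 1, 1, 0,   # cellcycle: g1, g2, g3
              0, 1, 0, 1,   # apoptosis: g2, g4
              1, 0, 0, 0),  # tiny: g1
            nrow = 4,
            dimnames = list(c("g1", "g2", "g3", "g4"),
                            c("cellcycle", "apoptosis", "tiny")))
filter_cmatrix(C)                                     # drops "tiny"
filter_cmatrix(C, present.genes = c("g4", "g2", "g1"))
filter_cmatrix(C, present.genes = c(1, 1, 0, 0))      # only "cellcycle" left
```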

Details

Typical usages, with the list passed by argument name, are:

  getCmatrix(keyword.list = keyword.list, present.genes = present.genes)
  getCmatrix(gene.list = gene.list, present.genes = present.genes)
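A keyword.list and a gene.list describe the same gene-category memberships, indexed from opposite sides. The hypothetical helper invert_list() below (not part of safe) converts a gene.list into the equivalent keyword.list, which may help when checking that both forms of input describe the same categories.

```r
# Hypothetical helper (not part of safe): turn a gene.list (gene -> categories)
# into the equivalent keyword.list (category -> genes).
invert_list <- function(gene.list) {
  gene <- rep(names(gene.list), lengths(gene.list))
  category <- unlist(gene.list, use.names = FALSE)
  # group gene names by category; categories come out in sorted order
  split(gene, category)
}

gene.list <- list(g1 = "cellcycle",
                  g2 = c("cellcycle", "apoptosis"),
                  g3 = "cellcycle",
                  g4 = "apoptosis")
keyword.list <- invert_list(gene.list)
keyword.list
```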
  

Value

C.mat.csr

If as.matrix = FALSE (the default), a sparse matrix.csr object is returned, with rows corresponding to genes and columns corresponding to categories.

row.names

Character vector of gene names

col.names

Character vector of category names

col.synonym

Character vector of pipe-delimited names of identical categories that were merged, returned when prune = TRUE.
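The pruning step and the pipe-delimited col.synonym value can be sketched as follows. prune_cmatrix() is a hypothetical helper, not the safe implementation: it detects categories with identical membership, keeps the first name as the column label, and records all merged names separated by "|". How getCmatrix() labels the merged columns is an assumption here.

```r
# Hypothetical sketch: merge categories whose gene membership is identical,
# keeping the first name as the column label and recording all merged names
# as a pipe-delimited synonym string.
prune_cmatrix <- function(C) {
  # key describing exactly which rows are members of each column
  key <- apply(C, 2, function(col) paste(which(col != 0), collapse = ","))
  groups <- split(colnames(C), factor(key, levels = unique(key)))
  keep <- vapply(groups, function(g) g[1], character(1), USE.NAMES = FALSE)
  synonym <- vapply(groups, paste, character(1), collapse = "|",
                    USE.NAMES = FALSE)
  names(synonym) <- keep
  list(C = C[, keep, drop = FALSE], col.synonym = synonym)
}

C <- matrix(c(1, 1, 0,   # A: g1, g2
              1, 1, 0,   # B: g1, g2 (same members as A)
              0, 1, 1),  # C: g2, g3
            nrow = 3,
            dimnames = list(c("g1", "g2", "g3"), c("A", "B", "C")))
res <- prune_cmatrix(C)
res$col.synonym
```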

Author(s)

William T. Barry: bbarry@jimmy.harvard.edu

References

W. T. Barry, A. B. Nobel and F. A. Wright (2005). Significance analysis of functional categories in gene expression studies: a structured permutation approach. Bioinformatics 21(9):1943-1949.

See also the vignette included with this package.

See Also

safe, safeplot, getPImatrix.

Examples

if (interactive()) {
  library(safe)
  library(hgu133a.db)
  # named vector mapping probe set IDs to gene symbols
  genes <- unlist(as.list(hgu133aSYMBOL))
  RS.list <- list(Genes21 = c("ACTB","RPLP0","MYBL2","BIRC5","BAG1",
                              "GUSB","CD68","BCL2","MMP11","AURKA",
                              "GSTM1","ESR1","TFRC","PGR","CTSL2",
                              "GRB7","ERBB2","MKI67","GAPDH","CCNB1",
                              "SCUBE2"),
                  Genes16 = c("MYBL2","BIRC5","BAG1","CD68","BCL2",
                              "MMP11","AURKA","GSTM1","ESR1","PGR","CTSL2",
                              "GRB7","ERBB2","MKI67","CCNB1","SCUBE2"))
  # replace each set of symbols by the probe set IDs that map to them
  RS.list <- lapply(RS.list, function(x) names(genes)[genes %in% x])
  C1 <- getCmatrix(keyword.list = RS.list)
}
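When features are probe sets, as in the example above, several features can map to the same gene symbol, and by.gene with gene.names is meant to give each gene, rather than each feature, equal weight in a category. The sketch below shows one plausible weighting under that reading: each member feature gets weight 1 / (number of member features of its gene in that category), so every gene contributes a total of 1. soft_weights() is hypothetical, and the weights actually produced by getCmatrix(by.gene = TRUE) may differ.

```r
# Hypothetical sketch of gene-level ("soft") weights: each member feature of a
# category gets weight 1 / (number of member features sharing its gene), so
# that every gene contributes a total weight of 1 to the category.
soft_weights <- function(C, gene.names) {
  stopifnot(length(gene.names) == nrow(C))
  W <- C
  for (j in seq_len(ncol(C))) {
    member <- C[, j] != 0
    # count member features per gene within this category
    n.per.gene <- table(gene.names[member])
    W[member, j] <- 1 / as.vector(n.per.gene[gene.names[member]])
  }
  W
}

# three probe sets, two of which measure the same gene
C <- matrix(c(1, 1, 1,
              0, 1, 1), nrow = 3,
            dimnames = list(c("p1", "p2", "p3"), c("A", "B")))
gene.names <- c("ESR1", "ERBB2", "ERBB2")
W <- soft_weights(C, gene.names)
W
```

With these weights the column sums count genes rather than features: category A has two genes and category B has one.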