getCmatrix: Generation of a C matrix

Description

This function will construct a matrix of indicator variables for category membership from keyword or gene-indexed lists. Size constraints, the option to prune identical categories, and a vector of present genes can be defined to filter categories and order genes. New to version 3.0.0, annotation can be provided so that each gene, instead of each feature, has equal weight in a category.
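
To make the idea of an indicator matrix concrete, here is a minimal base-R sketch. It is not the package's implementation: the helper name indicator_from_keywords and the toy gene and category names are hypothetical, and it returns an ordinary dense matrix rather than a matrix.csr object.

```r
# Hypothetical toy input: each category (keyword) lists its member genes.
keyword.list <- list(catA = c("g1", "g2", "g3"),
                     catB = c("g2", "g4"))

# Illustrative helper (not part of the safe package): build a 0/1 matrix
# with one row per gene and one column per category.
indicator_from_keywords <- function(keyword.list) {
  genes <- sort(unique(unlist(keyword.list)))
  C <- vapply(keyword.list,
              function(members) as.integer(genes %in% members),
              integer(length(genes)))
  # Re-wrap so the result is a matrix even when there is a single gene.
  matrix(C, nrow = length(genes),
         dimnames = list(genes, names(keyword.list)))
}

C <- indicator_from_keywords(keyword.list)
print(C)
```

Each column sum in this sketch is the size of the corresponding category, which is the quantity the size constraints act on.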

Usage

  getCmatrix(keyword.list = NULL, gene.list = NULL, 
             present.genes = NULL, min.size = 2, max.size = Inf,  
             by.gene = FALSE, gene.names =  NULL, prefix = "",
             prune = FALSE, 
             as.matrix = FALSE, GO.ont = NULL, ...)

Arguments

keyword.list

A list containing character vectors for each keyword that specify the gene members.

gene.list

A list containing character vectors for each gene that specify the annotated functional categories.

present.genes

An optional vector used to filter genes in the C matrix. It can be provided either as an unordered character vector of gene names that match the names of the input list (names(list)), or as an ordered vector of presence (1) and absence (0) calls.

min.size

Optional minimum category size to be considered.

max.size

Optional maximum category size to be considered.

by.gene

Optional logical to build 'soft' categories at the gene level, instead of the feature level.

gene.names

Optional character vector of gene names for 'soft' categories.

prefix

Optional character string to precede category names.

prune

Optional logical to remove duplicate categories.

as.matrix

Optional logical specifying that an ordinary matrix is returned rather than a sparse matrix.csr object.

GO.ont

One of "CC", "BP", or "MF", specifying which Gene Ontology to use.

...

Any extra arguments will be forwarded to the read.table function when category assignments are given as a file.
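
The two accepted forms of present.genes and the effect of min.size and max.size can be illustrated with a small base-R sketch on a toy matrix. This is only an illustration under stated assumptions: the data are hypothetical, and treating the size bounds as inclusive and applying them after the gene filter is an assumption, not something this page confirms.

```r
# Toy indicator matrix: rows are genes, columns are categories (hypothetical data).
C <- matrix(c(1, 1, 1, 0,
              0, 1, 0, 1,
              1, 0, 0, 0),
            nrow = 4,
            dimnames = list(c("g1", "g2", "g3", "g4"),
                            c("catA", "catB", "catC")))

# Form 1: unordered character vector of gene names to keep.
present.names <- c("g3", "g1", "g2")

# Form 2: ordered 0/1 calls, one per gene in row order.
present.calls <- c(1, 1, 1, 0)

# Both forms reduce to the same logical row filter.
keep.by.name <- rownames(C) %in% present.names
keep.by.call <- present.calls == 1

C.present <- C[keep.by.name, , drop = FALSE]

# Size filter (assumed inclusive and applied after gene filtering).
min.size <- 2
max.size <- Inf
sizes <- colSums(C.present)
C.filtered <- C.present[, sizes >= min.size & sizes <= max.size, drop = FALSE]
print(C.filtered)
```

In this toy case only catA survives, because removing g4 leaves catB and catC with a single member each.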

Details

Typical usages are

  getCmatrix(keyword.list = keyword.list, present.genes = present.genes)
  getCmatrix(gene.list = gene.list, present.genes = present.genes)
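
The two input forms carry the same information, indexed in opposite directions. As a rough base-R sketch (toy data and variable names are hypothetical, and this is not taken from the package source), a gene-indexed list can be inverted into a keyword-indexed one like this:

```r
# Hypothetical gene-indexed input: each gene lists its categories.
gene.list <- list(g1 = c("catA"),
                  g2 = c("catA", "catB"),
                  g3 = c("catA"),
                  g4 = c("catB"))

# Repeat each gene name once per annotated category, then group by category.
gene.of.pair <- rep(names(gene.list), lengths(gene.list))
cat.of.pair  <- unlist(gene.list, use.names = FALSE)
keyword.list <- split(gene.of.pair, cat.of.pair)

print(keyword.list)
```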
  

Value

C.mat.csr

If as.matrix = FALSE, a sparse matrix (matrix.csr) is returned, with rows corresponding to genes and columns to categories.

row.names

Character vector of gene names

col.names

Character vector of category names

col.synonym

Pipe-delimited character vector of matching categories when prune = TRUE
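
To show what pruning identical categories and a pipe-delimited synonym vector might look like, here is a small base-R sketch on a dense toy matrix. The data are hypothetical, and the choices made here (keeping the first of each duplicate group, and including the kept category's own name in its synonym string) are assumptions rather than documented behaviour of getCmatrix.

```r
# Toy indicator matrix in which catA and catC have identical membership.
C <- matrix(c(1, 1, 0,
              0, 1, 1,
              1, 1, 0),
            nrow = 3,
            dimnames = list(c("g1", "g2", "g3"),
                            c("catA", "catB", "catC")))

# A text key per column; identical columns share the same key.
keys <- apply(C, 2, paste, collapse = "")

# Keep the first column of each group of identical columns.
first <- !duplicated(keys)
C.pruned <- C[, first, drop = FALSE]

# For each kept column, list all category names that matched it.
col.synonym <- vapply(keys[first],
                      function(k) paste(names(keys)[keys == k], collapse = "|"),
                      character(1))
print(col.synonym)
```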

Author(s)

William T. Barry: bbarry@jimmy.harvard.edu

References

W. T. Barry, A. B. Nobel and F. A. Wright, 2005, Significance analysis of functional categories in gene expression studies: a structured permutation approach, Bioinformatics 21(9), 1943-1949.

See also the vignette included with this package.

See Also

safe, safeplot, getPImatrix.

Examples

if (interactive()) {
 library(safe)
 library(hgu133a.db)
 # Named vector: names are probe IDs, values are gene symbols
 genes <- unlist(as.list(hgu133aSYMBOL))
 # Two categories given as gene symbols: a 21-gene set and a 16-gene subset
 RS.list <- list(Genes21 = c("ACTB","RPLP0","MYBL2","BIRC5","BAG1",
                             "GUSB","CD68","BCL2","MMP11","AURKA",
                             "GSTM1","ESR1","TFRC","PGR","CTSL2",
                             "GRB7","ERBB2","MKI67","GAPDH","CCNB1",
                             "SCUBE2"),
                 Genes16 = c("MYBL2","BIRC5","BAG1","CD68","BCL2",
                             "MMP11","AURKA","GSTM1","ESR1","PGR","CTSL2",
                             "GRB7","ERBB2","MKI67","CCNB1","SCUBE2"))
 # Replace each gene symbol with the IDs of all probes annotated to it
 RS.list <- lapply(RS.list, function(x) names(genes)[genes %in% x])
 # Build the C matrix, one column per category
 C1 <- getCmatrix(keyword.list = RS.list)
}

safe documentation built on Nov. 8, 2020, 5:37 p.m.