safe: Significance Analysis of Function and Expression

Performs a significance analysis of function and expression (SAFE) for a gene expression experiment and a set of functional categories specified by the user. SAFE is a two-stage, resampling-based method: gene-specific (local) statistics are combined into category-level (global) statistics. It can be applied to 2-sample, paired, multi-class, simple linear regression, and right-censored linear regression designs. Other experimental designs can also be accommodated through user-defined functions.

safe(X.mat, y.vec, C.mat = NULL, Z.mat = NULL,
     method = "permutation", platform = NULL, 
     annotate = NULL, min.size = 2, max.size = Inf, 
     by.gene = FALSE, local = "default", global = "default", 
     args.local = NULL, args.global = list(one.sided = FALSE), 
     Pi.mat = NULL, error = "FDR.BH", parallel = FALSE, alpha = NA, 
     epsilon = 10^(-10), print.it = TRUE, ...)

`X.mat`	A matrix or data.frame of expression data of size m by n where each row corresponds to a gene feature and each column to a sample. Data should be properly normalized and cannot contain missing values.
`y.vec`	A numeric, integer or character vector of length n containing the response of interest. For examples of the acceptable forms `y.vec` can take, see the vignette.
`C.mat`	A matrix containing the gene category assignments. Each column represents a category and should be named accordingly. For each column, values of 1 (`TRUE`) and 0 (`FALSE`) indicate whether the features in the corresponding rows of `X.mat` are contained in the category. This can also be a list containing a sparse matrix and names as created by `getCmatrix`.
`Z.mat`	A data.frame of size n by p containing p covariates, each given as a numeric vector or a factor.
`method`	Type of resampling-based hypothesis test: "permutation" (default), "bootstrap.t", or "bootstrap.q". Specifying "express" calls the dependent package `safeExpress`. See the vignette for details.
`platform`	If `C.mat` is unspecified, a character string naming a Bioconductor annotation package can be used to build gene categories. See the vignette for details and examples.
`annotate`	If `C.mat` is unspecified, a character string specifying the type of gene categories to build from annotation packages. "GO.MF", "GO.BP", and "GO.CC" each specify a single Gene Ontology, and "GO.ALL" (default) specifies all three. "KEGG" specifies pathways and "PFAM" families of homologous proteins, each taken from the respective source.
`min.size`	Optional minimum category size in building `C.mat`.
`max.size`	Optional maximum category size in building `C.mat`.
`by.gene`	Logical argument (default = `FALSE`) specifying whether multiple features mapping to a single gene should be down-weighted.
`local`	Specifies the gene-specific statistic from the following options: "t.Student" and "t.Welch" for 2-sample designs, "t.paired" for paired designs, "f.ANOVA" for 1-way ANOVAs, and "t.LM" for simple linear regressions. "default" chooses between "t.Student", "f.ANOVA", and "t.LM" based on the form of `y.vec`. User-defined local statistics can also be used; details are provided in the vignette.
`global`	Specifies the global statistic for each gene category. By default, the Wilcoxon rank sum ("Wilcoxon") is used. Alternatively, Fisher's exact test statistic ("Fisher"), a Pearson's chi-squared type statistic ("Pearson"), or a t-statistic for the average difference ("AveDiff") is available. User-defined global statistics can also be implemented.
`args.local`	An optional list to be passed to user-defined local statistics that require additional arguments. By default `args.local = NULL`.
`args.global`	An optional list to be passed to global statistics that require additional arguments. For two-sided local statistics, `args.global = list(one.sided = FALSE)` (the default) allows bi-directional differential expression to be considered.
`Pi.mat`	Either an integer, or a matrix or data.frame containing the permutations. See `getPImatrix` for the acceptable form of a matrix or data.frame. If `Pi.mat` is an integer, B, then `safe` will generate B resamples of `X.mat`.
`error`	Specifies the method for computing error rate estimates. By default, Benjamini-Hochberg step-up ("FDR.BH") FDR estimates are computed. A Bonferroni ("FWER.Bonf") or Holm's step-down ("FWER.Holm") adjustment can also be specified. Under permutation, "FDR.YB" computes the Yekutieli-Benjamini FDR estimate, and "FWER.WY" computes the Westfall-Young FWER estimate. The user can also specify "none" if no error rates are desired.
`parallel`	Logical argument (default = `FALSE`) specifying whether the resampling for the hypothesis test chosen in `method` should be run with parallel processing. Only compatible with `error` = "none", "FWER.Bonf", or "FDR.BH". See the vignette for details.
`alpha`	The threshold for significant results to return. By default, alpha is 0.05 for nominal p-values (`error` = "none") and 0.1 for adjusted p-values.
`epsilon`	Numeric argument that sets the minimum difference for ranking local and global statistics, guarding against a numerical precision problem when computing empirical p-values in small data sets (n < 15). The default value is 10^(-10).
`print.it`	Logical argument (default = `TRUE`) specifying whether to print progress updates to the log for permutation and bootstrap calculations.
`...`	Allows arguments from version 2.0 of the package to be passed; they are ignored.
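For categories that do not come from an annotation package, `C.mat` can be assembled by hand. The sketch below uses a hypothetical helper, `build_cmat` (not part of `safe`). It turns a named list of gene sets into the 0/1 layout described for `C.mat` and applies a size filter analogous to `min.size`/`max.size`. For annotation-based categories, `getCmatrix` is the documented route.

```r
## Hypothetical helper (not part of safe): build a 0/1 category matrix with
## one row per feature (in the order of X.mat's rows) and one named column
## per gene set, then keep only categories within the size limits.
build_cmat <- function(features, gene.sets, min.size = 2, max.size = Inf) {
  cm <- vapply(gene.sets,
               function(s) as.numeric(features %in% s),
               numeric(length(features)))
  # vapply() returns a plain vector when there is a single feature,
  # so force the matrix shape and attach row and column names
  cm <- matrix(cm, nrow = length(features),
               dimnames = list(features, names(gene.sets)))
  sizes <- colSums(cm)
  cm[, sizes >= min.size & sizes <= max.size, drop = FALSE]
}

features <- paste("Gene", 1:6)
gene.sets <- list(SetA = c("Gene 1", "Gene 2", "Gene 3"),
                  SetB = c("Gene 4", "Gene 5"),
                  SetC = "Gene 6")          # size 1: dropped by min.size = 2
C.small <- build_cmat(features, gene.sets)
C.small
```

The rows must follow the row order of `X.mat`, which is why the helper indexes by the feature vector rather than by the gene sets.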

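The `error` options differ in how they walk through the sorted p-values. Benjamini-Hochberg works step-up from the largest p-value. Holm works step-down from the smallest. The sketch below writes both adjustments by hand and checks them against base R's `p.adjust`. The helpers `bh_adjust` and `holm_adjust` are hypothetical, and the p-values are arbitrary illustrative numbers. `safe` applies its adjustments internally to the category-level p-values rather than through these functions.

```r
## Hypothetical helpers: hand-rolled BH (step-up) and Holm (step-down).
bh_adjust <- function(p) {
  m <- length(p)
  o <- order(p, decreasing = TRUE)   # largest p-value first
  # Step-up: running minimum of (m / rank) * p, moving from the largest p down
  adj <- cummin(m / (m:1) * p[o])
  pmin(1, adj)[order(o)]             # restore the original order
}

holm_adjust <- function(p) {
  m <- length(p)
  o <- order(p)                      # smallest p-value first
  # Step-down: running maximum of (m - rank + 1) * p, moving from the smallest p up
  adj <- cummax((m:1) * p[o])
  pmin(1, adj)[order(o)]
}

p <- c(0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205)
round(cbind(p, BH = bh_adjust(p), Holm = holm_adjust(p),
            Bonf = pmin(1, length(p) * p)), 4)
```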
`safe` uses a general framework for testing differential expression across gene categories, which allows it to be applied to a variety of experimental designs. Through structured resampling of the data, `safe` accounts for the unknown correlation among genes and enables proper estimation of error rates when testing multiple categories. `safe` also provides statistics and empirical p-values for gene-specific differential expression.
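The two-stage idea can be sketched in a few lines of base R. This is a simplified illustration, not the code behind `safe`. The local statistic is a pooled-variance t-statistic, analogous to "t.Student". The global statistic is a rank sum of absolute t-statistics within each category; the absolute value is this sketch's own assumption for a two-sided test. Sample labels are permuted as a whole. The helper names `local_t`, `global_wilcoxon` and `safe_sketch` are hypothetical.

```r
## Minimal two-stage permutation sketch in the spirit of SAFE
## (NOT the safe package's implementation; helper names are hypothetical).

# Stage 1 (local): pooled-variance Student t-statistic for every gene (row).
# grp is a logical vector over samples; TRUE marks group 1.
local_t <- function(X, grp) {
  x1 <- X[, grp, drop = FALSE]
  x2 <- X[, !grp, drop = FALSE]
  n1 <- ncol(x1)
  n2 <- ncol(x2)
  m1 <- rowMeans(x1)
  m2 <- rowMeans(x2)
  s2 <- (rowSums((x1 - m1)^2) + rowSums((x2 - m2)^2)) / (n1 + n2 - 2)
  (m1 - m2) / sqrt(s2 * (1 / n1 + 1 / n2))
}

# Stage 2 (global): rank-sum statistic per category, i.e. the sum of the
# ranks of |t| over each category's member genes (two-sided by assumption).
global_wilcoxon <- function(t.stat, Cm) {
  r <- rank(abs(t.stat))
  drop(crossprod(Cm, r))
}

# Permute the sample labels B times, recompute both stages each time,
# and return one empirical p-value per category.
safe_sketch <- function(X, grp, Cm, B = 200) {
  obs <- global_wilcoxon(local_t(X, grp), Cm)
  null <- matrix(replicate(B, global_wilcoxon(local_t(X, sample(grp)), Cm)),
                 nrow = ncol(Cm))
  # Count the observed labelling as one of the resamples
  p <- (rowSums(null >= obs) + 1) / (B + 1)
  names(p) <- colnames(Cm)
  p
}

set.seed(1)
X <- matrix(rnorm(200 * 10), 200, 10)
X[1:20, 1:5] <- X[1:20, 1:5] + 2               # genes 1-20 shifted in group 1
grp <- rep(c(TRUE, FALSE), each = 5)
Cm <- cbind(Alt  = rep(c(1, 0), c(20, 180)),    # the 20 shifted genes
            Null = rep(c(0, 1, 0), c(20, 20, 160)))
p.cat <- safe_sketch(X, grp, Cm)
p.cat
```

Whole columns of `X` are relabelled together, so each resample keeps the gene-gene correlation of the real data. That is the property the structured resampling above relies on.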

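The `epsilon` argument exists because statistics that are mathematically equal can differ in the last bits of floating-point arithmetic. An exact `>=` comparison can therefore miss ties when empirical p-values are counted. The sketch below shows the failure mode and a tolerance-based count, using a hypothetical helper `emp_p`. How `safe` applies `epsilon` internally may differ in detail.

```r
## Hypothetical helper: empirical p-value with a tolerance for near-ties.
emp_p <- function(obs, null, epsilon = 1e-10) {
  # A null statistic counts as "at least as extreme" if it is no more than
  # epsilon below the observed value
  mean(null >= obs - epsilon)
}

obs  <- 0.1 + 0.2         # stored as 0.30000000000000004
null <- c(0.3, 0.1, 0.2)  # the first value equals obs mathematically

emp_p(obs, null, epsilon = 0)  # 0: the exact comparison misses the tie
emp_p(obs, null)               # 0.3333333: the tie is counted
```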
The function returns an object of class `SAFE`. See the help page for `SAFE-class` for more details.

William T. Barry: bbarry@jimmy.harvard.edu

W. T. Barry, A. B. Nobel and F. A. Wright, 2005, Significance analysis of functional categories in gene expression studies: a structured permutation approach, Bioinformatics 21(9), 1943–1949.

See also the vignette included with this package.

safeplot, safe.toptable, gene.results, getCmatrix, getPImatrix.

library(safe)

## Simulate a dataset with 1000 genes and 20 arrays in a 2-sample design.
## The top 100 genes will be differentially expressed at varying levels.

g.alt <- 100
g.null <- 900
n <- 20

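data <- matrix(rnorm(n * (g.alt + g.null)), g.alt + g.null, n)
## Shift the alt genes in the first n/2 arrays (the "Trt" group) by
## gene-specific amounts decreasing linearly from 2 to 2/g.alt
data[1:g.alt, 1:(n/2)] <- data[1:g.alt, 1:(n/2)] +
                          seq(2, 2/g.alt, length.out = g.alt)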
dimnames(data) <- list(c(paste("Alt",1:g.alt),
                         paste("Null",1:g.null)),
                       paste("Array",1:n))

## A treatment vector 
trt <- rep(c("Trt","Ctr"),each=n/2)

## 2 alternative and 18 null categories, each containing 50 consecutive genes

C.matrix <- kronecker(diag(20),rep(1,50))
dimnames(C.matrix) <- list(dimnames(data)[[1]],
    c(paste("TrueCat",1:2),paste("NullCat",1:18)))
dim(C.matrix)

results <- safe(data,trt,C.mat = C.matrix,Pi.mat = 100)
results

## SAFE-plot made for the first category
if (interactive()) {
  safeplot(results, "TrueCat 1")
}

[1] 1000   20
Warning: y.vec is not (0,1), thus Group 1 == Trt 
100 permutations completed
SAFE results:
  Local: t.Student 
  Global: Wilcoxon 
  Method: permutation 
  Error: FDR.BH 

No categories were significant at FDR.BH < 0.05