ebam: Empirical Bayes Analysis of Microarrays
In siggenes: Multiple Testing using SAM and Efron's Empirical Bayes Approaches

Description Usage Arguments Value Author(s) References See Also Examples

Performs an Empirical Bayes Analysis of Microarrays (EBAM). It is possible to perform one and two class analyses using either a modified t-statistic or a (standardized) Wilcoxon rank statistic, and a multiclass analysis using a modified F-statistic. Moreover, this function provides a EBAM procedure for categorical data such as SNP data and the possibility to employ an user-written score function.

1
2
3

  ebam(x, cl, method = z.ebam, delta = 0.9, which.a0 = NULL, 
       control = ebamControl(), gene.names = dimnames(x)[[1]],
       ...)

`x`	either a matrix, a data frame or an ExpressionSet object, or the output of `find.a0`, i.e.\ an object of class `FindA0`. Can also be a list (if `method = chisq.ebam` or `method = trend.ebam`). For the latter case, see `chisq.ebam`. If `x` is not a FindA0 object, then each row of `x` (or `exprs(x)`, respectively) must correspond to a variable (e.g., a gene or a SNP), and each column to a sample.
`cl`	a specification of the class labels of the samples. Ignored if `x` is a `FindA0` object. Needs not to be specified if `x` is a list. Typically, `cl` is specified by a vector of length `ncol(x)`. In the two class paired case, `cl` can also be a matrix with `ncol(x)` rows and 2 columns. If `x` is an ExpressionSet object, `cl` can also be a character string naming the column of `pData(x)` that contains the class labels of the samples. In the one-class case, `cl` should be a vector of 1's. In the two class unpaired case, `cl` should be a vector containing 0's (specifying the samples of, e.g., the control group) and 1's (specifying, e.g., the case group). In the two class paired case, `cl` can be either a numeric vector or a numeric matrix. If it is a vector, then `cl` has to consist of the integers between -1 and -n/2 (e.g., before treatment group) and between 1 and n/2 (e.g., after treatment group), where n is the length of `cl` and k is paired with -k, k=1,…,n/2. If `cl` is a matrix, one column should contain -1's and 1's specifying, e.g., the before and the after treatment samples, respectively, and the other column should contain integer between 1 and n/2 specifying the n/2 pairs of observations. In the multiclass case and if `method = chisq.ebam` or `method = trend.ebam`, `cl` should be a vector containing integers between 1 and g, where g is the number of groups. In the two latter cases, `cl` needs not to be specified, if `x` is a list. For details, see `chisq.ebam`. For examples of how `cl` can be specified, see the manual of siggenes.
`method`	a character string or name specifying the method or function that should be used in the computation of the expression score z. If `method = z.ebam`, a modified t- or F-statistic, respectively, will be computed as proposed by Efron et al. (2001). If `method = wilc.ebam`, a (standardized) Wilcoxon sum / signed rank statistic will be used as expression score. For an analysis of categorical data such as SNP data, `method` can be set to `chisq.ebam`. In this case, Pearson's Chi-squared statistic is computed for each row. If the variables are ordinal and a trend test should be applied (e.g., in the two-class case, the Cochran-Armitage trend test), `method = trend.ebam` can be employed. It is also possible to employ an user-written function for computing an user-specified expression score. For details, see the vignette of siggenes.
`delta`	a numeric vector consisting of probabilities for which the number of differentially expressed genes and the FDR should be computed, where a gene is called differentially expressed if its posterior probability is larger than Delta.
`which.a0`	an integer between 1 and the length of `quan.a0` of `find.a0`. If `NULL`, the suggested choice of `find.a0` is used. Ignored if `x` is a matrix, data frame or ExpressionSet object.
`control`	further arguments for controlling the EBAM analysis. For these arguments, see `ebamControl`.
`gene.names`	a vector of length `nrow(x)` specifying the names of the variables. By default, the row names of the matrix / data frame comprised by `x` are used.
`...`	further arguments of the specific EBAM methods. If `method = z.ebam`, see `z.ebam`. If `method = wilc.ebam`, see `wilc.ebam`. If `method = chisq.ebam`, see `chisq.ebam`.

An object of class EBAM.

Holger Schwender, holger.schw@gmx.de

Efron, B., Tibshirani, R., Storey, J.D. and Tusher, V. (2001). Empirical Bayes Analysis of a Microarray Experiment. JASA, 96, 1151-1160.

Schwender, H., Krause, A., and Ickstadt, K. (2006). Identifying Interesting Genes with siggenes. RNews, 6(5), 45-50.

Storey, J.D. and Tibshirani, R. (2003). Statistical Significance for Genome-Wide Studies. Proceedings of the National Academy of Sciences, 100, 9440-9445.

EBAM-class, find.a0, z.ebam, wilc.ebam, chisq.ebam

## Not run: 
  # Load the data of Golub et al. (1999) contained in the package multtest.
  data(golub)
  
  # golub.cl contains the class labels.
  golub.cl
  
  # Perform an EBAM analysis for the two class unpaired case assuming
  # unequal variances. Specify the fudge factor a0 by the suggested
  # choice of find.a0
  find.out <- find.a0(golub, golub.cl, rand = 123)
  ebam.out <- ebam(find.out)
  ebam.out
    
  # Since a0 = 0 leads to the largest number of genes (i.e. the suggested
  # choice of a0), the following leads to the same results as the above
  # analysis (but only if the random number generator, i.e. rand, is set
  # to the same number).
  ebam.out2 <- ebam(golub, golub.cl, a0 = 0, fast = TRUE, rand = 123)
  ebam.out2

  # If fast is set to TRUE in ebam, a crude estimate of the number of
  # falsely called genes is used (see the help file for z.ebam). This
  # estimate is always employed in find.a0. 
  # The exact number is used in ebam when performing
  ebam.out3 <- ebam(golub, golub.cl, a0 = 0, rand = 123)
  ebam.out3  

  # Since this is the recommended way, we use ebam.out3 at the end of
  # the Examples section for further analyses.



  # Perform an EBAM analysis for the two class unpaired case assuming
  # equal group variances. Set a0 = 0, and use B = 50 permutations
  # of the class labels.
  ebam.out4 <- ebam(golub, golub.cl, a0 = 0, var.equal = TRUE, B = 50,
     rand = 123)
  ebam.out4
    
  # Perform an EBAM analysis for the two class unpaired cased assuming
  # unequal group variances. Use the median (i.e. the 50% quantile)
  # of the standard deviations of the genes as fudge factor a0. And
  # obtain the number of genes and the FDR if a gene is called 
  # differentially when its posterior probability is larger than
  # 0.95.
  ebam.out5 <- ebam(golub, golub.cl, quan.a0 = 0.5, delta = 0.95,
     rand = 123)
  ebam.out5
    
  # For the third analysis, obtain the number of differentially
  # expressed genes and the FDR if a gene is called differentially
  # expressed if its posterior probability is larger than 0.8, 0.85,
  # 0.9, 0.95.
  print(ebam.out3, c(0.8, 0.85, 0.9, 0.95))
    
  # Generate a plot of the posterior probabilities for delta = 0.9.
  plot(ebam.out3, 0.9)
    
  # Obtain the list of genes called differentially expressed if their
  # posterior probability is larger than 0.99, and gene-specific 
  # statistics for these variables such as their z-value and their
  # local FDR.
  summary(ebam.out3, 0.99)

## End(Not run)