Significance Analysis of Microarrays

Description

Performs a Significance Analysis of Microarrays (SAM). It is possible to perform one and two class analyses using either a modified t-statistic or a (standardized) Wilcoxon rank statistic, and a multiclass analysis using a modified F-statistic. Moreover, this function provides a SAM procedure for categorical data such as SNP data and the possibility to employ a user-written score function.

Usage

  sam(data, cl, method = d.stat, control=samControl(),
      gene.names = dimnames(data)[[1]], ...)

Arguments

data

a matrix, a data frame, or an ExpressionSet object. Each row of data (or exprs(data), respectively) must correspond to a variable (e.g., a gene), and each column to a sample (i.e., an observation).

Can also be a list (if method = chisq.stat or method = trend.stat). For details on how to specify data in this case, see chisq.stat.

cl

a vector of length ncol(data) containing the class labels of the samples. In the two class paired case, cl can also be a matrix with ncol(data) rows and 2 columns. If data is an ExpressionSet object, cl can also be a character string naming the column of pData(data) that contains the class labels of the samples. If data is a list, cl need not be specified.

In the one-class case, cl should be a vector of 1's.

In the two class unpaired case, cl should be a vector containing 0's (specifying the samples of, e.g., the control group) and 1's (specifying, e.g., the case group).

In the two class paired case, cl can be either a numeric vector or a numeric matrix. If it is a vector, then cl has to consist of the integers between -1 and -n/2 (e.g., before treatment group) and between 1 and n/2 (e.g., after treatment group), where n is the length of cl and k is paired with -k, k=1,…,n/2. If cl is a matrix, one column should contain -1's and 1's specifying, e.g., the before and the after treatment samples, respectively, and the other column should contain the integers between 1 and n/2 specifying the n/2 pairs of observations.

In the multiclass case and if method = chisq.stat, cl should be a vector containing integers between 1 and g, where g is the number of groups. (In the case of chisq.stat, cl need not be specified if data is a list of groupwise matrices.)

For examples of how cl can be specified, see the manual of siggenes.
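The label codings described above can be sketched in base R as follows. This is only an illustration: the sample count n, the number of groups g, and all object names are hypothetical, and the column order of the data is assumed to match the labels.

```r
n <- 10  # hypothetical number of samples, i.e. ncol(data)

# One-class case: a vector of 1's.
cl.one <- rep(1, n)

# Two class unpaired case: 0 = e.g. control group, 1 = e.g. case group.
cl.unpaired <- rep(c(0, 1), each = n / 2)

# Two class paired case, vector form: sample -k is paired with sample k.
cl.paired <- c(-(1:(n / 2)), 1:(n / 2))

# Two class paired case, matrix form: one column with -1/1 (before/after),
# the other with the pair number 1, ..., n/2.
cl.paired.mat <- cbind(sign(cl.paired), abs(cl.paired))

# Multiclass case: integers between 1 and g (here g = 3 groups).
g <- 3
cl.multi <- rep(1:g, length.out = n)

# Basic consistency checks on the codings.
stopifnot(length(cl.unpaired) == n,
          all(table(abs(cl.paired)) == 2),
          nrow(cl.paired.mat) == n,
          all(cl.multi %in% 1:g))
```

Whichever coding is used, the length of the vector (or the number of rows of the matrix) has to equal ncol(data).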

method

a character string or a name specifying the method/function that should be used in the computation of the expression scores d.

If method = d.stat, a modified t-statistic or F-statistic, respectively, will be computed as proposed by Tusher et al. (2001).

If method = wilc.stat, a Wilcoxon rank sum statistic or Wilcoxon signed rank statistic will be used as expression score.

For an analysis of categorical data such as SNP data, method can be set to chisq.stat. In this case Pearson's chi-square statistic is computed for each row.

If the variables are ordinal and a trend test should be applied (e.g., in the two-class case, the Cochran-Armitage trend test), method = trend.stat can be employed.

It is also possible to use a user-written function to compute the expression scores. For details, see the Details section.
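To make the idea of a modified t-statistic concrete, here is a minimal base R sketch of the score d = r / (s + s0) from Tusher et al. (2001) for the two class unpaired case, where r is the difference of the group means, s its standard error, and s0 the fudge factor. It is not the siggenes implementation of d.stat: the function name mod.t is hypothetical, s0 is supplied by hand rather than estimated from the data, and an unequal-variance (Welch-type) standard error is assumed.

```r
# Illustrative only: NOT the code used by d.stat in siggenes.
# data: matrix with genes in rows and samples in columns
# cl:   vector of 0's and 1's, one label per column of data
# s0:   fudge factor, fixed by the caller here (assumption)
mod.t <- function(data, cl, s0 = 0) {
  data <- as.matrix(data)
  x0 <- data[, cl == 0, drop = FALSE]
  x1 <- data[, cl == 1, drop = FALSE]
  # Numerator: difference of the group means for each gene.
  r <- rowMeans(x1) - rowMeans(x0)
  # Denominator: standard error without assuming equal variances.
  s <- sqrt(apply(x1, 1, var) / ncol(x1) + apply(x0, 1, var) / ncol(x0))
  r / (s + s0)
}

set.seed(42)
x <- matrix(rnorm(60), nrow = 6)   # 6 simulated genes, 10 samples
cl <- rep(c(0, 1), each = 5)
mod.t(x, cl)            # with s0 = 0 this is the ordinary Welch t-statistic
mod.t(x, cl, s0 = 0.5)  # a positive s0 shrinks every score toward zero
```

The fudge factor mainly damps the scores of genes with a very small standard error, which would otherwise produce large values of d from tiny mean differences.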

control

further optional arguments for controlling the SAM analysis. For these arguments, see samControl.

gene.names

a character vector of length nrow(data) containing the names of the genes. By default the row names of data are used.

...

further arguments of the specific SAM methods. If method = d.stat, see the help of d.stat. If method = wilc.stat, see the help of wilc.stat. If method = chisq.stat, see the help of chisq.stat.

Details

sam provides SAM procedures for several types of analysis (one and two class analyses with either a modified t-statistic or a Wilcoxon rank statistic, a multiclass analysis with a modified F statistic, and an analysis of categorical data). It is, however, also possible to write your own function for another type of analysis. The required arguments of this function must be data and cl. This function can also have other arguments. The output of this function must be a list containing the following objects:

d:

a numeric vector consisting of the expression scores of the genes.

d.bar:

a numeric vector of the same length as na.exclude(d) specifying the expected expression scores under the null hypothesis.

p.value:

a numeric vector of the same length as d containing the raw, unadjusted p-values of the genes.

vec.false:

a numeric vector of the same length as d consisting of the one-sided numbers of falsely called genes, i.e., if d > 0, the number of genes expected to be larger than d under the null hypothesis, and if d < 0, the number of genes expected to be smaller than d under the null hypothesis.

s:

a numeric vector of the same length as d containing the standard deviations of the genes. If no standard deviation can be calculated, set s = numeric(0).

s0:

a numeric value specifying the fudge factor. If no fudge factor is calculated, set s0 = numeric(0).

mat.samp:

a matrix with B rows and ncol(data) columns, where B is the number of permutations, containing the permutations used in the computation of the permuted d-values. If such a matrix is not computed, set mat.samp = matrix(numeric(0)).

msg:

a character string or vector containing information about, e.g., which type of analysis has been performed. msg is printed when the function print or summary, respectively, is called. If no such message should be printed, set msg = "".

fold:

a numeric vector of the same length as d consisting of the fold changes of the genes. If no fold change has been computed, set fold = numeric(0).

If this function is, e.g., called foo, it can be used by setting method = foo in sam. More detailed information and an example can be found in the siggenes manual.
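The list structure described above might be produced by a function along the following lines. This is a hedged sketch, not code from siggenes: foo, its arguments B and seed, and the helper welch.t are hypothetical, the score is a plain Welch t-statistic for 0/1 labels, and the permutation-based definitions of d.bar, p.value and vec.false are one plausible reading of the descriptions above. It assumes at least two genes, B >= 2, and no constant rows or missing values.

```r
# Hypothetical user-written score function (two class unpaired, 0/1 labels).
foo <- function(data, cl, B = 100, seed = 123) {
  data <- as.matrix(data)
  # Welch t-statistic and its standard error for a given labelling.
  welch.t <- function(labels) {
    x0 <- data[, labels == 0, drop = FALSE]
    x1 <- data[, labels == 1, drop = FALSE]
    s <- sqrt(apply(x1, 1, var) / ncol(x1) + apply(x0, 1, var) / ncol(x0))
    list(d = (rowMeans(x1) - rowMeans(x0)) / s, s = s)
  }
  obs <- welch.t(cl)
  d <- obs$d

  # B random permutations of the class labels (B x ncol(data) matrix).
  set.seed(seed)
  mat.samp <- t(replicate(B, sample(cl)))
  # Permuted scores: one column per permutation.
  d.perm <- apply(mat.samp, 1, function(labels) welch.t(labels)$d)

  # Expected ordered scores: mean of the sorted permuted scores.
  d.bar <- rowMeans(apply(d.perm, 2, sort))
  # Two-sided permutation p-values, pooled over genes and permutations.
  p.value <- sapply(abs(d), function(x) mean(abs(d.perm) >= x))
  # One-sided expected numbers of falsely called genes per permutation.
  vec.false <- sapply(d, function(x) {
    if (x > 0) sum(d.perm > x) / B else sum(d.perm < x) / B
  })

  list(d = d, d.bar = d.bar, p.value = p.value, vec.false = vec.false,
       s = obs$s, s0 = numeric(0), mat.samp = mat.samp,
       msg = "SAM analysis with a user-written Welch t score\n\n",
       fold = numeric(0))
}

# Stand-alone check of the returned structure on simulated data.
set.seed(1)
x <- matrix(rnorm(200), nrow = 20)
out <- foo(x, rep(0:1, each = 5), B = 20)
str(out)
```

Following the text above, such a function would then be passed as method = foo in the call to sam; whether this particular sketch meets every internal expectation of sam has not been verified here.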

Value

An object of class SAM.

Author(s)

Holger Schwender, holger.schw@gmx.de

References

Schwender, H., Krause, A., and Ickstadt, K. (2006). Identifying Interesting Genes with siggenes. R News, 6(5), 45-50.

Schwender, H. (2004). Modifying Microarray Analysis Methods for Categorical Data – SAM and PAM for SNPs. To appear in: Proceedings of the 28th Annual Conference of the GfKl.

Tusher, V.G., Tibshirani, R., and Chu, G. (2001). Significance analysis of microarrays applied to the ionizing radiation response. PNAS, 98, 5116-5121.

See Also

SAM-class, d.stat, wilc.stat, chisq.stat, samControl

Examples

## Not run: 
  # Load the package multtest and the data of Golub et al. (1999)
  # contained in multtest.
  library(multtest)
  data(golub)
  
  # golub.cl contains the class labels.
  golub.cl

  # Perform a SAM analysis for the two class unpaired case assuming
  # unequal variances.
  sam.out <- sam(golub, golub.cl, B=100, rand=123)
  sam.out
  
  # Obtain the Delta plots for the default set of Deltas
  plot(sam.out)
  
  # Generate the Delta plots for Delta = 0.2, 0.4, 0.6, ..., 2
  plot(sam.out, seq(0.2, 2, 0.2))
  
  # Obtain the SAM plot for Delta = 2
  plot(sam.out, 2)
  
  # Get information about the genes called significant using 
  # Delta = 3.
  sam.sum3 <- summary(sam.out, 3, entrez=FALSE)
  
  # Obtain the rows of golub containing the genes called
  # differentially expressed
  sam.sum3@row.sig.genes
  
  # and their names
  golub.gnames[sam.sum3@row.sig.genes, 3] 

  # The matrix containing the d-values, q-values etc. of the
  # differentially expressed genes can be obtained by
  sam.sum3@mat.sig
  
  # Perform a SAM analysis using Wilcoxon rank sums
  sam(golub, golub.cl, method="wilc.stat", rand=123)
    

  # Now consider only the first ten columns of the Golub et al. (1999)
  # data set. For now, let's assume the first five columns were
  # before treatment measurements and the next five columns were
  # after treatment measurements, where column 1 and 6, column 2
  # and 7, ..., build a pair. In this case, the class labels
  # would be
  new.cl <- c(-(1:5), 1:5)
  new.cl
  
  # and the corresponding SAM analysis for the two-class paired
  # case would be
  sam(golub[,1:10], new.cl, B=100, rand=123)
  
  # Another way of specifying the class labels for the above paired
  # analysis is
  mat.cl <- matrix(c(rep(c(-1, 1), each=5), rep(1:5, 2)), 10)
  mat.cl
  
  # and the above SAM analysis can also be done by
  sam(golub[,1:10], mat.cl, B=100, rand=123)

## End(Not run)
