# mb.long: Multivariate Empirical Bayes Statistics for Longitudinal... In timecourse: Statistical Analysis for Developmental Microarray Time Course Data

## Description

Computes the \tilde{T}^2 statistics and/or the MB-statistics of differential expression for longitudinal replicated developmental microarray time course data by multivariate empirical Bayes shrinkage of gene-specific sample variance-covariance matrices towards a common matrix.

## Usage

 1 2 3 4 5 mb.long(object, method = c("1D", "paired", "2D"), type = c("none", "robust"), times, reps, prior.df = NULL, prior.COV = NULL, prior.eta = NULL, condition.grp = NULL, rep.grp = NULL, time.grp = NULL, one.sample = FALSE, ref = NULL, p = 0.02, out.t = FALSE, tuning = 1.345, HotellingT2.only=TRUE) 

## Arguments

 object Required. An object of class matrix, MAList, marrayNorm, or ExpressionSet containing log-ratios or log-values of expression for a series of microarrays. method a character string, "1D" for the one-sample case where genes of interest are those which change over time, "paired" for the one-sample case where genes of interest are those whose expected temporal profiles do not stay 0, for example, cDNA microarrays, or the paired two-sample case where genes of interest are those with different expected temporal profiles across 2 biological conditions, "2D" for the independent two-sample case where genes of interest are those with different expected temporal profiles across 2 biological conditions. The default is "1D". type a character string, indicating whether possible outliers should be down-weighted. times Required. A positive integer giving the number of time points. reps Required. A numeric vector or matrix corresponding to the sample sizes for all genes across different biological conditions, when biological conditions are sorted in ascending order. If a matrix, rows represent genes while columns represent biological conditions. prior.df an optional positive value giving the degrees of moderation. prior.COV an optional numeric matrix giving the common covariance matrix to which the gene-specific sample covariances are smoothed toward. prior.eta an optional numeric value giving the scale parameter for the covariance matrix for the expected time course profile. condition.grp a numeric or character vector with length equals to the number of arrays, assigning the biological condition group of each array. Required if method=2D. rep.grp an optional numeric or character vector with length equals to the number of arrays, assigning the replicate group of each array. time.grp an optional numeric vector with length equals to the number of arrays, assigning the time point group of each array. one.sample Is it a one-sample problem? Only specify this argument when method=paired. The default is FALSE which means it is a paired two-sample problem. ref an optional numeric value or character specifying the name of reference biological condition. The default uses the first element of condition.grp. Only specify this argument when method=paired and one.sample is FALSE. p a numeric value between 0 and 1, assumed proportion of genes which are differentially expressed. out.t logical. Should the moderated multivariate t-statistics be outputed? The default is FALSE. tuning the tuning constant for the Huber weight function with a default 1.345. HotellingT2.only logical. Should only the HotellingT2 statistics be outputed? This should be set as TRUE (default) when the sample size(s) are the same across genes, in order to reduce computational time.

## Details

This function implements the multivariate empirical Bayes statistics described in Tai and Speed (2004), to rank genes in the order of interest from longitudinal replicated developmental microarray time course experiments. It calls one of the following functions, depending on which method is used: mb.1D, mb.paired, and mb.2D.

The arguments condition.grp, rep.grp, and time.grp, if specified, should have lengths equal to the number of arrays. The i_th elements of these three arguments should correspond to the biological condition, replicate, and time for the i_th column (array) in the expression value matrix of the input object, respectively. The default assumes the columns of M are in the ascending order of condition.grp first, and then rep.grp, and finally time.grp.

Arguments one.sample and ref are for method=paired only.

When type=robust, the numerator of the \tilde{T}^2 statistic is calculated using the weighted average time course vector(s), where the weight at each data point is determined using Huber's weight function with the default tuning constant 1.345.

Warning: When there are only 2 replicates within conditions, type="robust" produces the same rankings as type="none" since there is no consensus on gene expression values. Check the output weights for these outliers.

## Value

Object of MArrayTC.

## Author(s)

Yu Chuan Tai [email protected]keley.edu

## References

Yu Chuan Tai and Terence P. Speed (2006). A multivariate empirical Bayes statistic for replicated microarray time course data. Annals of Statistics 34(5):2387-2412.

Yu Chuan Tai and Terence P. Speed (2005). Statistical analysis of microarray time course data. In: DNA Microarrays, U. Nuber (ed.), BIOS Scientific Publishers Limited, Taylor & Francis, 4 Park Square, Milton Park, Abingdon OX14 4RN, Chapter 20.

P. J. Huber (2004). Robust Statistics. Wiley series in probability and mathematical statistics.

  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 data(fruitfly) colnames(fruitfly) ## check if arrays are arranged in the default order gnames <- rownames(fruitfly) assay <- rep(c("A", "B", "C"), each = 12) time.grp <- rep(c(1:12), 3) size <- rep(3, nrow(fruitfly)) out1 <- mb.long(fruitfly, times=12, reps=size, rep.grp = assay, time.grp = time.grp) summary(out1) plotProfile(out1, type="b", gnames=gnames, legloc=c(2,15), pch=c("A","B","C"), xlab="Hour") ## Simulate gene expression data ## Note: this simulation is for demonstration purpose only, ## and does not necessarily reflect the real ## features of longitudinal time course data ## one biological condition, 5 time points, 3 replicates ## 500 genes, 10 genes change over time SS <- matrix(c( 0.01, -0.0008, -0.003, 0.007, 0.002, -0.0008, 0.02, 0.002, -0.0004, -0.001, -0.003, 0.002, 0.03, -0.0054, -0.009, 0.007, -0.0004, -0.00538, 0.02, 0.0008, 0.002, -0.001, -0.009, 0.0008, 0.07), ncol=5) sim.Sigma <- function() { S <- matrix(rep(0,25),ncol=5) x <- mvrnorm(n=10, mu=rep(0,5), Sigma=10*SS) for(i in 1:10) S <- S+crossprod(t(x[i,])) solve(S) } sim.data1 <- function(x, indx=1) { mu <- rep(runif(1,8,x[1]),5) if(indx==1) res <- as.numeric(t(mvrnorm(n=3, mu=mu+rnorm(5,sd=4), Sigma=sim.Sigma()))) if(indx==0) res <- as.numeric(t(mvrnorm(n=3, mu=mu, Sigma=sim.Sigma()))) res } M1 <- matrix(rep(14,500*15), ncol=15) M1[1:10,] <- t(apply(M1[1:10,],1,sim.data1)) M1[11:500,] <- t(apply(M1[11:500,],1,sim.data1, 0)) ## Which genes are nonconstant? MB.1D1 <- mb.long(M1, times=5, reps=rep(3, 500)) MB.1D1$percent # check the percent of moderation plotProfile(MB.1D1,type="b") # plots the no. 1 gene plotProfile(MB.1D1,type="b",ranking=10) # plots the no. 10 gene genenames <- as.character(1:500) plotProfile(MB.1D1, type="b", gid="8", gnames=genenames) #plots the gene with ID "8" ## MB.1D1.r <- mb.long(M1, type="r", times=5, reps=rep(3, 500)) plotProfile(MB.1D1.r,type="b",gnames=genenames) plotProfile(MB.1D1.r,type="b", gid="1", gnames=genenames) #plots the gene with ID "1" ## assign the following labellings to columns of M1 ## which is actually the same as the default ## Not Run trt <- rep("wildtype", 15) assay <- rep(c("A","B","C"), rep(5,3)) time.grp <- rep(c(0, 1, 3, 4, 6), 3) ## MB.1D2 should give the same results as MB.1D1 #MB.1D2 <- mb.long(M1, times=5, reps=rep(3, 500), condition.grp = trt, rep.grp = assay, #time.grp=time.grp) ## suppose now the replicates are in this order instead assay <- rep(c("A","C","B"), rep(5,3)) ## then MB.1D3 <- mb.long(M1, times=5, reps=rep(3, 500), condition.grp = trt, rep.grp = assay, time.grp=time.grp) MB.1D3$rep.group #check the replicate and time group MB.1D3$time.group ## Now let's simulate another dataset with two biological conditions ## 500 genes also, 10 of them have different expected time course profiles ## between these two biological conditions ## 3 replicates, 5 time points for each condition sim.data2 <- function(x, indx=1) { mu <- rep(runif(1,8,x[1]),5) if(indx==1) res <- c(as.numeric(t(mvrnorm(n=3, mu=mu+rnorm(5,sd=5), Sigma=sim.Sigma()))), as.numeric(t(mvrnorm(n=3, mu=mu+rnorm(5,sd=3.2), Sigma=sim.Sigma())))) if(indx==0) res <- as.numeric(t(mvrnorm(n=6, mu=mu+rnorm(5,sd=3), Sigma=sim.Sigma()))) res } M2 <- matrix(rep(14,500*30), ncol=30) M2[1:10,] <- t(apply(M2[1:10,],1,sim.data2)) M2[11:500,] <- t(apply(M2[11:500,],1,sim.data2, 0)) ## assume it is a paired two-sample problem trt <- rep(c("wt","mt"),each=15) assay <- rep(rep(c("1.2.04","2.4.04","3.5.04"),each=5),2) size <- matrix(3, nrow=500, ncol=2) MB.paired <- mb.long(M2, method="paired", times=5, reps=size, condition.grp=trt, rep.grp=assay) MB.paired$con.group # check the condition, replicate and time groups MB.paired$rep.group MB.paired$time.group plotProfile(MB.paired, type="b") genenames <- as.character(1:500) plotProfile(MB.paired, gid="12", type="b", gnames=genenames) #plots the gene with ID "12" ### assume it is a unpaired two-sample problem assay <- rep(c("1.2.04","2.4.04","3.5.04","5.21.04","7.17.04","8.4.04"),each=5) MB.2D <- mb.long(M2, method="2", times=5, reps=size, condition.grp=trt, rep.grp=assay) MB.2D$con.group # check the condition, replicate and time groups MB.2D$rep.group MB.2D$time.group plotProfile(MB.2D,type="b", gnames=genenames) # plot the no. 1 gene ## Now let's simulate another dataset with two biological conditions ## 500 genes also, 10 of them have different expected time course profiles ## between these two biological conditions ## the first condition has 3 replicates, while the second condition has 4 replicates, ## 5 time points for each condition sim.data3 <- function(x, indx=1) { mu <- rep(runif(1,8,x[1]),5) if(indx==1) res <- c(as.numeric(t(mvrnorm(n=3, mu=mu+rnorm(5,sd=5), Sigma=sim.Sigma()))), as.numeric(t(mvrnorm(n=4, mu=mu+rnorm(5,sd=3.2), Sigma=sim.Sigma())))) if(indx==0) res <- as.numeric(t(mvrnorm(n=7, mu=mu+rnorm(5,sd=3), Sigma=sim.Sigma()))) res } M3 <- matrix(rep(14,500*35), ncol=35) M3[1:10,] <- t(apply(M3[1:10,],1,sim.data3)) M3[11:500,] <- t(apply(M3[11:500,],1,sim.data3, 0)) assay <- rep(c("1.2.04","2.4.04","3.5.04","5.21.04","7.17.04","9.10.04","12.1.04"),each=5) trt <- c(rep(c("wildtype","mutant"),each=15),rep("mutant",5)) ## Note that "mutant" < "wildtype", the sample sizes are (4, 3) size <- matrix(c(4,3), nrow=500, ncol=2, byrow=TRUE) MB.2D.2 <- mb.long(M3, method="2", times=5, reps=size, rep.grp=assay, condition.grp=trt) MB.2D.2$con.group # check the condition, replicate and time groups MB.2D.2$rep.group MB.2D.2$time.group plotProfile(MB.2D.2, type="b") # plot the no. 1 gene