A yeast cell-cycle gene expression data set collected in the CDC15 experiment of Spellman et al. (1998), in which genome-wide mRNA levels of 6178 yeast open reading frames (ORFs) were measured over a period of two cell cycles, covering the M/G1, G1, S, G2, and M stages. To better understand the mechanisms underlying the cell cycle, it is important to identify the transcription factors (TFs) that regulate the expression levels of cell cycle-regulated genes. This data set contains a subset of 283 cell cycle-regulated genes observed over 4 time points at the G1 stage, together with the standardized binding probabilities of 96 TFs, obtained with the mixture model approach of Wang et al. (2007) from the ChIP data of Lee et al. (2002).

```
data("yeastG1")
```

A data frame with 1132 observations (283 cell cycle-regulated genes, each observed at 4 time points) and 99 variables (id, y, time, and 96 TFs).
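The layout described above can be checked directly once the data are loaded. The following is a minimal sketch, assuming the PGEE package is installed and that the columns `id`, `y`, and `time` are named as listed in the description; it prints the dimensions and the number of rows per gene rather than asserting them.

```r
# Sketch only: inspect the documented layout of yeastG1.
# Assumes the PGEE package is installed; nothing runs otherwise.
if (requireNamespace("PGEE", quietly = TRUE)) {
  data("yeastG1", package = "PGEE")

  # Documented as 1132 observations and 99 variables
  print(dim(yeastG1))

  # Documented as 283 genes, identified by the id column
  print(length(unique(yeastG1$id)))

  # Documented as 4 time points per gene: tabulate the rows per id
  print(table(table(yeastG1$id)))
}
```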

Lee, T.I., Rinaldi, N.J., Robert, F., Odom, D.T., Bar-Joseph, Z., Gerber, G.K., Hannett, N.M.,
Harbison, C.T., Thompson, C.M., Simon, I., et al. (2002). Transcriptional regulatory networks in
Saccharomyces cerevisiae. *Science*, **298**, 799–804.

Spellman, P.T., Sherlock, G., Zhang, M.Q., Iyer, V.R., Anders, K., Eisen, M.B., Brown, P.O.,
Botstein, D., and Futcher, B. (1998). Comprehensive identification of cell cycle regulated genes of the yeast
Saccharomyces cerevisiae by microarray hybridization. *Molecular Biology of the Cell*, **9**,
3273–3297.

Wang, L., Chen, G., and Li, H. (2007). Group SCAD regression analysis for microarray time course
gene expression data. *Bioinformatics*, **23**, 1486–1494.

Wang, L., Zhou, J., and Qu, A. (2012). Penalized generalized estimating equations
for high-dimensional longitudinal data analysis. *Biometrics*, **68**, 353–360.

```
## Not run:
library(PGEE)
# load data
data(yeastG1)
data <- yeastG1
# get the column names
colnames(data)[1:9]
# see some portion of yeast G1 data
head(data,5)[1:9]
# define the input arguments
formula <- "y ~.-id"
family <- gaussian(link = "identity")
lambda.vec <- seq(0.01,0.2,0.01)
# find the optimum lambda by 4-fold cross-validation
cv <- CVfit(formula = formula, id = data[,1], data = data, family = family, scale.fix = TRUE,
            scale.value = 1, fold = 4, lambda.vec = lambda.vec, pindex = c(1,2), eps = 10^-6,
            maxiter = 30, tol = 10^-6)
# print the results
print(cv)
# see the returned values by CVfit
names(cv)
# get the optimum lambda
cv$lam.opt
# fit the PGEE model
myfit1 <- PGEE(formula = formula, id = data[,1], data = data, na.action = NULL,
               family = family, corstr = "independence", Mv = NULL,
               beta_int = c(rep(0,dim(data)[2]-1)), R = NULL, scale.fix = TRUE,
               scale.value = 1, lambda = cv$lam.opt, pindex = c(1,2), eps = 10^-6,
               maxiter = 30, tol = 10^-6, silent = FALSE)
# get the values returned by the myfit1 object
names(myfit1)
# get the values returned by the summary(myfit1) object
names(summary(myfit1))
# see a portion of the results returned by summary(myfit1)
# $coefficients
head(summary(myfit1)$coefficients,7)
# see the variables which have non-zero coefficients
index1 <- which(abs(summary(myfit1)$coef[,"Estimate"]) > 10^-3)
index1
# see the PGEE summary statistics of these non-zero variables
summary(myfit1)$coef[index1,]
# fit the GEE model
myfit2 <- MGEE(formula = formula, id = data[,1], data = data, na.action = NULL,
               family = family, corstr = "independence", Mv = NULL,
               beta_int = c(rep(0,dim(data)[2]-1)), R = NULL, scale.fix = TRUE,
               scale.value = 1, maxiter = 30, tol = 10^-6, silent = FALSE)
# get the GEE summary statistics of the variables that turned out to be
# non-zero in the PGEE analysis
summary(myfit2)$coef[index1,]
# see the significantly associated TFs in the PGEE analysis
which(abs(summary(myfit1)$coef[index1,][,"Robust z"]) > 1.96)
# see which of the PGEE-selected TFs are significant in the GEE analysis
which(abs(summary(myfit2)$coef[index1,][,"Robust z"]) > 1.96)
## End(Not run)
```
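The last two lines of the example report the significant terms of each fit separately. To list the terms that are significant in both fits at once, something like the following sketch could be used. `significant_in_both` is a hypothetical helper, not part of PGEE; it assumes both coefficient tables have a `"Robust z"` column and the same row order, as when both models are fitted with the same formula. The toy matrices contain made-up numbers for illustration only.

```r
# Hypothetical helper (not part of PGEE): names of the terms in `index`
# whose |Robust z| exceeds `cutoff` in both coefficient tables.
# `index` is a named integer vector, e.g. as returned by which().
significant_in_both <- function(coef_pgee, coef_gee, index, cutoff = 1.96) {
  z_pgee <- abs(coef_pgee[index, "Robust z"])
  z_gee  <- abs(coef_gee[index, "Robust z"])
  names(index)[z_pgee > cutoff & z_gee > cutoff]
}

# Toy coefficient tables with made-up values (illustration only)
toy_pgee <- cbind(Estimate = c(0.5, 0.0, 0.2), "Robust z" = c(3.1, 0.0, 2.4))
rownames(toy_pgee) <- c("TF_a", "TF_b", "TF_c")
toy_gee <- cbind(Estimate = c(0.4, 0.1, 0.1), "Robust z" = c(2.8, 0.9, 1.2))
rownames(toy_gee) <- rownames(toy_pgee)

# Same selection rule as in the example: keep non-zero estimates
toy_index <- which(abs(toy_pgee[, "Estimate"]) > 10^-3)

significant_in_both(toy_pgee, toy_gee, toy_index)  # "TF_a"
```

With the fitted objects from the example, the corresponding call would be `significant_in_both(summary(myfit1)$coefficients, summary(myfit2)$coefficients, index1)`.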
