cc03-0-MultiLinearModel-class: Class "MultiLinearModel"

Description

Class to fit multiple (row-by-row) linear (fixed-effects) models on microarray or proteomics data.

Usage

MultiLinearModel(form, clindata, arraydata)
## S4 method for signature 'MultiLinearModel'
summary(object, ...)
## S4 method for signature 'MultiLinearModel'
hist(x, xlab='F Statistics', main=NULL, ...)
## S4 method for signature 'MultiLinearModel,missing'
plot(x, y, ylab='F Statistics', ...)
## S4 method for signature 'MultiLinearModel,ANY'
plot(x, y, xlab='F Statistics',
 ylab=deparse(substitute(y)), ...)
## S4 method for signature 'MultiLinearModel'
anova(object, ob2, ...)
multiTukey(object, alpha)

Arguments

form

formula object specifying the linear model

clindata

either a data frame of "clinical" or other covariates, or an ExpressionSet.

arraydata

matrix or data frame of values to be explained by the model. If clindata is an ExpressionSet, then arraydata can be omitted, since it is assumed to be part of the ExpressionSet.

object

object of class MultiLinearModel

ob2

object of class MultiLinearModel

x

object of class MultiLinearModel

y

optional numeric vector

xlab

character string specifying label for the x-axis

ylab

character string specifying label for the y-axis

main

character string specifying graph title

...

extra arguments for generic or plotting functions

alpha

numeric scalar between 0 and 1 specifying the significance level for the Tukey test.

Value

The anova method returns a data frame. The rows in the data frame correspond to the rows in the arraydata object that was used to construct the MultiLinearModel objects. The first column contains the F-statistics and the second column contains the p-values.

The multiTukey function returns a vector whose length equals the number of rows in the arraydata object used to construct the MultiLinearModel. Assuming that the overall F-test was significant, differences in group means (in each data row) larger than this value are significant by Tukey's test for honestly significant difference. Strictly speaking, that statement is incorrect, since it does not fully correct for multiple testing. Our standard practice is to take the p-values from the row-by-row F-tests and evaluate them using the beta-uniform mixture model (see Bum). For the rows that correspond to models whose p-values are smaller than the Bum cutoff, we simply use the Tukey HSD values without further modification.
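To make the idea of a Tukey threshold concrete, here is a minimal base-R sketch for a single data row, assuming a balanced one-way design. It illustrates the textbook honestly-significant-difference formula and is not taken from the multiTukey source, whose internals may differ; the variables `group`, `y`, and `n_per` are invented for the example.

```r
# Hypothetical balanced one-way layout: 3 groups, 10 samples each.
set.seed(2)
group <- factor(rep(c("X", "Y", "Z"), each = 10))
y <- rnorm(30)

fit <- aov(y ~ group)
k <- nlevels(group)                    # number of groups
n_per <- 10                            # samples per group (balanced design)
df_res <- df.residual(fit)             # residual degrees of freedom
mse <- sum(residuals(fit)^2) / df_res  # mean squared error

alpha <- 0.05
# Smallest difference in group means that Tukey's test calls significant.
hsd <- qtukey(1 - alpha, nmeans = k, df = df_res) * sqrt(mse / n_per)

# Observed pairwise differences, flagged when they exceed the threshold.
means <- tapply(y, group, mean)
diffs <- abs(outer(means, means, "-"))
diffs > hsd
```

For a balanced design, `hsd` equals the half-width of the confidence intervals reported by `TukeyHSD(fit)`, which gives a quick sanity check.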

Creating Objects

Objects should be created by calling the MultiLinearModel function. The first argument is a formula specifying the linear model, in the same manner that it would be passed to lm. The linear model is fit separately for each row in the arraydata matrix. Rows of arraydata are attached to the clindata data frame and are always referred to as "Y" in the formulas. In particular, this implies that clindata cannot include a column already called "Y". Further, the implementation only works if "Y" is the response variable in the model.
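As a rough illustration of what attaching a row as "Y" means, the following base-R sketch fits the same formula to each row by hand. It is a slow, conceptual equivalent and not the package's implementation; the names `covars`, `dat`, and `fit_row` are made up for this example.

```r
# Hypothetical toy data: 5 rows (features) measured on 12 samples.
set.seed(1)
covars <- data.frame(Grade = factor(rep(c("A", "B"), 6)))
dat <- matrix(rnorm(5 * 12), nrow = 5)

# Attach one row as column "Y", then fit the model with lm().
fit_row <- function(i) {
  d <- covars
  d$Y <- dat[i, ]   # this is why covars must not already contain "Y"
  lm(Y ~ Grade, data = d)
}

# One F-statistic per row of dat.
f_stats <- sapply(seq_len(nrow(dat)), function(i) {
  unname(summary(fit_row(i))$fstatistic["value"])
})
f_stats
```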

Multiple linear models with "ExpressionSet" objects

Bioconductor packages use an ExpressionSet to combine microarray data and clinical covariates (known in that context as phenoData objects) into a single structure. You can call MultiLinearModel using an ExpressionSet object for the clindata argument. In this case, the function extracts the phenoData slot of the ExpressionSet to use for the clinical covariates, and extracts the exprs slot of the ExpressionSet object to use for the array data.

Slots

call:

A call object describing how the object was constructed.

model:

The formula object specifying the linear model.

F.statistics:

A numeric vector of F-statistics comparing the linear model to the null model.

p.values:

A numeric vector containing the p-values associated with the F-statistics.

coefficients:

A matrix of the coefficients in the linear models.

predictions:

A matrix of the (Y-hat) values predicted by the models.

sse:

A numeric vector of the sum of squared error terms from fitting the models.

ssr:

A numeric vector of the sum of squared regression terms from the model.

df:

A numeric vector (of length two) containing the degrees of freedom for the F-tests.
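The slots above are related in the usual ANOVA way. The sketch below reproduces that relationship for a single row using base R, assuming the slots follow the standard definitions of SSE, SSR, and the two degrees of freedom; the variables `grade` and `y` are hypothetical.

```r
# Hypothetical single row of data with a two-level covariate.
set.seed(3)
grade <- factor(rep(c("A", "B"), 10))
y <- rnorm(20)
fit <- lm(y ~ grade)

sse <- sum(residuals(fit)^2)            # sum of squared errors
ssr <- sum((fitted(fit) - mean(y))^2)   # sum of squares explained by the model
df <- c(length(coef(fit)) - 1, df.residual(fit))  # numerator, denominator

# F-statistic against the null model y ~ 1, and its p-value.
f_stat <- (ssr / df[1]) / (sse / df[2])
p_val <- pf(f_stat, df[1], df[2], lower.tail = FALSE)
c(F = f_stat, p = p_val)
```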

Methods

summary(object, ...)

Write out a summary of the object.

hist(x, xlab='F Statistics', main=NULL, ...)

Create a histogram of the F-statistics.

plot(x, ylab='F Statistics', ...)

Plot the F-statistics as a function of the row index.

plot(x, y, xlab='F Statistics', ylab=deparse(substitute(y)), ...)

Plot the F-statistics against the numeric vector y.

anova(object, ob2, ...)

Perform row-by-row F-tests comparing two different linear models.

Details

The MultiLinearModel constructor computes row-by-row F-tests comparing each linear model to the null model Y ~ 1. In many instances, one wishes to use an F-test to compare two different linear models. For instance, many standard applications of analysis of variance (ANOVA) can be described using such a comparison between two different linear models. The anova method for the MultiLinearModel class performs row-by-row F-tests comparing two competing linear models.
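For one row, the comparison of two nested linear models can be sketched with base R as follows. This shows the standard extra-sum-of-squares F-test, which is what a comparison of a full and a reduced model usually means; it is an illustration under that assumption, not the package's code, and the variable names are invented.

```r
# Hypothetical single row with two covariates.
set.seed(4)
grade <- factor(rep(c("A", "B"), 15))
stage <- factor(rep(c("X", "Y", "Z"), each = 10))
y <- rnorm(30)

full <- lm(y ~ grade + stage)
reduced <- lm(y ~ grade)

sse_full <- sum(residuals(full)^2)
sse_red <- sum(residuals(reduced)^2)
df_num <- df.residual(reduced) - df.residual(full)  # extra parameters in full
df_den <- df.residual(full)

# F-test: does adding stage reduce the residual error more than chance would?
f_stat <- ((sse_red - sse_full) / df_num) / (sse_full / df_den)
p_val <- pf(f_stat, df_num, df_den, lower.tail = FALSE)
c(F = f_stat, p = p_val)
```

The same numbers are produced by `anova(reduced, full)` for this single row.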

The implementation of MultiLinearModel does not take the naive approach of using either apply or a for-loop to attach rows one at a time and fit separate linear models. All the models are actually fit simultaneously by a series of matrix operations, which greatly reduces the amount of time needed to compute the models. The constraint on the column names in clindata still holds, since one row is attached to allow model.matrix to determine the contrasts matrix.
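The idea of fitting all rows at once can be sketched in base R by solving the normal equations for every row in a single matrix operation. This is only an illustration of the general technique under the assumption of a full-rank design matrix; the actual matrix operations inside MultiLinearModel may differ, and all names below are hypothetical.

```r
# Hypothetical data: 100 rows (features) measured on 20 samples.
set.seed(5)
covars <- data.frame(Grade = factor(rep(c("A", "B"), 10)))
dat <- matrix(rnorm(100 * 20), nrow = 100)

X <- model.matrix(~ Grade, data = covars)        # 20 x 2 design matrix
# Solve the normal equations once; column j of B holds the fit for row j.
B <- solve(crossprod(X), crossprod(X, t(dat)))   # 2 x 100 coefficients
pred <- t(X %*% B)                               # 100 x 20 fitted values

sse <- rowSums((dat - pred)^2)                   # residual sum of squares
ssr <- rowSums((pred - rowMeans(dat))^2)         # regression sum of squares
df <- c(ncol(X) - 1, nrow(X) - ncol(X))
f_stats <- (ssr / df[1]) / (sse / df[2])
p_vals <- pf(f_stats, df[1], df[2], lower.tail = FALSE)
```

No loop over rows is needed: the cost is dominated by one matrix product, which is why this style of computation scales well to thousands of rows.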

Author(s)

Kevin R. Coombes krc@silicovore.com

See Also

anova, lm, Bum, MultiTtest, MultiWilcoxonTest

Examples

library(ClassComparison)
showClass("MultiLinearModel")

# simulate 10000 rows of pure-noise data on 50 samples
ng <- 10000
ns <- 50
dat <- matrix(rnorm(ng*ns), ncol=ns)
# two categorical covariates
cla <- factor(rep(c('A', 'B'), 25))
cla2 <- factor(rep(c('X', 'Y', 'Z'), times=c(15, 20, 15)))
covars <- data.frame(Grade=cla, Stage=cla2)

# full model: row-by-row F-tests against the null model Y ~ 1
res <- MultiLinearModel(Y ~ Grade + Stage, covars, dat)
summary(res)
hist(res, breaks=101)
plot(res)
plot(res, res@p.values)

# reduced model using Grade only
graded <- MultiLinearModel(Y ~ Grade, covars, dat)
summary(graded)

hist(graded@p.values, breaks=101)
hist(res@p.values, breaks=101)

# row-by-row F-tests comparing the full and reduced models
oop <- anova(res, graded)
hist(oop$p.values, breaks=101)

Example output

Loading required package: oompaBase
Class "MultiLinearModel" [package "ClassComparison"]

Slots:
                                                                       
Name:          call        model F.statistics     p.values coefficients
Class:         call      formula      numeric      numeric       matrix
                                                          
Name:   predictions          sse          ssr           df
Class:       matrix      numeric      numeric      numeric
Row-by-row linear models with 10000 rows

Call: MultiLinearModel(form = Y ~ Grade + Stage, clindata = covars, arraydata = dat) 

F-statistics:
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
0.001182 0.409366 0.799249 1.048185 1.418960 9.435995 

P-values:
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
0.0000568 0.2493306 0.5006691 0.4981831 0.7470104 0.9999430 
Row-by-row linear models with 10000 rows

Call: MultiLinearModel(form = Y ~ Grade, clindata = covars, arraydata = dat) 

F-statistics:
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
 0.00000  0.09921  0.45885  1.03144  1.35560 18.63245 

P-values:
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
0.0000789 0.2500567 0.5014157 0.5021233 0.7541404 0.9999222 

ClassComparison documentation built on May 6, 2019, 5:02 p.m.