formatData: Format a microarray spreadsheet ready for interventional...


View source: R/formatData.R

Description

This function formats a microarray timecourse dataset ready for the interventionalInference function.

Usage

formatData(d, cellLines = NULL, inhibitors = NULL, stimuli = NULL, times = NULL,
  nodes = NULL, intercept = TRUE, initialIntercept = TRUE, gradients = FALSE)

Arguments

d

A microarray spreadsheet, a samples by (4 + P) matrix, where P is the number of measurements for each sample.
Column 1 gives the cell line in each sample.
Column 2 gives the inhibitor used in each sample.
Column 3 gives the stimulus used in each sample.
Column 4 gives the time each sample was measured.
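As an illustration of this layout, the sketch below builds a small spreadsheet in base R. The cell line, inhibitor, stimulus and node names and all measurement values are made up, and storing the spreadsheet as a data frame (so that labels and numbers can sit side by side) is an assumption rather than something the package documentation states.

```r
# Hypothetical toy spreadsheet in the documented layout (all values are made up).
set.seed(1)
P <- 3                               # number of measurements per sample
timepoints <- c(0, 5, 10, 20)        # column 4 must contain real numbers

# One row per combination of time and inhibitor: 4 times x 2 inhibitors = 8 samples.
design <- expand.grid(time = timepoints,
                      inhibitor = c("DMSO", "EGFRi"),
                      stimulus = "EGF",
                      cellLine = "lineA",
                      stringsAsFactors = FALSE)

# Simulated measurements, one column per node.
measurements <- matrix(rnorm(nrow(design) * P), ncol = P,
                       dimnames = list(NULL, paste0("node", 1:P)))

# Columns 1 to 4 describe the sample; the remaining P columns hold the measurements.
toy <- data.frame(cellLine = design$cellLine,
                  inhibitor = design$inhibitor,
                  stimulus = design$stimulus,
                  time = design$time,
                  measurements)
dim(toy)  # samples by (4 + P)
```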

cellLines

A vector specifying a subset of cell lines to analyse (if absent, they are all used).

inhibitors

A vector specifying a subset of the inhibitors to analyse (if absent, they are all used).

stimuli

A vector specifying a subset of the stimuli to analyse (if absent, they are all used).

times

A vector specifying a subset of the times to analyse as the response (if absent, they are all used).

nodes

A vector specifying the indices of a subset of nodes to include in the analysis.
Further nodes can be removed from the response in the interventionalInferenceDBN function.

intercept

A logical value indicating whether an intercept parameter should be included in all models.

initialIntercept

A logical value indicating whether an intercept parameter should be used to estimate the level at the initial timepoint. Only used if the initial timepoint is in the response.

gradients

A logical value indicating whether the concentration gradient should be used as the response instead of the raw concentrations. This model has parallels with a dynamical systems viewpoint, and requires the covariance matrix to be adjusted. See Sigma.
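To give a feel for what a gradient response looks like, here is a minimal sketch that assumes the gradient is a finite difference: the change in concentration divided by the time between consecutive observations. The exact formula used by formatData is not stated on this page, so treat this as an illustration only; the times and concentrations are made up.

```r
# Assumed finite-difference gradient: change in concentration per unit time.
times <- c(0, 5, 10, 20, 30)           # made-up measurement times
conc  <- c(1.0, 1.5, 2.5, 2.0, 2.0)    # made-up concentrations for one node

# diff() gives the differences between consecutive elements.
gradient <- diff(conc) / diff(times)
gradient
```

Under this assumption, unevenly spaced times change the result, which is why the values in column 4 of d matter when gradients is TRUE.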

Details

The entries of column 4 of d must be real numbers. Missing values are acceptable and are handled as follows:

  1. Missing values in the response are ignored.

  2. For the predictors, if a single timepoint is missing, the predictors are interpolated from the two immediate neighbours.

  3. If one of the two immediate neighbours is missing then the response is ignored.

  4. The exception is a predictor for the initial observation (which is always missing): 0 is returned, so that the level at zero can be estimated by a second intercept parameter in the interventionalInferenceDBN function.
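Rules 2 and 3 can be sketched as follows. The helper interpolate_missing is hypothetical and not part of interventionalDBN, and it assumes linear interpolation weighted by the time differences; the package's own interpolation may differ in detail. The special case for the initial observation (rule 4) is not covered.

```r
# Hypothetical helper, not part of interventionalDBN.
# Fills a missing value only when both immediate neighbours are observed,
# using linear interpolation in time (an assumption).
interpolate_missing <- function(times, x) {
  out <- x
  n <- length(x)
  for (i in which(is.na(x))) {
    # Interior point with both immediate neighbours present.
    if (i > 1 && i < n && !is.na(x[i - 1]) && !is.na(x[i + 1])) {
      # Weight given by how far the missing time lies between its neighbours.
      w <- (times[i] - times[i - 1]) / (times[i + 1] - times[i - 1])
      out[i] <- (1 - w) * x[i - 1] + w * x[i + 1]
    }
    # Otherwise the value stays NA, and the matching response would be dropped.
  }
  out
}

times <- c(0, 5, 10, 20, 30)
x <- c(1, NA, 2, NA, NA)
interpolate_missing(times, x)
```

Here the second value is filled from its two neighbours, while the last two stay missing because each lacks an observed neighbour on one side.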

Value

y

The n by P response matrix, where n is the number of observations in the response. This is not necessarily the same as the number of samples.

X0

The n by a design matrix of predictors to be included in all models. Usually the intercept and zero intercept (if present).

X1

The n by P design matrix of predictors to undergo model selection.

Sigma

The n by n covariance matrix for a single column of y (proportional to σ^2). The identity matrix, unless gradients is TRUE.

sampleInfo

An n by 4 matrix giving the cell line, inhibitor, stimulus and timepoint for each observation used in the response.

interpolated

A matrix similar to sampleInfo, giving the particulars of any observations for which the predictors were interpolated. Empty if no interpolation has been used.

cond

A vector indexing the experimental conditions, given by the cell line, inhibitor and stimulus used in each sample.
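One way to picture such an index is sketched below in base R. The sample labels are made up, and numbering conditions in order of first appearance is an assumption; the numbering returned by formatData may differ.

```r
# Made-up sample information in the same four-column layout as sampleInfo.
sampleInfo <- data.frame(
  cellLine  = c("lineA", "lineA", "lineA", "lineB", "lineB"),
  inhibitor = c("DMSO", "DMSO", "EGFRi", "DMSO", "DMSO"),
  stimulus  = c("EGF", "EGF", "EGF", "EGF", "EGF"),
  time      = c(0, 5, 0, 0, 5)
)

# A condition is a unique (cell line, inhibitor, stimulus) combination; time is ignored.
key <- paste(sampleInfo$cellLine, sampleInfo$inhibitor, sampleInfo$stimulus, sep = "|")

# Number the conditions in order of first appearance (an assumed convention).
cond <- match(key, unique(key))
cond
```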

Author(s)

Simon Spencer

See Also

interventionalInference, interventionalInferenceAdvanced, interventionalDBN-package, interventionEffects

Examples

library(interventionalDBN)
data(interventionalData)
# Load your own data spreadsheet using myData <- read.csv("myDataFile.csv").

# Use everything.
fullData <- formatData(interventionalData)

# Use only DMSO and EGFRi samples.
halfData <- formatData(interventionalData, inhibitors = c("DMSO", "EGFRi"))

# Produce gradients as the response.
diffData <- formatData(interventionalData, gradients = TRUE, initialIntercept = FALSE)
# The results differ if we use the time between observations, rather than the timepoint.
interventionalData[, 4] <- rep(c(0, 5, 10, 20, 30, 60, 90, 120), 4)
diffData2 <- formatData(interventionalData, gradients = TRUE, initialIntercept = FALSE)

# When there is missing data, interpolation also uses the time differences.
# Here the fourth sample is removed to create a gap.
missingData <- interventionalData[-4, ]
fullData2 <- formatData(missingData)

interventionalDBN documentation built on May 2, 2019, 4:04 p.m.