read.madata: Read Microarray data

Description

Reads microarray experiment data from a TAB-delimited text file or a matrix R object.

Usage

read.madata(datafile=datafile, designfile=designfile, covM=covM,
  arrayType=c("oneColor", "twoColor"), header=TRUE, spotflag=FALSE,
  n.rep=1, avgreps=0, log.trans=FALSE, metarow, metacol, row, col,
  probeid, intensity, matchDataToDesign=FALSE, ...)

Arguments

datafile

Matrix R object or data file name with path name as a string.

designfile

Matrix or data.frame R object or design file name with path as a string.

covM

Gene-specific covariate matrix. Specify this only if you have a gene-specific covariate.

arrayType

Whether the array is one-color or two-color. The default is "oneColor".

header

A logical value indicating whether TAB-delimited input files (data file, design file or covariate matrix) have a column header.

spotflag

A logical value indicating whether the input file contains a flag column for bad spots.

n.rep

An integer to represent the number of replicates.

avgreps

An integer to indicate whether to average or collapse the replicates or not. 0 means no average; 1 means to take the mean of the replicates; 2 means to take the median of the replicates.

log.trans

A logical value indicating whether to apply a log2 transformation to the raw data. It is FALSE by default. If this is TRUE, the TransformMethod field will be set to "log2".

metarow

For 2-dye array. The column number for meta row. Default values are 1s.

metacol

For 2-dye array. The column number for meta column. Default values are 1s.

row

For 2-dye array. The column number for row. Default value is NA.

col

For 2-dye array. The column number for column. Default value is NA.

probeid

The column number storing the probe (clone) ID. When datafile is a matrix R object, the row names of the data are assumed to be the probe IDs; if the data has no row names, 1, 2, ... are used as probe IDs. For a TAB-delimited file, if probeid is not provided, the first column is assumed to store the probe ID. If you do not have probe IDs, set probeid = 0.

intensity

The starting column number of the intensity data. By default, intensity is assumed to start from the first column for a matrix R object and from the second column for a TAB-delimited file.

matchDataToDesign

Defaults to FALSE. If set to TRUE, the datafile column headers (or colnames(datafile) in the case of a matrix) will be matched to the design file's Array column. This allows you to ignore the input order of the array data as long as the datafile's header values can be matched exactly to the designfile's Array values.

...

Other gene information in the data file.
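As a rough illustration of what the n.rep, avgreps and log.trans arguments describe, the base R sketch below collapses replicated rows of a toy intensity matrix by mean or median and then takes log2. This is not the package's actual implementation; the helper collapse_reps, the toy values, and the assumption that replicates sit in adjacent rows of a 2-replicate layout are made up for this example.

```r
# Toy intensity matrix: 3 probes x 2 replicates (adjacent rows), 2 arrays.
raw <- matrix(c(100, 120, 200, 220, 400, 440,
                 50,  70, 300, 340, 800, 880),
              ncol = 2,
              dimnames = list(NULL, c("array1", "array2")))
n.rep <- 2

# Rows 1-2 belong to probe 1, rows 3-4 to probe 2, and so on.
probe <- rep(seq_len(nrow(raw) / n.rep), each = n.rep)

# Hypothetical helper (not part of maanova): summarise replicates per probe.
collapse_reps <- function(x, group, fun) {
  apply(x, 2, function(column) tapply(column, group, fun))
}

avg <- collapse_reps(raw, probe, mean)    # analogous to avgreps = 1
med <- collapse_reps(raw, probe, median)  # analogous to avgreps = 2

# Analogous to log.trans = TRUE: log2 of the (collapsed) intensities.
log.avg <- log2(avg)
```

With avgreps = 0 no collapsing would take place and all six rows would be kept.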

Value

An object of class madata, which is a list with the following components:

n.gene

Total number of genes in the experiment.

n.rep

Number of replicates in the experiment.

n.spot

Number of spots for each gene.

data

The intensity data: either the log2-transformed data (if log.trans=TRUE) or the original data (if log.trans=FALSE).

n.array

Number of arrays in the experiment.

n.dye

Number of dyes.

flag

A matrix of spot flags. Each element corresponds to one spot. 0 means a normal spot; all other values mean a bad spot.

metarow

Meta row for each spot.

metacol

Meta column for each spot.

row

Row for each spot.

col

Column for each spot.

ArrayName

A list of strings to represent the names of intensity data.

design

An object to represent the experimental design.

Others

Other experiment information listed in the data file and specified by user.

Preparing data file

Before using the package, users need to prepare the input data file.

1) The data file can be a matrix type R object, such as the output of exprs() from the affy or beadarray package. It is assumed that the intensity starts from the first column and that the row names are the probe IDs. Otherwise, the column numbers containing the probe ID and the intensity should be specified.

2) The data file can be a TAB-delimited text file. In this file, each row corresponds to a gene. The leading columns can hold gene-specific information, e.g., the probe ID, GenBank ID, etc., and the grid location of the spot. The intensity data must come after these columns. Most microarray gridding software generates one file per slide, so you need to combine them manually into a single data file. You also need to decide which data to use in the analysis, e.g., mean versus median, background subtracted or not, etc. For an N-dye array, the intensity data should have N columns for each array, and these N columns must be adjacent to each other. You can put the spot flag as a column after the intensity data for each array. (Note that with a flag you will have N+1 columns of data for each array.) If you have replicates, replicated measurements of the same probe (clone) on the same array should appear in adjacent rows.

For example, suppose you have four slides of a 2-dye cDNA array scanned by GenePix, giving four files. First, open your favorite spreadsheet editor, e.g., MS Excel, and copy your probe ID and cluster ID into the first 2 columns. Then open one of the files generated by GenePix and copy the grid location into the next 4 columns (you only need to do this once because it is the same for all four slides). Then, for each of the four files, copy the two columns of foreground median values (if that is what you want to use) and one column of flags into the file in the order Cy5, Cy3, flag. Finally, select the whole file, sort the rows by probe ID, and save the file as TAB-delimited text.
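The same assembly can also be done in R instead of a spreadsheet. The sketch below is only an illustration under stated assumptions: the per-slide tables are simulated (make_slide is a hypothetical stand-in for reading one scanner output file), and the column names, file name and values are made up. It lays out two annotation columns, four grid-location columns, then Cy5, Cy3 and flag for each of four slides, sorts by probe ID, and writes a TAB-delimited file.

```r
set.seed(1)
n.probe <- 5
n.slide <- 4

# Annotation and grid location shared by all slides (made-up values).
annot <- data.frame(
  ProbeID   = sprintf("P%03d", sample(n.probe)),
  ClusterID = sprintf("C%03d", seq_len(n.probe)),
  metarow   = 1,
  metacol   = 1,
  row       = seq_len(n.probe),
  col       = 1,
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for one slide's scanner output: Cy5, Cy3, flag.
make_slide <- function(i) {
  slide <- data.frame(Cy5  = round(runif(n.probe, 100, 5000)),
                      Cy3  = round(runif(n.probe, 100, 5000)),
                      flag = 0)
  names(slide) <- paste0(names(slide), ".", i)
  slide
}

slides   <- lapply(seq_len(n.slide), make_slide)
combined <- do.call(cbind, c(list(annot), slides))

# Sort rows by probe ID, then save as TAB-delimited text.
combined <- combined[order(combined$ProbeID), ]
datafile <- file.path(tempdir(), "mydata.txt")
write.table(combined, datafile, sep = "\t", quote = FALSE, row.names = FALSE)

# A matching call for this layout might look like the following (not run):
# read.madata(datafile, designfile = "mydesign.txt", arrayType = "twoColor",
#             probeid = 1, metarow = 3, metacol = 4, row = 5, col = 6,
#             intensity = 7, spotflag = TRUE)
```

Here the intensity block starts at column 7 and each array occupies N+1 = 3 adjacent columns.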

The data file must be "full", that is, all rows must have the same number of fields. If your data file has missing data, you need to check the data or use fill.missing to fill in the missing values.
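One way to check that a TAB-delimited file is "full" before reading it is to count the fields on every line with count.fields() from the utils package. The helper check_full and the small ragged file below are assumptions made for this sketch; they are not part of maanova.

```r
# Hypothetical helper (not part of maanova): return the line numbers whose
# field count differs from that of the first (header) line.
check_full <- function(path, sep = "\t") {
  n <- count.fields(path, sep = sep, quote = "", comment.char = "")
  which(n != n[1])
}

# A deliberately ragged example file: line 3 is one field short.
path <- file.path(tempdir(), "ragged.txt")
writeLines(c("ProbeID\tarray1\tarray2",
             "P001\t10\t12",
             "P002\t11",
             "P003\t9\t8"), path)

bad <- check_full(path)
if (length(bad) > 0) {
  message("Lines with an unexpected number of fields: ",
          paste(bad, collapse = ", "))
}
```

An empty result means every line has the same number of fields as the header.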

Leading and trailing TABs in the text file can sometimes cause problems, depending on the operating system, so users need to be careful about them.

Preparing design file

The design file can be a data.frame or matrix R object or a TAB-delimited text file. The number of rows equals the number of arrays times N (the number of dyes), plus one for the column header if the design file is a TAB-delimited file and header = TRUE. The rows of the design file *MUST* follow the order of the arrays in the datafile unless the matchDataToDesign parameter is set to TRUE. For example, if the datafile stores the intensities from array1, array11, array2, ..., then the rows of the designfile must follow this order.

The number of columns depends on the experimental design. For example, you can have "Strain", "Diet", "Sex", etc. in your design file. You *MUST* have a column named "Array" in the design file. For a two-color array, in addition to the "Array" column, you must have "Sample" and "Dye" columns (case sensitive). "Sample" should be integers representing biological individuals. Reference samples should have Sample number zero (0). A reference sample is always treated as a fixed factor in the mixed model and is not involved in any test.

You must NOT have "Spot", "Label" and "covM" columns. They are reserved for spotting, labeling and covariance effects.

Note that you DO NOT have to use all factors in the design file. You can put all factors in the design file and turn them on or off in the formula passed to fitmaanova.
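The rules above can be sketched in base R for a small two-color experiment. Everything in this example is made up for illustration (array names, dye labels, sample numbers, and the assumed column order of the data file); further factor columns such as "Strain" could be added in the same way. The reordering step only mimics by hand what matching on the Array column amounts to and is not the package's own code.

```r
arrays <- paste0("array", 1:4)
dyes   <- c("Cy5", "Cy3")

# One row per array and dye; Sample 0 marks the reference sample.
design <- data.frame(
  Array  = rep(arrays, each = length(dyes)),
  Dye    = rep(dyes, times = length(arrays)),
  Sample = c(1, 0, 2, 0, 3, 0, 4, 0),
  stringsAsFactors = FALSE
)

# Basic checks taken from the rules in this section.
reserved <- c("Spot", "Label", "covM")
stopifnot(nrow(design) == length(arrays) * length(dyes),
          all(c("Array", "Sample", "Dye") %in% names(design)),
          !any(reserved %in% names(design)))

# Assumed order of the arrays in the data file (made up for this sketch).
data.arrays <- c("array1", "array3", "array2", "array4")
same.order  <- identical(unique(design$Array), data.arrays)

# If the orders differ, reorder the design rows to follow the data file.
design.sorted <- design[order(match(design$Array, data.arrays)), ]

# Save as a TAB-delimited design file with a header line.
designfile <- file.path(tempdir(), "mydesign.txt")
write.table(design.sorted, designfile, sep = "\t", quote = FALSE,
            row.names = FALSE)
```

Because order() keeps ties in their original order, the Cy5 and Cy3 rows of each array stay together and in the same sequence after reordering.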

Preparing covariate file

If you have an array-specific covariate, it should be included in the design matrix. If you have a gene-specific covariate, you need to prepare a matrix type R object or TAB-delimited text file, "covM". The size of "covM" must equal the size of the intensity data (a TAB-delimited text file must have a column header if header = TRUE, but NO row names). Specify covM only if you have a gene-specific covariate. The covariate must be numeric and needs to be specified in fitmaanova.
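A minimal sketch of preparing a gene-specific covariate, assuming a toy 4-probe by 3-array intensity matrix: the covariate matrix is built with the same dimensions as the intensity data, checked to be numeric, and written as TAB-delimited text with a column header and no row names. All object names, the file name and the values are invented for this example.

```r
set.seed(42)

# Toy intensity matrix: 4 probes x 3 arrays.
intensity <- matrix(rnorm(12, mean = 8), nrow = 4,
                    dimnames = list(paste0("P", 1:4), paste0("array", 1:3)))

# Gene-specific covariate: one numeric value per probe and array.
covM <- matrix(runif(12), nrow = 4,
               dimnames = list(NULL, colnames(intensity)))

# The covariate must be numeric and the same size as the intensity data.
stopifnot(is.numeric(covM), identical(dim(covM), dim(intensity)))

# TAB-delimited version: column header, but no row names.
covfile <- file.path(tempdir(), "covM.txt")
write.table(covM, covfile, sep = "\t", quote = FALSE,
            row.names = FALSE, col.names = TRUE)

# Read it back to confirm the layout survived the round trip.
cov.back <- as.matrix(read.delim(covfile))

# The matrix could then be passed along these lines (not run):
# read.madata(datafile = intensity, designfile = design, covM = covM)
```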

Author(s)

Hao Wu

Examples

# Note: .CEL files are not distributed with the package, so the following
# code does not run as is. It shows how to read data produced by the affy
# (or beadarray) package when a TAB delimited design file is ready.

## Not run: 
library(affy)
library(maanova)
beforeRma <- ReadAffy()
rmaData <- rma(beforeRma)
datafile <- exprs(rmaData)
abf1 <- read.madata(datafile=datafile, designfile="design.txt")

# make the design file (data.frame type R object) in R and read it
design.table <- data.frame(Array=row.names(pData(beforeRma)))
Strain <- rep(c('Aj', 'B6', 'B6xAJ'), each=6)
Sample <- rep(c(1:9), each=2)
designfile <- cbind(design.table, Strain, Sample)
abf1 <- read.madata(datafile, designfile=designfile)

# read in a TAB delimited file with spot flag - for two color array
# HAVE TO SPECIFY that the data is from two color array
kidney.raw <- read.madata("kidney.txt", designfile="kidneydesign.txt",
  metarow=1, metacol=2, col=3, row=4, probeid=6,
  intensity=7, arrayType='twoColor', log.trans=TRUE, spotflag=TRUE)

## End(Not run)

maanova documentation built on Nov. 8, 2020, 8:21 p.m.