loadMAdata: Load and preprocess microarray data
In varemo/piano: Platform for integrative analysis of omics data

Description

Loads, preprocesses and annotates microarray data to be further used by downstream functions in the piano package.

Usage

loadMAdata(
  datadir = getwd(),
  setup = "setup.txt",
  dataNorm,
  platform = "NULL",
  annotation,
  normalization = "plier",
  filter = TRUE,
  verbose = TRUE,
  ...
)

Arguments

`datadir`	character string giving the directory in which to look for the data. Defaults to `getwd()`.
`setup`	character string giving the name of the file containing the experimental setup, or an object of class `data.frame` or similar containing the experimental setup. Defaults to `"setup.txt"`, see details below for more information.
`dataNorm`	character string giving the name of the normalized data, or an object of class `data.frame` or similar containing the normalized data. Only to be used if the user wishes to start with normalized data rather than CEL files.
`platform`	character string giving the name of the platform, can be either `"yeast2"` or `NULL`. See details below for more information.
`annotation`	character string giving the name of the annotation file, or an object of class `data.frame` or similar containing the annotation information. The annotation should consist of the columns Gene name, Chromosome and Chromosome location. Not required if `platform="yeast2"`.
`normalization`	character string giving the normalization method, can be either `"plier"`, `"rma"` or `"mas5"`. Defaults to `"plier"`.
`filter`	should the data be filtered? If `TRUE` then probes not present in the annotation will be discarded. Defaults to `TRUE`.
`verbose`	should progress information be printed? Defaults to `TRUE`.
`...`	additional arguments to be passed to `ReadAffy`.

Details

This function requires at least two inputs: (1) data, either CEL files in the directory specified by datadir or normalized data specified by dataNorm, and (2) experimental setup specified by setup.

The setup should be either a tab delimited text file with column headers or a data.frame. The first column should contain the names of the CEL files or the column names used for the normalized data. Be sure to use names that are valid as column names, e.g. avoid names starting with numbers. Additional columns should assign attributes in some category to each array. (For an example, run the example below and look at the object myArrayData$setup.)
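
As an illustration of the setup format described above, the following is a minimal sketch in base R. The array names, the column names (array, strain, condition) and the file location are hypothetical, and whether the first column needs its own header is not stated here, so compare the result against myArrayData$setup from the example below.

```r
# Hypothetical experimental setup for four arrays; every name here is made up.
setup <- data.frame(
  array     = c("wt_rep1", "wt_rep2", "mut_rep1", "mut_rep2"),
  strain    = c("wildtype", "wildtype", "mutant", "mutant"),
  condition = c("aerobic", "anaerobic", "aerobic", "anaerobic"),
  stringsAsFactors = FALSE
)

# make.names() leaves valid names unchanged, so any name it alters
# (e.g. one starting with a number) is not valid as a column name.
invalid <- setup$array[make.names(setup$array) != setup$array]
if (length(invalid) > 0) {
  stop("Invalid array names: ", paste(invalid, collapse = ", "))
}

# Write a tab delimited text file with column headers.
setupFile <- file.path(tempdir(), "setup.txt")
write.table(setup, file = setupFile, sep = "\t", quote = FALSE,
            row.names = FALSE)

# Read the file back to confirm that it round-trips.
setupCheck <- read.delim(setupFile, stringsAsFactors = FALSE)
```

The data.frame itself could also be passed directly through the setup argument instead of writing it to a file.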

The piano package is customized for yeast 2.0 arrays, and annotation works automatically if the cdfName of the arrays equals Yeast_2. If using normalized yeast 2.0 data as input, the user needs to set the argument platform="yeast2" to tell the function to use yeast annotation. If a platform other than yeast 2.0 is used, set platform=NULL (default) and supply appropriate annotation through the argument annotation. Note that the cdfName overrides platform, so platform can still be set to NULL for yeast 2.0 CEL files. Note also that annotation overrides platform, so an alternative annotation for yeast can be used simply by specifying it in annotation.
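
The precedence rules in this paragraph can be summarized with a small sketch. The function annotationSource below is a hypothetical helper written only for illustration; it is not part of piano and does not reproduce the package's internal code, it just encodes the rules as stated above.

```r
# Hypothetical helper, not part of piano: reports which annotation source
# the rules described above would select.
annotationSource <- function(cdfName = NULL, platform = NULL,
                             annotation = NULL) {
  if (!is.null(annotation)) {
    # A user-supplied annotation overrides platform.
    return("user-supplied")
  }
  if (identical(cdfName, "Yeast_2") || identical(platform, "yeast2")) {
    # Yeast 2.0 CEL files (via cdfName) or platform="yeast2".
    return("built-in yeast")
  }
  # No annotation available.
  "none"
}

annotationSource(cdfName = "Yeast_2")
annotationSource(platform = "yeast2")
annotationSource(platform = "yeast2", annotation = "myAnnotation.txt")
annotationSource()
```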

The annotation should have the column headers Gene name, Chromosome and Chromosome location. The Gene name is used in the heatmap in diffExp, and the Chromosome and Chromosome location are used by the polarPlot. The rownames (or the first column if using a text file) should contain the probe IDs. If using a text file, the first column should have the header probeID or similar. The filtering step discards all probes not listed in the annotation.
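
To make the annotation layout and the effect of filtering concrete, here is a minimal sketch in base R. All probe IDs, gene names, locations and expression values are made up, and the subsetting line only illustrates what discarding unannotated probes means; it is not the package's own implementation.

```r
# Hypothetical annotation with the three required column headers and
# the probe IDs as rownames. check.names = FALSE keeps the spaces.
annotation <- data.frame(
  "Gene name"           = c("GENE_A", "GENE_B", "GENE_C"),
  "Chromosome"          = c("1", "2", "2"),
  "Chromosome location" = c(1000, 2500, 40000),
  row.names        = c("probe_1", "probe_2", "probe_3"),
  check.names      = FALSE,
  stringsAsFactors = FALSE
)

# Hypothetical normalized data: probes in rows, arrays in columns.
dataNorm <- data.frame(
  wt_rep1  = c(7.1, 8.3, 6.2, 9.0),
  mut_rep1 = c(7.4, 8.0, 6.9, 9.5),
  row.names = c("probe_1", "probe_2", "probe_3", "probe_4")
)

# Illustration of the filtering step: keep only probes that are listed
# in the annotation. probe_4 has no annotation and is dropped.
filtered <- dataNorm[rownames(dataNorm) %in% rownames(annotation), ,
                     drop = FALSE]
rownames(filtered)
```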

Normalization is performed on all CEL file data using one of the Affymetrix methods: PLIER ("plier") as implemented by justPlier, the RMA (Robust Multi-Array Average) expression measure ("rma") as implemented by rma, or the MAS 5.0 expression measure ("mas5") as implemented by mas5.

It is possible to pass additional arguments to ReadAffy, e.g. cdfname, which might be required for some types of CEL files.

Value

An ArrayData object (which is essentially a list) with the following elements:

`dataRaw`	raw data as an AffyBatch object
`dataNorm`	`data.frame` containing normalized expression values
`setup`	`data.frame` containing experimental setup
`annotation`	`data.frame` containing annotation

Depending on input arguments the ArrayData object may not include dataRaw and/or annotation.
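
The following sketch checks which of these elements are present after loading the example data. It assumes that piano and its example data are installed; the guard on yeast2.db is an assumption about what the built-in yeast annotation needs, and the vector expectedElements is introduced here only for illustration.

```r
# Element names listed in the Value section above.
expectedElements <- c("dataRaw", "dataNorm", "setup", "annotation")

# Only run if the packages are available (the yeast2.db guard is an assumption).
if (requireNamespace("piano", quietly = TRUE) &&
    requireNamespace("yeast2.db", quietly = TRUE)) {
  library(piano)
  dataPath <- system.file("extdata", package = "piano")
  myArrayData <- loadMAdata(datadir = dataPath,
                            dataNorm = "norm_data.txt.gz",
                            platform = "yeast2")

  # Report which of the listed elements this object contains.
  present <- expectedElements %in% names(myArrayData)
  names(present) <- expectedElements
  print(present)

  # Elements are accessed with $, as for a list.
  print(head(myArrayData$setup))
}
```

Since this call starts from normalized data rather than CEL files, dataRaw would be expected to be absent from the result.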

Author(s)

Leif Varemo piano.rpkg@gmail.com and Intawat Nookaew piano.rpkg@gmail.com

References

Gautier, L., Cope, L., Bolstad, B. M., and Irizarry, R. A. affy - analysis of Affymetrix GeneChip data at the probe level. Bioinformatics. 20, 3, 307-315 (2004).

Examples


  # Load the package, then get the path to the example data and setup files:
  library(piano)
  dataPath <- system.file("extdata", package="piano")

  # Load normalized data:
  myArrayData <- loadMAdata(datadir=dataPath, dataNorm="norm_data.txt.gz", platform="yeast2")

  # Print to look at details:
  myArrayData

varemo/piano documentation built on Sept. 19, 2022, 12:01 p.m.