readBeadSummaryData: Read BeadStudio gene expression output
In beadarray: Quality assessment and low-level analysis for Illumina BeadArray data

Description Usage Arguments Details Value Author(s) Examples

Function to read the output of Illumina's BeadStudio software into beadarray

readBeadSummaryData(dataFile, qcFile=NULL, sampleSheet=NULL,
                    sep="\t", skip=8, ProbeID="ProbeID",
                    columns = list(exprs = "AVG_Signal", se.exprs="BEAD_STDERR",
                        nObservations = "Avg_NBEADS", Detection="Detection Pval"),
                    qc.sep="\t", qc.skip=8, controlID="ProbeID", 
                    qc.columns = list(exprs="AVG_Signal", se.exprs="BEAD_STDERR", 
			nObservations="Avg_NBEADS", Detection="Detection Pval"), 
		    illuminaAnnotation=NULL, dec=".", quote="", annoCols = c("TargetID", "PROBE_ID","SYMBOL"))

`dataFile`	character string specifying the name of the file containing the BeadStudio output for each probe on each array in an experiment (required). Ideally this should be the 'SampleProbeProfile' from BeadStudio.
`qcFile`	character string giving the name of the file containing the control probe intensities (optional). This file should be either the 'ControlProbeProfile' or 'ControlGeneProfile' from BeadStudio.
`sampleSheet`	character string used to specify the file containing sample infomation (optional)
`sep`	field separator character for the `dataFile` (`"\t"` for tab delimited or `","` for comma separated)
`skip`	number of header lines to skip at the top of `dataFile`. Default value is 8.
`ProbeID`	character string of the column in `dataFile` that contains identifiers that can be used to uniquely identify each probe
`columns`	list defining the column headings in `dataFile` which correspond to the matrices stored in the `assayData` slot of the final `ExpressionSetIllumina` object
`qc.sep`	field separator character for `qcFile`
`qc.skip`	number of header lines to skip at the top of `qcFile`
`controlID`	character string specifying the column in `qcFile` that contains the identifiers that uniquely identify each control probe
`qc.columns`	list defining the column headings in `qcFile` which correspond to the matrices stored in the `QCInfo` slot of the final `ExpressionSetIllumina` object
`illuminaAnnotation`	character string specifying the name of the annotation package (only available for certain expression arrays at present)
`dec`	the character used in the `dataFile` and `qcFile` for decimal points
`quote`	the set of quoting characters (disabled by default)
`annoCols`	additional columns containing annotation to be read from the file

This function can be used to read gene expression data exported from versions 1,2 and 3 of the Illumina BeadStudio application. The format of the BeadStudio output will depend on the version number. For example, the file may be comma or tab separated of have header information at the top of the file. The parameters sep and skip can be used to adapt the function as required (i.e. skip=7 is appropriate for data from earlier version of BeadStudio, and skip=0 is required if header information hasn't been exported.

The format of the BeadStudio file is assumed to have one row for each probe sequence in the experiment and a set number of columns for each array. The columns which are exported for each array are chosen by the user when running BeadStudio. At a minimum, columns for average intensity standard error, the number of beads and detection scores should be exported, along with a column which contains a unique identifier for each bead type (usually named "ProbeID").

It is assumed that the average bead intensities for each array appear in columns with headings of the form 'AVG\_Signal-ARRAY1', 'AVG\_Signal-ARRAY2',...,'AVG\_Signal-ARRAYN' for the N arrays found in the file. All other column headings are matched in the same way using the character strings specified in the columns argument.

NOTE: With version 2 of BeadStudio it is possible to export annotation and sequence information along with the intensities. We \_don't\_ recommend exporting this information, as special characters found in the annotation columns can cause problems when reading in the data. This annotation information can be retrieved later on from other Bioconductor packages.

The default object created by readBeadSummaryData is an ExpressionSetIllumina object.

If the control intensities have been exported from BeadStudio ('ControlProbeProfile') this may be read into beadarray as well. The qc.skip, qc.sep and qc.columns parameters can be used to adjust for the contents of the file. If the 'ControlGeneProfile' is exported, you will need to set controlID="TargetID".

Sample sheet information can also be used. This is a file format used by Illumina to specify which sample has been hybridised to each array in the experiment.

Note that if the probe identifiers are non-unique, the duplicated rows are removed. This may occur if the 'SampleGeneProfile' is exported from BeadStudio and/or ProbeID="TargetID" is specified (the "ProbeID" column has a unique identifier in the 'SampleProbeProfile', whereas the "TargetID" may not, as multiple beads can target the same transcript).

An ExpressionSetIllumina object.

Mark Dunning and Mike Smith

##Read the example data from
##http://www.switchtoi.com/datasets/asuragenmadqc/AsuragenMAQC_BeadStudioOutput.zip
##To follow this example, download the zip file 


## Not run: 
dataFile = "AsuragenMAQC-probe-raw.txt"

qcFile = "AsuragenMAQC-controls.txt"

BSData = readBeadSummaryData(dataFile=dataFile, qcFile=qcFile, controlID="ProbeID",skip=0,qc.skip=0, qc.columns=list(exprs = "AVG_Signal"))


## End(Not run)