This function can decrypt Illumina gene expression iDAT
files (aka version 1 iDAT files). It will create
Sample Probe Profile.txt and
Profile.txt files, similar to Illumina GenomeStudio
version 1.8.0 . We have made every effort to reproduce
the GenomeStudio output, down to the Detection P-Value
calculation, the order of the rows, the background
correction procedure (which is only applied to
gene-probes), the output file headers sample and ProbeID
naming, however we cannot guarantee that the output will
be identical to that produced by GenomeStudio.
1 2 3 4 5 6
preprocess.illumina.idat(files = NULL, path = NULL, zipfile = NULL, manifestfile = NULL, clmfile = NULL, probeID = c("ProbeID", "ArrayAddressID", "Sequence", "NuID"), collapseMode = c("none", "max", "median", "mean"), backgroundCorrect = FALSE, outdir = NULL, prefix = NULL, verbose = FALSE, memory = "-Xmx1024m")
A character vector of at least one file name
The path to the directories where the files are. This defaults to the current working directory.
the path to a zip file containing idat
files. This over-rides the
The full path to the Array manifest
file in TXT format. See
This controls which value to identify each probe by. Allowable values are “ArrayAddressID”, “ProbeID”, “Sequence”, “NuID”.
Collapse probes to genes using the specified mode. Valid values are “none” (the default), “max”, “median” and “mean” (the GenomeStudio default).
logical, if TRUE then peform background correction on only the gene-level probes; if FALSE, do no correction.
The full path to the output directory. If
An character used as a file name prefix for the files that will be created. “NONE” (the default) means no prefix will be used.
The maximum Java memory heapsize. eg: "-Xmx1024m", or "-Xmx4g", which reserves 1GB or 4G of RAM, respectively. default="-Xmx1024m"
If collapseMode=“NONE”, then invisbly return a character containing the file paths of the “Sample Probe Profile.txt” and “Control Probe Profile.txt” files. Otherwise, invisbly return a character containing the file paths of the “Sample Gene Profile.txt” and “Control Gene Profile.txt” files.
iDAT files can be specified in 3 main ways, either as a
character vector of file paths, via the
parameter; (2) a character vector of filenames, and the
path to the directy containing the files via
path arguments; (3) via a single zip file
containing iDAT files, via the
The zip file may contain subfolders, though each array
should have a unique name, regardless of where it sits in
the zipfile structure.
Each array version and revision has a specific manifest
file; these are required for decoding idat files, and can
be queried, or downloaded via
respectively. Failing that, you can download them
manually directly from Illumina [2,3]. It is important to
use the TXT version, not the BGX version. You can use a
newer Release of the manifest file as long as you
get the right array type and Version. For example,
HumanHT-12_V3 arrays can use the
HumanHT-12_V3_0_R3_11283641_A.txt manifest files.
There is some confusion about the naming of Illumina
Probes. From the array manifest files, you can uniquely
identify each probe by either the Illumina Probe ID (eg
ILMN_1802252), the Array Address ID (eg 2640048), the
Probe_Sequence, or the NuID . Illumina's GenomeStudio
uses the ArrayAddressID in the ProbeID column; I think
the Probe_ID or NuID are better alternatives. If you
NuID, note that unlike
addNuID2lumi, which uses pre-built
mapping tables, we calculate NuID's directly from the
probe sequences within the manifest file.
Some genes are represented by multiple probes. You can either obtain values at the probe-level, or the gene-level using the ‘collapseMode’ parameter:
“none”The default: do no collapsing. Produces Sample Probe Profile.txt and Control Probe Profile.txt files
“max”Choose the probe with the highest AVG_Intensity value
“median”Choose the probe with the median AVG_Intensity value
“mean”Determine the mean fluorescence across all probes
Choosing either “max”, “median” or “mean” creates “Sample Gene Profile.txt” and “Control Gene Profile.txt” files. “mean” is the method that GenomeStudio uses, however we were unable to reproduce the mean values for “numBeads”, “Detection PValue” and “Probe_STDERR” - we opted for taking the mean of these values. When using “median” and there are an even number of probes for a gene, there is no single median probe, so from the 2 median probes, we take the conservative option of choosing the average AVG_Intensity, the smallest NumBeads, and the largest Probe_STDERR and Detection PVal, for those 2 median probes
Illumina GenomeStudio offers a background correction option. This estimates the background, and subtracts it from only the gene-level probes; ie control probes are never background corrected. The value is the mean AVG_Signal level of all the negative control probes on each array.
At the heart of this function is a Java program, which
requires that you specify an appropriate maximum amount
of memory. The default is -Xmx1024m, which reserves 1 GB
RAM. We have analysed 85 Human HT12 arrays using
-Xmx2048m. If you get the following error:
‘Exception in thread "main"
java.lang.OutOfMemoryError: Java heap space’ Then you
need to increase the amount of RAM, upto the maximum
available in your system. If you still get the error,
then you need a 64-bit system with lots of RAM.
As a rough guide, on my machine, for Human HT12 arrays, the amount of RAM required is approximately 333MB RAM + 19MB per array.
manifestrepresent manifest files the way that Affymetrix CDF's are (ie in data packages).
annotationImport probe-level annotation from the manifest file into the resulting LumiBatch object.
geneAdd options for gene-level summarisation.
Mark Cowley [email protected], with contributions from Mark Pinese, David Eby.
 Genome Studio Gene Expression Module User Guide,
version 1.0, Illumina
 Du, P., Kibbe, W. A., & Lin, S. M. (2007). nuID: a universal naming scheme of oligonucleotides for illumina, affymetrix, and other microarrays Biol Direct, 2, 16. (doi:10.1186/1745-6150-2-16)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
# iDAT files+path as input path <- system.file("extdata", package="lumidat") outdir <- tempdir() files <- c("5356583020_A_Grn.idat", "5356583020_B_Grn.idat") manifestfile <- system.file("extdata", "HumanHT-12_V3_0_R1_99999999.txt", package="lumidat") files <- preprocess.illumina.idat(files, path, manifestfile=manifestfile, probeID="NuID", outdir=outdir) files # zipfile as input path <- system.file("extdata", package="lumidat") files <- c("5356583020_A_Grn.idat", "5356583020_B_Grn.idat") zipfile <- tempfile(fileext=".zip") zip(zipfile, file.path(path, files), flags="-r9Xq") files <- preprocess.illumina.idat(zipfile=zipfile, manifestfile=manifestfile, probeID="NuID", outdir=outdir) files # CLM file path <- system.file("extdata", package="lumidat") outdir <- tempdir() files <- c("5356583020_A_Grn.idat", "5356583020_B_Grn.idat") manifestfile <- system.file("extdata", "HumanHT-12_V3_0_R1_99999999.txt", package="lumidat") clmfile <- system.file("extdata", "5356583020.clm", package="lumidat") files <- preprocess.illumina.idat(files, path, manifestfile=manifestfile, clmfile=clmfile, probeID="NuID", outdir=outdir) readLines(files, 2) #  "TargetID\tProbeID\tA.AVG_Signal\tA.BEAD_STDERR\tA.Avg_NBEADS\tA.Detection Pval\tB.AVG_Signal\tB.BEAD_STDERR\tB.Avg_NBEADS\tB.Detection Pval" #  "biotin\t0gwIEN5441dbo3j27A\t16902.39\t5062.077\t85\t1.0000000\t16668.27\t6532.465\t60\t1.0000000" ## Not run: # Get the Human HT12 V4 manifest file: manifestfile <- download_illumina_manifest_file("HumanHT-12_V4_0_R2_15002873_B", "txt") ## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.