readCel: Reads an Affymetrix CEL file



Description

This function reads all or a subset of the data in an Affymetrix CEL file.

Usage

readCel(filename, 
        indices = NULL, 
        readHeader = TRUE, 
        readXY = FALSE, readIntensities = TRUE,
        readStdvs = FALSE, readPixels = FALSE,
        readOutliers = TRUE, readMasked = TRUE, 
        readMap = NULL,
        verbose = 0,
        .checkArgs = TRUE)

Arguments

filename

the name of the CEL file.

indices

a vector of indices indicating which features to read. If the argument is NULL all features will be returned.

readXY

a logical: will the (x,y) coordinates be returned.

readIntensities

a logical: will the intensities be returned.

readStdvs

a logical: will the standard deviations be returned.

readPixels

a logical: will the number of pixels be returned.

readOutliers

a logical: will the outliers be returned.

readMasked

a logical: will the masked features be returned.

readHeader

a logical: will the header of the file be returned.

readMap

A vector remapping cell indices to file indices. If NULL, no mapping is used.

verbose

the verbosity level. 0 means no output; higher numbers mean more verbose output. At the moment the values 0, 1 and 2 are supported.

.checkArgs

If TRUE, the arguments will be validated, otherwise not. Warning: Set this to FALSE only if the arguments have been validated elsewhere!
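
To make the role of readMap more concrete, the following base-R sketch mimics a remapping on a plain vector rather than on a real CEL file. It assumes that cell index i is looked up at file position readMap[i]; that reading of the argument, and the names fileValues and readMapDemo, are illustrative assumptions and not a description of the underlying C code.

```r
# Toy stand-in for the values stored in a file, in file order.
fileValues <- c(10, 20, 30, 40, 50)

# Hypothetical map: cell index i is assumed to live at file position readMapDemo[i].
readMapDemo <- c(5L, 4L, 3L, 2L, 1L)

# Cells requested by the caller, in the caller's order.
indices <- c(1L, 2L, 5L)

# Assumed lookup: translate cell indices to file indices, then read.
mapped <- fileValues[readMapDemo[indices]]

# For comparison: the lookup with no map at all (the readMap = NULL case).
unmapped <- fileValues[indices]

print(mapped)
print(unmapped)
```

Under this assumption, a map that is the identity permutation gives the same result as readMap = NULL.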

Value

A CEL file consists of a header, a set of cell values, and information about outliers and masked cells.

The cell values, which are values extracted for each cell (aka feature or probe), are the (x,y) coordinate, intensity and standard deviation estimates, and the number of pixels in the cell. If indices=NULL, cell values for all cells are returned; otherwise, only the cell values specified by argument indices are returned.

This function returns a named list with the components described below:

header

The header of the CEL file. Equivalent to the output from readCelHeader, see the documentation for that function.

x,y

(cell values) Two integer vectors containing the x and y coordinates associated with each feature.

intensities

(cell value) A numeric vector containing the intensity associated with each feature.

stdvs

(cell value) A numeric vector containing the standard deviation associated with each feature.

pixels

(cell value) An integer vector containing the number of pixels associated with each feature.

outliers

An integer vector of indices specifying which of the queried cells are flagged as outliers. Note that there is a difference between outliers=NULL and outliers=integer(0); the latter case happens when readOutliers=TRUE but there are no outliers.

masked

An integer vector of indices specifying which of the queried cells are flagged as masked. Note that there is a difference between masked=NULL and masked=integer(0); the latter case happens when readMasked=TRUE but there are no masked features.

The elements of the cell values are ordered according to argument indices. The length of each cell-value element equals the number of cells read.

Which of the above elements are returned is controlled by the readNnn arguments. If an argument is FALSE, the corresponding element above is NULL, e.g. if readStdvs=FALSE then stdvs is NULL.
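
The distinction between a NULL element and a zero-length one is easy to overlook in downstream code. The following is a minimal sketch of how a caller might tell the cases apart; describeFlags is a hypothetical helper (not part of affxparser), and the lists are mock objects shaped like the return value described above, not real CEL data.

```r
# Summarise one flag element ('outliers' or 'masked') of a readCel()-style list.
describeFlags <- function(flags) {
  if (is.null(flags)) {
    "not read"                    # the corresponding readNnn argument was FALSE
  } else if (length(flags) == 0L) {
    "read, none flagged"          # e.g. integer(0)
  } else {
    sprintf("read, %d flagged", length(flags))
  }
}

# Mock results (illustrative values only).
celNotRead <- list(intensities = c(101, 99, 250), outliers = NULL)
celNoFlags <- list(intensities = c(101, 99, 250), outliers = integer(0))
celFlagged <- list(intensities = c(101, 99, 250), outliers = c(3L))

print(describeFlags(celNotRead$outliers))
print(describeFlags(celNoFlags$outliers))
print(describeFlags(celFlagged$outliers))
```

The same check applies to the masked element, and is.null() works equally well for the cell-value elements such as stdvs or pixels.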

Outliers and masked cells

The Affymetrix image analysis software flags cells as outliers and masked. This method does not return these flags, but instead vectors of cell indices listing which of the queried cells are outliers and which are masked. The current community view seems to be that outlier detection should instead be based on statistical modeling of the actual probe intensities and on the choice of preprocessing algorithm. Most algorithms use only the intensities from the CEL file.
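
If the software flags are nevertheless of interest, one possible use is to blank out the flagged intensities before further processing. The sketch below uses mock data and a hypothetical helper, maskFlagged; it assumes that the flagged vector uses the same cell numbering as the indices argument, which should be checked against real output before relying on it.

```r
# Set values to NA for cells whose index appears in 'flagged'.
# Assumption: 'flagged' and 'indices' use the same cell numbering.
maskFlagged <- function(values, indices, flagged) {
  stopifnot(length(values) == length(indices))
  if (is.null(flagged)) return(values)   # flags were not read; nothing to do
  values[indices %in% flagged] <- NA
  values
}

# Mock query: 16 cells, partly in descending order (illustrative values only).
idxs <- c(1:5, 450:440)
intensities <- seq(100, by = 10, length.out = length(idxs))
outliers <- c(3L, 445L)

cleaned <- maskFlagged(intensities, idxs, outliers)
print(which(is.na(cleaned)))
```

Matching on cell index rather than on position keeps the helper correct when indices is not sorted.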

Memory usage

The Fusion SDK allocates memory for the entire CEL file when the file is accessed (but does not actually read the file into memory). Using the indices argument will therefore only affect the memory use of the final object (as well as speed), not the memory allocated in the C function used to parse the file. This should be a minor problem, however.
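
Since the indices argument controls the size of the returned R object, one way to keep the R-side footprint small is to read cells in chunks and reduce each chunk before reading the next. The following is a sketch of that pattern in base R; chunkIndices, summariseInChunks and fakeReader are hypothetical helpers, not part of affxparser, and the reader is passed in as a function so that the example runs without a CEL file. Each call to a real reader would access the file again, so this trades speed for a smaller result object.

```r
# Split a vector of cell indices into consecutive chunks of at most chunkSize.
chunkIndices <- function(indices, chunkSize) {
  stopifnot(chunkSize >= 1)
  unname(split(indices, ceiling(seq_along(indices) / chunkSize)))
}

# Apply 'reader' to each chunk and keep only one summary number per chunk.
# 'reader' is any function(indices) returning a numeric vector. With
# affxparser it could wrap readCel(), for example:
#   function(ii) readCel(celFile, indices = ii, readHeader = FALSE,
#                        readOutliers = FALSE, readMasked = FALSE)$intensities
summariseInChunks <- function(indices, chunkSize, reader, fun = mean) {
  vapply(chunkIndices(indices, chunkSize),
         function(ii) fun(reader(ii)),
         numeric(1))
}

# Stand-in reader (made-up values) so the sketch runs without a CEL file.
fakeReader <- function(ii) as.numeric(ii) * 2

chunkMeans <- summariseInChunks(1:10, chunkSize = 4, reader = fakeReader)
print(chunkMeans)
```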

Troubleshooting

It is considered a bug if the file contains information not accessible by this function. Please report any such case.

Author(s)

James Bullard and Kasper Daniel Hansen

See Also

readCelHeader() for a description of the header output. Often a user only wants to read the intensities; see readCelIntensities() for a function specialized for that use.

Examples

  library("affxparser")

  for (zzz in 0) {  # Single-pass loop, only so that 'break' can be used

  # Scan current directory for CEL files (extension matched in any case)
  celFiles <- list.files(pattern="[.](c|C)(e|E)(l|L)$")
  if (length(celFiles) == 0)
    break

  celFile <- celFiles[1]

  # Read a subset of cells; the result follows the order of 'idxs'
  idxs <- c(1:5, 1250:1500, 450:440)
  cel <- readCel(celFile, indices=idxs, readOutliers=TRUE)
  str(cel)

  # Clean up
  rm(celFiles, celFile, idxs, cel)

  } # for (zzz in 0)


affxparser documentation built on Nov. 1, 2018, 2:25 a.m.