processTilingArray: Obtain and clean nucleosome positioning data from tiling...

Description Usage Arguments Details Value Warning Note Author(s) See Also Examples

Description

Process and transform the microarray data coming from tiling array nucleosome positioning experiments.

Usage

1
2
processTilingArray(data, exprName, chrPattern, inferLen = 50, mc.cores = 1,
  quiet = FALSE)

Arguments

data

ExpressionSet object wich contains the data of the tiling array.

exprName

Name of the sample in ExpressionSet which contains the ratio between nucleosomal and genomic dna (if using Starr, the description argument supplied to getRatio function). If this name is not provided, it is assumed data has only one column.

chrPattern

Only chromosomes that contain chrPattern string will be selected from ExpressionSet. Sometimes tilling arrays contain control quality information that is imported as a chromosome. This allows filtering it. If no value is supplied, all chromosomes will be used.

inferLen

Maximum length (in basepairs) for allowing data gaps inference. See details for further information.

mc.cores

Number of cores available to parallel data processing.

quiet

Avoid printing on going information (TRUE | FALSE)

Details

The processing of tiling arrays could be complicated as many types exists on the market. This function deals ok with Affymetrix Tiling Arrays in yeast, but hasn't been tested on other species or platforms.

The main aim is convert the output of preprocessing steps (supplied by third-parties packages) to a clean genome wide nucleosome occupancy profile.

Tiling arrays doesn't use to provide a one-basepair resolution data, so one gets one value per probe in the array, covering X basepairs and shifted (tiled) Y basepairs respect the surrounding ones. So, one gets a piece of information every Y basepairs.

This function tries to convert this noisy, low resolution data, to a one-basepair signal, which allows a fast recognition of nucleosomes without using large and artificious statistical machinery as Hidden Markov Models using posterionr noise cleaning process.

As example, imagine your array has probes of 20mers and a tiling between probes of 10bp. Starting at position 1 (covering coordinates from 1 to 20), the next probe will be in position 10 (covering the coordinates 10 to 29). This can be represented as two hybridization intensity values on coordinates 1 and 10. This function will try to infer (using a lineal distribution) the values from 2 to 9 using the existing values of probes in coordinate 1 and coordinate 10.

The tiling space between adjacent array probes could be not constant, or could be also there are regions not covered in the used microarray. With the function argument inferLen you can specify wich amout of space (in basepairs) you allow to infer the non-present values.

If at some point the range not covered (gap) between two adjacent probes of the array is greater than inferLen value, then the coordinates between these probes will be setted to NA.

Value

RleList with the observed/inferred values for each coordinate.

Warning

This function could not cover all kind of arrays in the market. This package assumes the data is processed and normalized prior this processing, using standard microarray packages existing for R, like Starr.

Note

This function should be suitable for all data objects of kind ExpressionSet coding the annotations "chr" for chromosome and "pos" for position (acccessible by pData(data@featureData)) and a expression value (accessible by exprs(data)).

Author(s)

Oscar Flores oflores@mmb.pcb.ub.es

See Also

Biobase::ExpressionSet, Starr::getRatio()

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
## Not run: 
    # Dataset cannot be provided for size restrictions
    # This is the code used to get the hybridization ratio with Starr from
    # CEL files
    library("Starr")
    TA_parsed <- readCelFile(
        BPMap, CELfiles, CELnames, CELtype, featureData=TRUE, log.it=TRUE
    )
    TA_loess <- normalize.Probes(TA_parsed, method="loess")
    TA_ratio <- getRatio(
        TA_loess, TA_loess$type=="IP", TA_loess$type=="CONTROL", "myRatio"
    )

    # From here, we use nucleR:

    # Preprocess the array, using the calculated ratio feature we named
    # "myRatio".

    # This will also select only those chromosomes with the pattern
    # "Sc:Oct_2003;chr", removing control data present in that tiling
    # array.

    # Finally, we allow that loci not covered by a prove being inferred
    # from adjacent ones, as far as they are separated by 50bp or less
    arr <- processTilingArray(
        TA_ratio, "myRatio", chrPattern="Sc:Oct_2003;chr", inferLen=50
    )

    # From here we can proceed with the analysis:
    arr_fft <- filterFFT(arr)
    arr_pea <- peakDetection(arr_fft)
    plotPeaks(arr_pea, arr_fft)

## End(Not run)

gthar/nucleR documentation built on May 28, 2019, 4:37 p.m.