spikePipe: ChIP-seq spike-in normalization wrapper function

Description Usage Arguments Details Value Author(s) See Also Examples

View source: R/spikePipe-methods.R

Description

This function performs all steps of spike-in normalization: Dataset creation, RPM scaling, input DNA subtraction, RPM scaling reversal, exogenous DNA scaling (spike) and binding values extraction.

Usage

1
2
3
4
5
6
    spikePipe(infoFile, bamPath, bigWigPath, anno, genome_version, 
              paired = FALSE, binsize = 50, profile_length_before = 2000, 
              profile_length_after= 2000, mean_or_median = "mean", 
              interpolation_number = 100, interpolation_average = 10000,
              ignore_strand = FALSE, verbose = FALSE, boost = FALSE, 
              outputFolder = NULL)

Arguments

infoFile

csv or tab separated txt file containing information about files (see details)

bamPath

Path to the folder containing bam files

bigWigPath

Path to the folder containing bigwig files

anno

File in GFF format containing annotations used to plot information

genome_version

The UCSC code of reference genome, e.g. 'hg19' for Homo sapiens (see details)

paired

Indicate if sequences are single- or paired-ended. Default is FALSE

binsize

Binning size used to create bigwig files. Default is 50.

profile_length_before

Length in bp of the interval upstream annotation (see details). Default is 2000.

profile_length_after

Length in bp of the interval downstream annotation (see details). Default is 2000.

mean_or_median

For average profiles, should the 'mean' or 'median' values be used. Default is 'mean'.

interpolation_number

Number of interpolated points to create matrices (see details). Default is 100.

interpolation_average

Number of interpolated points of profiles and heatmaps (see details). Default is 10000.

ignore_strand

If TRUE the directionality is ignored, that is all features strands, regardless of annotation in GFF file, are treated as undetermined ("*"). default is FALSE.

verbose

If TRUE, output processing messages. Default is FALSE.

boost

If TRUE, the object created enables to perform the analysis in boost mode (see details). Default is FALSE

outputFolder

Define the folder where scaled bigwig are output. Default is NULL (see details).

Details

'infoFile' should be a csv or a tab separated txt file. The column names should be: expName, endogenousBam, exogenousBam, inputBam, bigWigEndogenous and bigWigInput. These columns indicate the experiment names; the bam file names of data aligned to the reference genome; the bam file names of data aligned to the exogenous genome; the input DNA bam file names corresponding to each experiment; the bigwig file names of data aligned to the reference genome and the bigwig file names of input DNA experiments.

If 'infoFile' contains only one input file (specified for each experiment), a ChIPSeqSpikeDataset (or ChIPSeqSpikeDatasetBoost) object is created. if 'infoFile' contains different input DNA files, an object of type 'list' is created (ChIPSeqSpikeDatasetList or ChIPSeqSpikeDatasetListBoost). Each element of the list will contain all experiments corresponding to a given input DNA one.

This function calls different processing steps that overall perform ChIP-seq spike-in normalization. The steps and functions are called in the following order: Dataset creation (see ?spikeDataset), RPM scaling (see ?scaling), input DNA subtraction (see ?inputSubtraction), RPM scaling reversal (see ?scaling), exogenous DNA scaling (see ?scaling) and binding values extraction (see ?extractBinding).

For details on installing reference genomes, see details of the function '?getPlotSetArray' of the 'seqplots' package.

For more details on parameters profile_length_before, profile_length_after, mean_or_median, interpolation_number, interpolation_average and ignore_strand, see ?extractBinding.

If boost = TRUE, either a ?ChIPSeqSpikeDatasetBoost or ?ChIPSeqSpikeDatasetListBoost object is created. The boost mode enables to store the binding values in the form of a GRanges object and avoid reading/ writing files at each processing step. Even if faster, this mode however consumes much more memory and should be used with caution.

If outputFolder is not NULL, the original bigwig files should be copied to this folder before performing the analysis. This parameter was created to test the package with the provided files in extdata/.

On Windows operating system, this function is not available due to the Bioconductor package rtracklayer >= 1.37.6 which does not support bigWig files. This function will return null.

Value

Returns a spike-in normalized object with extracted binding values that can be used to perform graphical representations (see ?plotProfile, ?plotTransform, ?plotHeatmaps, ?boxplotSpike and ?plotCor).

According to the files provided in 'infoFile', different objects are returned:

Author(s)

Nicolas Descostes

See Also

ChIPSeqSpikeDataset ChIPSeqSpikeDatasetList ChIPSeqSpikeDatasetBoost ChIPSeqSpikeDatasetListBoost spikeDataset scaling inputSubtraction extractBinding plotProfile plotTransform plotHeatmaps boxplotSpike plotCor getPlotSetArray

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
info_file_csv <- system.file("extdata/info.csv", package="ChIPSeqSpike")
bam_path <- system.file("extdata/bam_files", package="ChIPSeqSpike")
bigwig_path <- system.file("extdata/bigwig_files", package="ChIPSeqSpike")
gff_vec <- system.file("extdata/test_coord.gff", package="ChIPSeqSpike")
genome_name <- "hg19"
output_folder <- "test_chipseqspike"
bigwig_files <- system.file("extdata/bigwig_files", 
                            c("H3K79me2_0-filtered.bw",
                              "H3K79me2_100-filtered.bw",
                              "H3K79me2_50-filtered.bw",
                              "input_0-filtered.bw",
                              "input_100-filtered.bw",
                              "input_50-filtered.bw"), package="ChIPSeqSpike")

if(.Platform$OS.type != 'windows') {

    ## Copying example files
    dir.create("./test_chipseqspike")
    result <- file.copy(bigwig_files, "test_chipseqspike")

    csds <- spikePipe(info_file_csv, bam_path, bigwig_path, gff_vec, 
                      genome_name, verbose = TRUE, 
                      outputFolder = output_folder)

    csds2 <- spikePipe(info_file_csv, bam_path, bigwig_path, gff_vec, 
                       genome_name, boost = TRUE, verbose = TRUE, 
                       outputFolder = output_folder)

    unlink("test_chipseqspike/", recursive = TRUE)
    is(csds)
    is(csds2)
}

descostesn/ChIPSeqSpike documentation built on Aug. 6, 2019, 3:51 p.m.