beads: BEADS normalization algorithm for ChIP-seq experiments

Description Usage Arguments Details Value Methods (by class) Author(s) References Examples

Description

The function runs BEAD algorithm on aligned files. It requires short read alignment in BAM format, input track (either single experiment BAM or summed input created using sumBAMinputs function), mappability track in BigWig format (see details) and reference genome in FASTA fomat or as BSgenome genomic package.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
beads(experiment, control, mappability, genome, uniq = TRUE, insert = 200L,
  mapq_cutoff = 10L, export = "BEADS", rdata = FALSE, export_er = TRUE,
  quickMap = TRUE, ...)

## S4 method for signature 'BamFile,BamFile,BigWigFile,ANY'
beads(experiment, control,
  mappability, genome, uniq = TRUE, insert = 200L, mapq_cutoff = 10L,
  export = "BEADS", rdata = FALSE, export_er = TRUE, quickMap = TRUE,
  ...)

## S4 method for signature 'BamFile,BigWigFile,BigWigFile,ANY'
beads(experiment, control,
  mappability, genome, uniq = TRUE, insert = 200L, mapq_cutoff = 10L,
  export = "BEADS", rdata = FALSE, export_er = TRUE, quickMap = TRUE,
  ...)

## S4 method for signature 'character,character,character,character'
beads(experiment, control,
  mappability, genome, uniq = TRUE, insert = 200L, mapq_cutoff = 10L,
  export = "BEADS", rdata = FALSE, export_er = TRUE, quickMap = TRUE,
  ...)

Arguments

experiment

The path to experiment (sample) alignment file in BAM format or BamFile class.

control

The path to control (input) alignment file in BAM format or summed input file in BigWiggle format (accepts BamFile and BigWigFile classes as well).

mappability

The path to mappability track in BigWiggle format (accepts BigWigFile class as well).

genome

The path reference genome FASTA or UCSC identifier for installed BSgenome packages e.g. "hg19" for human

uniq

If TRUE the alignment will be uniqued, i.e. only one of non-unique reads will be used.

insert

The expected insert size in base pairs.

mapq_cutoff

The cutoff parameter used to filter BAM alignments for low mapping quality reads.

export

The character vector of BW tracks to be exported.

rdata

If TRUE all data will be exported as R binaries in addition to BigWiggle tracks.

export_er

If TRUE the enriched regions will be exported to BED format.

quickMap

If TRUE the quick mappability processing be used, otherwise the mappability track will be processed by running mean smoothing.

...

parameters passed by reference to rbeads internal functions.

Details

Mappability/alignability tracks gives numeric score for level of reference sequence uniqueness. Short reads cannot be confidently aligned to non-unique sequences, so BEADS masks the out. The pre-calculated tracks for many species can be found in genome databases, e.g. following link gives the collection of human tracks for reference genome GRCh37/hg19: http://genome.ucsc.edu/cgi-bin/hgTrackUi?hgsid=340327143&g=wgEncodeMapability.

For other species mappability track can be easily calculated from reference FASTA file using GEM-mappability software (http://www.ncbi.nlm.nih.gov/pubmed/22276185). The GEM library binaries are available on SourceForge: (http://sourceforge.net/projects/gemlibrary/files/gem-library/), while the reference genome FASTA file can be obtained from UCSC: (http://hgdownload.soe.ucsc.edu/downloads.html). The following example illustrates the procedure for C. elegans reference genome (36bp read length and 8 parallel threads set in options):

gem-indexer -i ce10.fa -o ce10 -T 8
gem-mappability -I ce10.gem -l 36 -o ce10 -T 8
gem-2-wig -I ce10.gem -i ce10.map -o ce10
wigToBigWig ce10.wig ce10.chrom.sizes ce10.bw

By default only BEADS normalized track is exported. The export files can be control by export parameter, which is the character vector containing following values:

'BEADS'

fully normalized files, i.e. GC correction, mappability correction, division by input and scaling by median

'GCandMap'

GC correction and mappability correction

'GCcorected'

GC correction

'readsCoverage'

raw reads coverage, after extending to insert length, filtering /codemapq_cutoff and uniquing if uniq

'control_GCandMap'

GC normalized and mappability corrected input, only works if input is a BAM file

'control_GCcorected'

GC normalized, only works if input is a BAM file

'control_readsCoverage'

input raw reads coverage, after extending to insert length, filtering mapq_cutoff and uniquing if uniq, only works if input is a BAM file

For example: beads(sample_bam, input_bam, map, fa, export=c('control_readsCoverage', 'control_GCcorected', 'control_GCandMap', 'readsCoverage', 'GCcorected', 'GCandMap', 'BEADS'))

Value

Named BigWigFileList containing exported BW file connections.

Methods (by class)

Author(s)

Przemyslaw Stempor

References

http://beads.sourceforge.net/
http://www.ncbi.nlm.nih.gov/pubmed/21646344

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
# Get the paths of example files
sample_bam <- system.file("extdata", "GSM1208360_chrI_100Kb_q5_sample.bam", package="rbeads")
input_bam <- system.file("extdata", "Input_fE3_AA169.bam", package="rbeads")
SummedInput_bw <- system.file("extdata", "Ce10_HiSeqFRMInput_UNIQ_bin25bp_chrI_100Kb_sample.bw", package="rbeads")
map_bw <- system.file("extdata", "ce10_mappability_chrI_100Kb_sample.bw", package="rbeads")
ref_fa <- system.file("extdata", "ce10_chrI_100Kb_sample.fa", package="rbeads")

# Set the directory where the output files will be crated
setwd(tempdir())

# Run BEADS for BAM input file
beads(sample_bam, input_bam, map_bw, ref_fa)

# Run BEADS for SummedInput (BigWig) input file
beads(sample_bam, SummedInput_bw, map_bw, ref_fa)

## Not run: 
## Run BEADS for BSgenome package, the reference genome package have to be installed prior to running this example
# source("http://bioconductor.org/biocLite.R")
# biocLite("BSgenome.Celegans.UCSC.ce10")
# library(BSgenome.Celegans.UCSC.ce10)
beads(sample_bam, SummedInput_bw, map_bw, genome='ce10')

## Run BEADS for all BAM files in the directory
#lapply(dir(pattern='bam$'), beads, control=input, mappability=map_bw, genome=ref_fa)

## End(Not run)

## Not run: 
## In parallel:
# library(parallel)
# mclapply(dir(pattern='.bam$'), beads, control=input, mappability=map_bw, genome='ce10', mc.cores=parallel::detectCores())

## End(Not run)

Przemol/rbeads documentation built on May 8, 2019, 3:46 a.m.