beads: BEADS normalization algorithm for ChIP-seq experiments
In Przemol/rbeads: R implementation of Bias Elimination Algorithm for Deep Sequencing

Description Usage Arguments Details Value Methods (by class) Author(s) References Examples

The function runs BEAD algorithm on aligned files. It requires short read alignment in BAM format, input track (either single experiment BAM or summed input created using sumBAMinputs function), mappability track in BigWig format (see details) and reference genome in FASTA fomat or as BSgenome genomic package.

beads(experiment, control, mappability, genome, uniq = TRUE, insert = 200L,
  mapq_cutoff = 10L, export = "BEADS", rdata = FALSE, export_er = TRUE,
  quickMap = TRUE, ...)

## S4 method for signature 'BamFile,BamFile,BigWigFile,ANY'
beads(experiment, control,
  mappability, genome, uniq = TRUE, insert = 200L, mapq_cutoff = 10L,
  export = "BEADS", rdata = FALSE, export_er = TRUE, quickMap = TRUE,
  ...)

## S4 method for signature 'BamFile,BigWigFile,BigWigFile,ANY'
beads(experiment, control,
  mappability, genome, uniq = TRUE, insert = 200L, mapq_cutoff = 10L,
  export = "BEADS", rdata = FALSE, export_er = TRUE, quickMap = TRUE,
  ...)

## S4 method for signature 'character,character,character,character'
beads(experiment, control,
  mappability, genome, uniq = TRUE, insert = 200L, mapq_cutoff = 10L,
  export = "BEADS", rdata = FALSE, export_er = TRUE, quickMap = TRUE,
  ...)

`experiment`	The path to experiment (sample) alignment file in BAM format or `BamFile` class.
`control`	The path to control (input) alignment file in BAM format or summed input file in BigWiggle format (accepts `BamFile` and `BigWigFile` classes as well).
`mappability`	The path to mappability track in BigWiggle format (accepts `BigWigFile` class as well).
`genome`	The path reference genome FASTA or UCSC identifier for installed `BSgenome` packages e.g. "hg19" for human
`uniq`	If TRUE the alignment will be uniqued, i.e. only one of non-unique reads will be used.
`insert`	The expected insert size in base pairs.
`mapq_cutoff`	The cutoff parameter used to filter BAM alignments for low mapping quality reads.
`export`	The character vector of BW tracks to be exported.
`rdata`	If TRUE all data will be exported as R binaries in addition to BigWiggle tracks.
`export_er`	If TRUE the enriched regions will be exported to BED format.
`quickMap`	If TRUE the quick mappability processing be used, otherwise the mappability track will be processed by running mean smoothing.
`...`	parameters passed by reference to rbeads internal functions.

Mappability/alignability tracks gives numeric score for level of reference sequence uniqueness. Short reads cannot be confidently aligned to non-unique sequences, so BEADS masks the out. The pre-calculated tracks for many species can be found in genome databases, e.g. following link gives the collection of human tracks for reference genome GRCh37/hg19: http://genome.ucsc.edu/cgi-bin/hgTrackUi?hgsid=340327143&g=wgEncodeMapability.

For other species mappability track can be easily calculated from reference FASTA file using GEM-mappability software (http://www.ncbi.nlm.nih.gov/pubmed/22276185). The GEM library binaries are available on SourceForge: (http://sourceforge.net/projects/gemlibrary/files/gem-library/), while the reference genome FASTA file can be obtained from UCSC: (http://hgdownload.soe.ucsc.edu/downloads.html). The following example illustrates the procedure for C. elegans reference genome (36bp read length and 8 parallel threads set in options):

gem-indexer -i ce10.fa -o ce10 -T 8
gem-mappability -I ce10.gem -l 36 -o ce10 -T 8
gem-2-wig -I ce10.gem -i ce10.map -o ce10
wigToBigWig ce10.wig ce10.chrom.sizes ce10.bw

By default only BEADS normalized track is exported. The export files can be control by export parameter, which is the character vector containing following values:

'BEADS': fully normalized files, i.e. GC correction, mappability correction, division by input and scaling by median
'GCandMap': GC correction and mappability correction
'GCcorected': GC correction
'readsCoverage': raw reads coverage, after extending to insert length, filtering /codemapq_cutoff and uniquing if uniq
'control_GCandMap': GC normalized and mappability corrected input, only works if input is a BAM file
'control_GCcorected': GC normalized, only works if input is a BAM file
'control_readsCoverage': input raw reads coverage, after extending to insert length, filtering mapq_cutoff and uniquing if uniq, only works if input is a BAM file

For example: beads(sample_bam, input_bam, map, fa, export=c('control_readsCoverage', 'control_GCcorected', 'control_GCandMap', 'readsCoverage', 'GCcorected', 'GCandMap', 'BEADS'))

Named BigWigFileList containing exported BW file connections.

experiment = BamFile,control = BamFile,mappability = BigWigFile,genome = ANY: Method for signature experiment='BamFile', control='BamFile', mappability='BigWigFile', genome='ANY'
experiment = BamFile,control = BigWigFile,mappability = BigWigFile,genome = ANY: Method for signature experiment='BamFile', control='BigWigFile', mappability='BigWigFile', genome='ANY'
experiment = character,control = character,mappability = character,genome = character: Method for signature experiment='character', control='character', mappability='character', genome='character'

Przemyslaw Stempor

http://beads.sourceforge.net/
http://www.ncbi.nlm.nih.gov/pubmed/21646344

# Get the paths of example files
sample_bam <- system.file("extdata", "GSM1208360_chrI_100Kb_q5_sample.bam", package="rbeads")
input_bam <- system.file("extdata", "Input_fE3_AA169.bam", package="rbeads")
SummedInput_bw <- system.file("extdata", "Ce10_HiSeqFRMInput_UNIQ_bin25bp_chrI_100Kb_sample.bw", package="rbeads")
map_bw <- system.file("extdata", "ce10_mappability_chrI_100Kb_sample.bw", package="rbeads")
ref_fa <- system.file("extdata", "ce10_chrI_100Kb_sample.fa", package="rbeads")

# Set the directory where the output files will be crated
setwd(tempdir())

# Run BEADS for BAM input file
beads(sample_bam, input_bam, map_bw, ref_fa)

# Run BEADS for SummedInput (BigWig) input file
beads(sample_bam, SummedInput_bw, map_bw, ref_fa)

## Not run: 
## Run BEADS for BSgenome package, the reference genome package have to be installed prior to running this example
# source("http://bioconductor.org/biocLite.R")
# biocLite("BSgenome.Celegans.UCSC.ce10")
# library(BSgenome.Celegans.UCSC.ce10)
beads(sample_bam, SummedInput_bw, map_bw, genome='ce10')

## Run BEADS for all BAM files in the directory
#lapply(dir(pattern='bam$'), beads, control=input, mappability=map_bw, genome=ref_fa)

## End(Not run)

## Not run: 
## In parallel:
# library(parallel)
# mclapply(dir(pattern='.bam$'), beads, control=input, mappability=map_bw, genome='ce10', mc.cores=parallel::detectCores())

## End(Not run)