Aneufinder: Wrapper function for the 'AneuFinder' package


View source: R/Aneufinder.R

Description

This function is an easy-to-use wrapper that bins the data, finds copy number variations, locates breakpoints, and plots genome-wide heatmaps, distributions, profiles and karyograms.

Usage

Aneufinder(inputfolder, outputfolder, configfile = NULL, numCPU = 1,
  reuse.existing.files = TRUE, binsizes = 1e+06, stepsizes = binsizes,
  variable.width.reference = NULL, reads.per.bin = NULL,
  pairedEndReads = FALSE, assembly = NULL, chromosomes = NULL,
  remove.duplicate.reads = TRUE, min.mapq = 10, blacklist = NULL,
  use.bamsignals = FALSE, reads.store = FALSE, correction.method = NULL,
  GC.BSgenome = NULL, method = c("edivisive"), strandseq = FALSE,
  R = 10, sig.lvl = 0.1, eps = 0.01, max.time = 60, max.iter = 5000,
  num.trials = 15, states = c("zero-inflation", paste0(0:10, "-somy")),
  confint = NULL, refine.breakpoints = FALSE, hotspot.bandwidth = NULL,
  hotspot.pval = 0.05, cluster.plots = TRUE)

Arguments

inputfolder

Folder with either BAM or BED files.

outputfolder

Folder to output the results. If it does not exist it will be created.

configfile

A file specifying the parameters of this function (without inputfolder, outputfolder and configfile). Having the parameters in a file can be handy if many samples with the same parameter settings are to be run. If a configfile is specified, it will take priority over the command line parameters.

numCPU

The number of CPUs to use. Should not exceed the number available on your machine.

reuse.existing.files

A logical indicating whether or not existing files in outputfolder should be reused.

binsizes

An integer vector with bin sizes. If more than one value is given, output files will be produced for each bin size.

stepsizes

A vector of step sizes the same length as binsizes. Only used for method="HMM".

variable.width.reference

A BAM file that is used as reference to produce variable width bins. See variableWidthBins for details.

reads.per.bin

Approximate number of desired reads per bin. The bin size will be selected accordingly. Output files are produced for each value.

pairedEndReads

Set to TRUE if you have paired-end reads in your BAM files (not implemented for BED files).

assembly

See fetchExtendedChromInfoFromUCSC for available assemblies. Only necessary when importing BED files; BAM files are handled automatically. Alternatively, a data.frame with columns 'chromosome' and 'length' can be supplied.
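As a minimal sketch of the data.frame form of assembly, the snippet below builds a two-chromosome table with the required columns 'chromosome' and 'length'. The chromosome names and lengths are illustrative values only (hg38 chr1 and chr2) and should be replaced with those of the assembly your BED files were aligned to.

```r
# Illustrative values only: replace with the chromosomes and lengths
# of the assembly your BED files were aligned to.
assembly <- data.frame(
  chromosome = c("chr1", "chr2"),
  length = c(248956422, 242193529),
  stringsAsFactors = FALSE
)

# The two column names required by the 'assembly' argument.
stopifnot(all(c("chromosome", "length") %in% colnames(assembly)))
print(assembly)
```

The resulting object would then be passed as assembly=assembly in place of an assembly name.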

chromosomes

If only a subset of the chromosomes should be imported, specify them here.

remove.duplicate.reads

A logical indicating whether or not duplicate reads should be removed.

min.mapq

Minimum mapping quality when importing from BAM files. Set min.mapq=NA to keep all reads.

blacklist

A GRanges-class object or a bed(.gz) file with blacklisted regions. Reads falling into those regions will be discarded.

use.bamsignals

If TRUE the bamsignals package will be used for binning. This gives a tremendous performance increase for the binning step. reads.store and calc.complexity will be set to FALSE in this case.

reads.store

Set reads.store=TRUE to store read fragments as RData in folder 'data' and as BED files in 'BROWSERFILES/data'. This option will force use.bamsignals=FALSE.

correction.method

Correction methods to be used for the binned read counts. Currently only 'GC'.

GC.BSgenome

A BSgenome object which contains the DNA sequence that is used for the GC correction.

method

Any combination of c('HMM','dnacopy','edivisive'). Option method='HMM' uses a Hidden Markov Model as described in doi:10.1186/s13059-016-0971-7 to call copy numbers. Option 'dnacopy' uses segment from the DNAcopy package to call copy numbers similarly to the method proposed in doi:10.1038/nmeth.3578, which gives more robust but less sensitive results compared to the HMM. Option 'edivisive' (DEFAULT) works like option 'dnacopy' but uses the e.divisive function from the ecp package for segmentation.

strandseq

A logical indicating whether the data comes from Strand-seq experiments. If TRUE, both strands carry information and are treated separately.

R

method-edivisive: The maximum number of random permutations to use in each iteration of the permutation test (see e.divisive). Increase this value to increase accuracy at the cost of speed.

sig.lvl

method-edivisive: The level at which to sequentially test if a proposed change point is statistically significant (see e.divisive). Increase this value to find more breakpoints.

eps

method-HMM: Convergence threshold for the Baum-Welch algorithm.

max.time

method-HMM: The maximum running time in seconds for the Baum-Welch algorithm. If this time is reached, the Baum-Welch will terminate after the current iteration finishes. Set max.time = -1 for no limit.

max.iter

method-HMM: The maximum number of iterations for the Baum-Welch algorithm. Set max.iter = -1 for no limit.

num.trials

method-HMM: The number of trials to find a fit where state most.frequent.state is most frequent. Each time, the HMM is seeded with different random initial values.

states

method-HMM: A subset or all of c("zero-inflation","0-somy","1-somy","2-somy","3-somy","4-somy",...). This vector defines the states that are used in the Hidden Markov Model. The order of the entries must not be changed.
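Because the order of the entries must be preserved, one way to build a reduced states vector is to subset the full default vector rather than typing the entries by hand. The sketch below uses base R only; the choice of copy numbers 0 through 4 and the variable names are assumptions made for illustration.

```r
# Full default state vector, as given in the Usage section.
all.states <- c("zero-inflation", paste0(0:10, "-somy"))

# Hypothetical selection: restrict the model to copy numbers 0 through 4.
wanted <- c(paste0(0:4, "-somy"), "zero-inflation")

# Subsetting by membership keeps the order of all.states,
# regardless of the order in which 'wanted' was written.
states <- all.states[all.states %in% wanted]
print(states)
```

The resulting vector can then be passed as the states argument.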

confint

Desired confidence interval for breakpoints. Set confint=NULL to disable confidence interval estimation. Confidence interval estimation will force reads.store=TRUE.

refine.breakpoints

A logical indicating whether breakpoints from the HMM should be refined with read-level information. refine.breakpoints=TRUE will force reads.store=TRUE.

hotspot.bandwidth

A vector the same length as binsizes with bandwidths for breakpoint hotspot detection (see hotspotter for further details). If NULL, the bandwidth will be chosen automatically as the average distance between reads.

hotspot.pval

P-value for breakpoint hotspot detection (see hotspotter for further details). Set hotspot.pval = NULL to skip hotspot detection.

cluster.plots

A logical indicating whether plots should be clustered by similarity.

Value

NULL

Author(s)

Aaron Taudt

Examples

## Not run: 
## The following call produces plots and genome browser files for all BAM files in "my-data-folder"
Aneufinder(inputfolder="my-data-folder", outputfolder="my-output-folder")
## End(Not run)
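A further sketch, using only arguments listed in the Usage section, shows a run with two bin sizes and two calling methods. The folder names and parameter values are placeholders, and the call is guarded so that it only runs when the AneuFinder package is installed and the input folder exists.

```r
# Placeholder folders and illustrative parameter values.
args <- list(
  inputfolder = "my-data-folder",
  outputfolder = "my-output-folder",
  numCPU = 2,
  binsizes = c(5e5, 1e6),          # output is produced for each bin size
  stepsizes = c(5e5, 1e6),         # same length as binsizes; only used for method="HMM"
  method = c("edivisive", "HMM")   # any combination of the supported methods
)

# stepsizes must have the same length as binsizes.
stopifnot(length(args$stepsizes) == length(args$binsizes))

# Run only if the package is installed and the input folder is present.
if (requireNamespace("AneuFinder", quietly = TRUE) && dir.exists(args$inputfolder)) {
  do.call(AneuFinder::Aneufinder, args)
}
```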

ataudt/aneufinder documentation built on Nov. 21, 2018, 10:10 a.m.