workflow: Sequenza convenience functions for standard analysis
In sequenza: Copy Number Estimation from Tumor Genome Sequencing Data


These three functions are intended as the main user interface of the package. Together they run several sequenza functions as a standardized pipeline.

  sequenza.extract(file, window = 1e6, overlap = 1,
    gamma = 80, kmin = 10, gamma.pcf = 140, kmin.pcf = 40,
    mufreq.treshold = 0.10, min.reads = 40, min.reads.normal = 10,
    min.reads.baf = 1, max.mut.types = 1, min.type.freq = 0.9,
    min.fw.freq = 0, verbose = TRUE, chromosome.list = NULL,
    breaks = NULL, breaks.method = "het", assembly = "hg19",
    weighted.mean = TRUE, normalization.method = "mean",
    ignore.normal = FALSE, parallel = 1, gc.stats = NULL,
    segments.samples = FALSE)

  sequenza.fit(sequenza.extract, female = TRUE, N.ratio.filter = 10,
               N.BAF.filter = 1, segment.filter = 3e6,
               mufreq.treshold = 0.10, XY = c(X = "X", Y = "Y"),
               cellularity = seq(0.1,1,0.01), ploidy = seq(1, 7, 0.1),
               ratio.priority = FALSE, method = "baf",
               priors.table = data.frame(CN = 2, value = 2),
               chromosome.list = 1:24, mc.cores = getOption("mc.cores", 2L))

  sequenza.results(sequenza.extract, cp.table = NULL, sample.id, out.dir = getwd(),
                   cellularity = NULL, ploidy = NULL, female = TRUE, CNt.max = 20,
                   ratio.priority = FALSE, XY = c(X = "X", Y = "Y"),
                   chromosome.list = 1:24)

`file`	the name of the seqz file to read.
`window`	size of windows used when plotting mean and quartile ranges of depth ratios and B-allele frequencies. Smaller windows will take more time to compute.
`overlap`	integer specifying the number of overlapping windows.
`gamma, kmin`	arguments passed to `aspcf` from the copynumber package.
`gamma.pcf, kmin.pcf`	arguments passed to `pcf` from the copynumber package. The arguments are effective only when `breaks.method` is set to "full".
`mufreq.treshold`	mutation frequency threshold.
`min.reads`	minimum number of reads above the quality threshold to accept the mutation call.
`min.reads.normal`	minimum number of reads used to determine the genotype in the normal sample.
`min.reads.baf`	threshold on the depth of the positions included to calculate the average BAF for each segment.
`max.mut.types`	maximum number of different base substitutions per position. Integer from 1 to 3 (since there are only 4 bases). Setting it to 3 accepts "noisy" mutation calls; the default for this function is 1.
`min.type.freq`	minimum frequency of aberrant types.
`min.fw.freq`	minimum frequency of variant reads detected in the forward strand. With the value 0, only variant calls whose forward-strand frequency lies outside the closed interval [0, 1] would be discarded, so in practice no call is filtered.
`verbose`	logical, indicating whether to print information about the chromosome being processed.
`chromosome.list`	vector containing the indices or the names of the chromosomes to include in the model fitting.
`breaks`	optional data.frame with the columns chrom, start.pos and end.pos, defining a pre-existing segmentation. When this argument is set, the built-in segmentation is skipped and the supplied breaks are used instead.
`breaks.method`	resolution of the segmentation. Possible values are `fast`, `het` and `full`, where `fast` gives the lowest resolution and `full` the highest. Custom values of `gamma` and `kmin` need to be adjusted accordingly to obtain optimal results.
`assembly`	assembly version of the genome, see `aspcf` or `pcf`.
`weighted.mean`	logical; if TRUE, the depth ratio and B-allele frequency means of each segment are computed using the read depth as weights.
`normalization.method`	string defining the operation to perform during the GC-normalization process. Possible values are `mean` (default) and `median`. A `median` normalization is preferable with noisy data.
`ignore.normal`	boolean, when set to TRUE the process will ignore the normal coverage and perform the analysis by using the normalized tumor coverage.
`parallel`	integer, number of threads used to process a seqz file (see `chunk.apply`).
`gc.stats`	object returned from the function `gc.sample.stats`. If `NULL` the object will be computed from the input file.
`segments.samples`	EXPERIMENTAL. Segment the tumor and normal samples separately, and add the resulting segmentations to the QC plots.
`sequenza.extract`	a list of objects as output from the `sequenza.extract` function.
`method`	method to use to fit the data; possible values are `baf` to use `baf.model.fit` or `mufreq` to use the `mufreq.model.fit` function to fit the data.
`cp.table`	a list of objects as output from the `sequenza.fit` function.
`female`	logical, TRUE if the sample is female (the default) and FALSE if male; used to handle the X and Y chromosomes properly. The implementation only works for the normal human karyotype.
`CNt.max`	maximum copy number to consider in the model.
`N.ratio.filter`	minimum number of depth ratio observations required in a segment.
`N.BAF.filter`	minimum number of B-allele frequency observations required in a segment.
`segment.filter`	minimum segment length (in base pairs). Shorter segments, which can add noise when fitting the cellularity and ploidy parameters, are filtered out. The threshold does not affect the allele-specific segmentation.
`XY`	character vector of length 2 specifying the labels used for the X and Y chromosomes.
`cellularity`	vector of candidate cellularity parameters.
`ploidy`	vector of candidate ploidy parameters.
`priors.table`	data frame with the columns `CN` and `value`, containing the copy numbers and the corresponding weights. Every copy number has a default weight of 1, so any value different from 1 changes the weight of the corresponding copy number.
`ratio.priority`	logical; if TRUE, only the depth ratio is used to determine the copy number state, while the B-allele frequency (Bf) is used to determine the number of B-alleles.
`sample.id`	identifier of the sample, to be used as a prefix for saved objects.
`out.dir`	output directory where the files and objects will be saved.
`mc.cores`	legacy argument for the number of cores. It is passed as the `cl` argument of `pblapply`, which uses `mclapply` (forking) when `cl` is an integer.
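To make the `priors.table` semantics concrete, the following base-R sketch expands such a table into one weight per copy number, with every unlisted copy number keeping the default weight of 1. The helper `expand_priors` and its `max.cn` argument are hypothetical and exist only for illustration; sequenza applies the priors internally and does not expose such a function.

```r
# Hypothetical helper, not part of sequenza: expand a priors.table-style data
# frame into one weight per copy number from 0 to max.cn. Copy numbers that
# are not listed in the table keep the default weight of 1.
expand_priors <- function(priors.table, max.cn = 20) {
  stopifnot(all(c("CN", "value") %in% names(priors.table)))
  weights <- setNames(rep(1, max.cn + 1), 0:max.cn)  # default weight 1
  keep <- priors.table$CN >= 0 & priors.table$CN <= max.cn
  weights[as.character(priors.table$CN[keep])] <- priors.table$value[keep]
  weights
}

# Default priors.table of sequenza.fit: copy number 2 gets weight 2
expand_priors(data.frame(CN = 2, value = 2), max.cn = 5)

# Hypothetical custom table that also down-weights copy number 4
expand_priors(data.frame(CN = c(2, 4), value = c(2, 0.5)), max.cn = 5)
```

In this reading, a value above 1 favors a copy number state and a value below 1 penalizes it, relative to all the states left at the default.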

The first function, sequenza.extract, uses a range of functions from the sequenza package to read the raw data, normalize the depth ratio for GC-content bias, perform allele-specific segmentation, filter out noisy mutations and bin the raw data for plotting. The computed objects are returned as a single list.
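The GC-content normalization step can be pictured as dividing each depth ratio by the mean (or, with `normalization.method = "median"`, the median) depth ratio of all positions sharing the same GC content. The base-R sketch below illustrates that general idea on toy data; `gc_normalize` is a hypothetical helper and not the package's internal code.

```r
# Illustrative sketch, not sequenza's implementation: divide each depth ratio
# by the mean (or median) depth ratio of all positions with the same GC content.
gc_normalize <- function(depth.ratio, gc, method = c("mean", "median")) {
  method <- match.arg(method)
  center_fun <- switch(method, mean = mean, median = stats::median)
  # ave() returns, for every position, the summary of its GC-content group
  gc_center <- stats::ave(depth.ratio, gc, FUN = center_fun)
  depth.ratio / gc_center
}

# Toy data: positions with 40% GC are inflated by a factor of about 1.5
set.seed(1)
gc <- rep(c(40, 50), each = 5)
ratio <- c(rep(1.5, 5), rep(1.0, 5)) * runif(10, 0.95, 1.05)
normalized <- gc_normalize(ratio, gc, method = "median")

# After normalization every GC group is centered on 1
tapply(normalized, gc, median)
```

A median center is less sensitive to outlying positions than a mean, which is consistent with the recommendation to prefer `median` for noisy data.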

By default, the segmentation uses only the heterozygous positions and the aspcf function from the copynumber package. Setting breaks.method = "full" combines the segmentation of all the available data, obtained with the pcf function, with the default aspcf segmentation of the heterozygous positions.
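Once breakpoints exist, whether from `aspcf`/`pcf` or from a user-supplied `breaks` table, each segment is summarized by its mean depth ratio and B-allele frequency, optionally weighted by read depth (the `weighted.mean` argument). The following is a minimal base-R sketch, assuming hypothetical input columns (`chrom`, `pos`, `depth.ratio`, `baf`, `depth`, `het`) and a hypothetical helper `segment_means`. In this sketch the B-allele frequency is averaged over heterozygous positions only, while the depth ratio uses all positions in the segment.

```r
# Illustrative sketch, not sequenza's implementation. `positions` uses
# hypothetical columns: chrom, pos, depth.ratio, baf, depth and het (logical,
# TRUE for positions heterozygous in the normal sample).
# `breaks` follows the chrom / start.pos / end.pos layout of the breaks argument.
segment_means <- function(positions, breaks, weighted = TRUE) {
  wmean <- function(x, depth) {
    if (length(x) == 0) return(NA_real_)
    w <- if (weighted) depth else rep(1, length(x))
    stats::weighted.mean(x, w)
  }
  out <- breaks
  out$depth.ratio <- NA_real_
  out$Bf <- NA_real_
  for (i in seq_len(nrow(breaks))) {
    in_seg <- as.character(positions$chrom) == as.character(breaks$chrom[i]) &
      positions$pos >= breaks$start.pos[i] & positions$pos <= breaks$end.pos[i]
    het <- in_seg & positions$het
    # depth ratio from all positions, B-allele frequency from heterozygous ones
    out$depth.ratio[i] <- wmean(positions$depth.ratio[in_seg], positions$depth[in_seg])
    out$Bf[i] <- wmean(positions$baf[het], positions$depth[het])
  }
  out
}

# Toy example: two segments on chromosome 1
positions <- data.frame(chrom = "1", pos = c(100, 200, 300, 400),
                        depth.ratio = c(1, 2, 1, 1), baf = c(0.5, 0.3, 0.4, 0.2),
                        depth = c(10, 30, 20, 20), het = c(TRUE, TRUE, FALSE, TRUE))
breaks <- data.frame(chrom = "1", start.pos = c(1, 250), end.pos = c(249, 500))
segment_means(positions, breaks)
```

With depth weighting, the deeper position at 200 pulls the first segment's depth ratio towards 2, which is the effect `weighted.mean = TRUE` is meant to capture.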

The second function, sequenza.fit, accepts the output from sequenza.extract and calls baf.model.fit (or mufreq.model.fit when method = "mufreq") to calculate the log-posterior probability for every pair of candidate cellularity and ploidy values.
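The search over candidate parameters amounts to scoring every combination of the `cellularity` and `ploidy` vectors. The sketch below enumerates the default grid of sequenza.fit with `expand.grid` and picks the best pair with `which.max`. The function `toy_log_posterior` is an invented placeholder surface, not the likelihood computed by `baf.model.fit` or `mufreq.model.fit`.

```r
# Default candidate vectors of sequenza.fit
cellularity <- seq(0.1, 1, 0.01)
ploidy <- seq(1, 7, 0.1)

# Hypothetical placeholder, NOT sequenza's model: a smooth surface that
# peaks at cellularity 0.7 and ploidy 3
toy_log_posterior <- function(cellularity, ploidy) {
  -((cellularity - 0.7)^2 / 0.01 + (ploidy - 3)^2 / 0.5)
}

# One row per (cellularity, ploidy) pair
grid <- expand.grid(cellularity = cellularity, ploidy = ploidy)
grid$lpp <- toy_log_posterior(grid$cellularity, grid$ploidy)

# Pair with the highest (toy) log-posterior
best <- grid[which.max(grid$lpp), ]
nrow(grid)
best
```

With the default vectors the grid has 91 × 61 = 5551 pairs, so narrowing the candidate ranges is a direct way to reduce fitting time.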

The third function, sequenza.results, saves a number of objects in a specified directory (default is the working directory). The objects are:

- The list of segments with the resulting copy numbers and major and minor alleles.
- The list of candidate mutations with variant allele frequency, copy number and number of mutated alleles, relative to the clonal population (sub-clonal populations need further processing with other methods).
- A plot of all the chromosomes in one image, showing the major and minor alleles and the absolute copy number changes (genome_view).
- Multiple plots with one chromosome per image, showing copy number, B-allele frequency and mutations in parallel (chromosome_view).
- Results of the model fitting (CP_contours and confints_CP).
- A summary of the copy number states of the sample (CN_bars).
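Because `sample.id` is used as a prefix for the saved objects, the output of one sample can be located by file name. A small sketch, assuming the files are named `<sample.id>_<suffix>`; both the underscore separator and the helper `list_sample_outputs` are assumptions made for illustration.

```r
# Hypothetical helper, not part of sequenza: list the files in out.dir whose
# names start with "<sample.id>_" (the separator is an assumption).
list_sample_outputs <- function(out.dir, sample.id) {
  files <- list.files(out.dir)
  sort(files[startsWith(files, paste0(sample.id, "_"))])
}

# Usage, after running sequenza.results(..., out.dir = "example",
# sample.id = "example"):
# list_sample_outputs("example", "example")
```

Filtering on the prefix keeps the outputs of different samples apart when several of them share the same `out.dir`.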

See also: `genome.view`, `baf.bayes`, `cp.plot`, `get.ci`.

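  ## Not run:
  library(sequenza)

  # Example seqz file shipped with the package
  data.file <- system.file("extdata", "example.seqz.txt.gz",
                           package = "sequenza")
  # Read, GC-normalize, segment and bin the data
  test <- sequenza.extract(data.file)
  # Fit cellularity and ploidy
  test.CP <- sequenza.fit(test)
  # Save segments, mutations and plots in the "example" directory
  sequenza.results(test, test.CP, out.dir = "example",
                   sample.id = "example")
  ## End(Not run)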