workflow: Sequenza convenience functions for standard analysis

Description

These three functions are the main user interface of the package: together they run several sequenza functions as a standardized analysis pipeline.

Usage

  sequenza.extract(file, window = 1e6, overlap = 1,
    gamma = 80, kmin = 10, gamma.pcf = 140, kmin.pcf = 40,
    mufreq.treshold = 0.10, min.reads = 40, min.reads.normal = 10,
    min.reads.baf = 1, max.mut.types = 1, min.type.freq = 0.9,
    min.fw.freq = 0, verbose = TRUE, chromosome.list = NULL,
    breaks = NULL, breaks.method = "het", assembly = "hg19",
    weighted.mean = TRUE, normalization.method = "mean",
    ignore.normal = FALSE, parallel = 1, gc.stats = NULL,
    segments.samples = FALSE)

  sequenza.fit(sequenza.extract, female = TRUE, N.ratio.filter = 10,
               N.BAF.filter = 1, segment.filter = 3e6,
               mufreq.treshold = 0.10, XY = c(X = "X", Y = "Y"),
               cellularity = seq(0.1,1,0.01), ploidy = seq(1, 7, 0.1),
               ratio.priority = FALSE, method = "baf",
               priors.table = data.frame(CN = 2, value = 2),
               chromosome.list = 1:24, mc.cores = getOption("mc.cores", 2L))

  sequenza.results(sequenza.extract, cp.table = NULL, sample.id, out.dir = getwd(),
                   cellularity = NULL, ploidy = NULL, female = TRUE, CNt.max = 20,
                   ratio.priority = FALSE, XY = c(X = "X", Y = "Y"),
                   chromosome.list = 1:24)

Arguments

file

the name of the seqz file to read.

window

size of the windows, in base pairs, used to compute the mean and quartile ranges of depth ratios and B-allele frequencies for plotting. Smaller windows take more time to compute.

overlap

integer specifying the number of overlapping windows.

gamma, kmin

arguments passed to aspcf from the copynumber package.

gamma.pcf, kmin.pcf

arguments passed to pcf from the copynumber package. The arguments are effective only when breaks.method is set to "full".

mufreq.treshold

mutation frequency threshold.

min.reads

minimum number of reads above the quality threshold to accept the mutation call.

min.reads.normal

minimum number of reads used to determine the genotype in the normal sample.

min.reads.baf

threshold on the depth of the positions included in the calculation of the average BAF for each segment.

max.mut.types

maximum number of different base substitutions per position. Integer from 1 to 3 (since there are only 4 bases). The default is 1; setting it to 3 accepts "noisy" mutation calls.

min.type.freq

minimum frequency of aberrant types.

min.fw.freq

minimum frequency of variant reads detected on the forward strand. With the default value of 0, only variant calls whose forward-strand frequency lies strictly between 0 and 1 are kept; calls supported exclusively by forward reads or exclusively by reverse reads are discarded.

verbose

logical, indicating whether to print information about the chromosome being processed.

chromosome.list

vector containing the indices or the names of the chromosomes to include in the model fitting.

breaks

optional data.frame with columns chrom, start.pos and end.pos, defining a pre-existing segmentation. When this argument is set, the built-in segmentation is skipped in favor of the supplied breaks.

breaks.method

resolution of the segmentation. Possible values are fast, het and full, where fast gives the lowest resolution and full the highest. The values of gamma and kmin (and, for full, gamma.pcf and kmin.pcf) may need to be adjusted to obtain optimal results.

assembly

assembly version of the genome, see aspcf or pcf.

weighted.mean

boolean; if TRUE, the segment means of depth ratio and B-allele frequency are computed using the read depth as weights.

normalization.method

string defining the operation to perform during the GC-normalization process. Possible values are mean (default) and median. A median normalization is preferable with noisy data.

ignore.normal

boolean; when set to TRUE, the normal coverage is ignored and the analysis is performed on the normalized tumor coverage only.

parallel

integer, number of threads used to process a seqz file (see chunk.apply).

gc.stats

object returned from the function gc.sample.stats. If NULL the object will be computed from the input file.

segments.samples

EXPERIMENTAL. Segment the tumor and normal samples separately and add the resulting segmentations to the QC plots.

sequenza.extract

a list of objects as output from the sequenza.extract function.

method

method used to fit the data: baf uses the baf.model.fit function, while mufreq uses the mufreq.model.fit function.

cp.table

a list of objects as output from the sequenza.fit function.

female

logical, TRUE if the sample is female, used to properly handle the X and Y chromosomes. The implementation only supports the normal human karyotype.

CNt.max

maximum copy number to consider in the model.

N.ratio.filter

threshold on the minimum number of depth ratio observations in a segment.

N.BAF.filter

threshold on the minimum number of B-allele frequency observations in a segment.

segment.filter

threshold on segment length (in base pairs) used to filter out short segments, which can add noise when fitting the cellularity and ploidy parameters. The threshold does not affect the allele-specific segmentation.

XY

character vector of length 2 specifying the labels used for the X and Y chromosomes.

cellularity

vector of candidate cellularity parameters.

ploidy

vector of candidate ploidy parameters.

priors.table

data frame with the columns CN and value, containing copy numbers and their corresponding weights. By default every copy number is assigned a weight of 1, so any value different from 1 changes the weight of the corresponding copy number.

ratio.priority

logical; if TRUE, only the depth ratio is used to determine the copy number state, while the B-allele frequency (Bf) is used to determine the number of B-alleles.

sample.id

identifier of the sample, to be used as a prefix for saved objects.

out.dir

output directory where the files and objects will be saved.

mc.cores

legacy argument to set the number of cores. It is passed as the cl argument of pblapply, which uses mclapply when cl is an integer.
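The priors.table weighting rule can be pictured with a small base R sketch. It expands a priors.table into one weight per copy number from 0 to CNt.max, giving weight 1 to every copy number not listed. The helper expand_priors is hypothetical and is not a sequenza function; the assumption that weights span copy numbers 0 to CNt.max is ours, and how sequenza combines these weights with the likelihood is internal to the package.

```r
# Hypothetical helper (not part of sequenza): expand a priors.table into
# one weight per copy number, defaulting unlisted copy numbers to 1.
expand_priors <- function(priors.table, CNt.max = 20) {
  cn <- 0:CNt.max
  weights <- rep(1, length(cn))
  names(weights) <- cn
  # Override only the copy numbers that fall inside 0..CNt.max
  keep <- priors.table$CN %in% cn
  weights[as.character(priors.table$CN[keep])] <- priors.table$value[keep]
  weights
}

# The default used by sequenza.fit: copy number 2 gets weight 2
default.priors <- data.frame(CN = 2, value = 2)
w <- expand_priors(default.priors, CNt.max = 6)
print(w)

# A custom table that also halves the weight of copy number 0
custom.priors <- data.frame(CN = c(0, 2), value = c(0.5, 2))
w2 <- expand_priors(custom.priors, CNt.max = 6)
print(w2)
```

In an actual analysis only the data frame itself is supplied, e.g. sequenza.fit(test, priors.table = custom.priors).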

Details

The first function, sequenza.extract, uses a range of functions from the sequenza package to read the raw data, normalize the depth ratio for GC-content bias, perform allele-specific segmentation, filter out noisy mutations and bin the raw data for plotting. The computed objects are returned as a single list.
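The GC-content normalization step can be illustrated with a minimal base R sketch, assuming the common approach of dividing each depth ratio by the mean (or, with normalization.method = "median", the median) depth ratio of all positions sharing its GC percentage. The function gc_normalize and the toy data are hypothetical; sequenza's own implementation (see gc.sample.stats) may differ in its details.

```r
# Hypothetical sketch (not sequenza's code): correct depth ratios for
# GC-content bias by dividing each value by the center of its GC bin.
gc_normalize <- function(depth.ratio, gc.percent, method = c("mean", "median")) {
  method <- match.arg(method)
  center <- if (method == "mean") mean else median
  # Central depth ratio of each GC bin, aligned back to every position
  bin.center <- ave(depth.ratio, gc.percent, FUN = center)
  depth.ratio / bin.center
}

# Toy data: the 60% GC bin contains one outlier (5.0)
depth.ratio <- c(0.8, 0.9, 1.0, 1.2, 1.3, 5.0)
gc.percent  <- c(40, 40, 40, 60, 60, 60)

norm.mean   <- gc_normalize(depth.ratio, gc.percent, "mean")
norm.median <- gc_normalize(depth.ratio, gc.percent, "median")
round(norm.mean, 3)
round(norm.median, 3)
```

The outlier raises the mean of its bin to 2.5, so mean normalization pushes the two typical positions in that bin to about 0.5, whereas the median keeps them close to 1. This is the sense in which a median normalization is preferable with noisy data.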

By default, the segmentation is performed with the aspcf function from the copynumber package, using only the heterozygous positions. Setting breaks.method to "full" combines this default aspcf segmentation of the heterozygous positions with a pcf segmentation of all the available data.
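Once the breakpoints are known, each segment is summarized by the mean depth ratio (and B-allele frequency) of the positions it contains. The base R sketch below shows the difference weighted.mean = TRUE makes, assuming the read depth is used directly as the weight, and it reuses the chrom, start.pos, end.pos layout expected by the breaks argument. The helper segment_means, the column names of positions and all values are hypothetical.

```r
# Hypothetical breakpoints in the chrom/start.pos/end.pos layout
breaks <- data.frame(chrom = c("1", "1"),
                     start.pos = c(1, 50001),
                     end.pos = c(50000, 100000),
                     stringsAsFactors = FALSE)

# Hypothetical per-position data: depth ratio and tumor read depth
positions <- data.frame(chrom = "1",
                        pos = c(100, 20000, 40000, 60000, 80000),
                        depth.ratio = c(1.0, 1.1, 0.4, 1.5, 1.6),
                        depth = c(100, 120, 5, 80, 90),
                        stringsAsFactors = FALSE)

# Hypothetical helper: plain or depth-weighted mean for each segment
segment_means <- function(breaks, positions, weighted = TRUE) {
  sapply(seq_len(nrow(breaks)), function(i) {
    in.seg <- positions$chrom == breaks$chrom[i] &
      positions$pos >= breaks$start.pos[i] &
      positions$pos <= breaks$end.pos[i]
    x <- positions$depth.ratio[in.seg]
    if (weighted) weighted.mean(x, w = positions$depth[in.seg]) else mean(x)
  })
}

plain    <- segment_means(breaks, positions, weighted = FALSE)
weighted <- segment_means(breaks, positions, weighted = TRUE)
plain
weighted
```

In the first segment the low-depth position (ratio 0.4, depth 5) pulls the plain mean down to about 0.83, while the depth-weighted mean stays at 1.04.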

The second function, sequenza.fit, accepts the output of sequenza.extract and calls baf.model.fit (or mufreq.model.fit when method = "mufreq") to calculate the log-posterior probability for every pair of candidate ploidy and cellularity values.
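The bookkeeping around the fit can be pictured as follows: segments that are too short or have too few observations are dropped (segment.filter, N.ratio.filter, N.BAF.filter), then every combination of the candidate cellularity and ploidy values is scored and the pair with the highest log-posterior probability is kept. The base R sketch below reproduces only this bookkeeping. The segment table and the scoring function toy_lpp are hypothetical stand-ins, not the model used by baf.model.fit, and the assumption that each filter keeps values at or above its threshold is ours.

```r
# Hypothetical segment table (column names chosen for illustration)
segs <- data.frame(start.pos = c(1, 5e6, 2e7),
                   end.pos   = c(4e6, 6e6, 5e7),
                   N.ratio   = c(500, 40, 2000),
                   N.BAF     = c(30, 0, 150))

# Assumed filter semantics, using the sequenza.fit defaults:
# segment.filter = 3e6, N.ratio.filter = 10, N.BAF.filter = 1
keep <- with(segs, (end.pos - start.pos) >= 3e6 &
                   N.ratio >= 10 & N.BAF >= 1)
segs.fit <- segs[keep, ]

# Default candidate grids of sequenza.fit
cellularity <- seq(0.1, 1, 0.01)
ploidy <- seq(1, 7, 0.1)
grid <- expand.grid(cellularity = cellularity, ploidy = ploidy)

# Toy stand-in for the log-posterior, peaking at cellularity 0.7, ploidy 3
toy_lpp <- function(cellularity, ploidy) {
  -((cellularity - 0.7)^2 / 0.01 + (ploidy - 3)^2 / 0.5)
}
grid$lpp <- toy_lpp(grid$cellularity, grid$ploidy)
best <- grid[which.max(grid$lpp), ]
best
```

With the default grids, 91 cellularity values times 61 ploidy values gives 5551 candidate pairs to score, which is why coarser grids speed up the fit.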

The third function, sequenza.results, saves a number of objects in a specified directory (default is the working directory). The objects are:

See Also

genome.view, baf.bayes, cp.plot, get.ci.

Examples

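## Not run:
library(sequenza)

# Locate the example seqz file shipped with the package
data.file <- system.file("extdata", "example.seqz.txt.gz",
                         package = "sequenza")
# Read, GC-normalize and segment the data
test <- sequenza.extract(data.file)
# Fit cellularity and ploidy
test.CP <- sequenza.fit(test)
# Save results in ./example, using "example" as file prefix
sequenza.results(test, test.CP, out.dir = "example",
                 sample.id = "example")
## End(Not run)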
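A variant of the example with non-default settings, using only arguments listed under Usage, might look like the sketch below. It assumes a hypothetical seqz file ("sample.seqz.gz") whose chromosome names carry a "chr" prefix, a male sample, and the higher-resolution "full" segmentation with a median GC normalization. The sequenza calls run only in an interactive session with the package installed.

```r
# Chromosome labels for a reference with "chr"-prefixed names (assumption)
chroms <- paste0("chr", c(1:22, "X", "Y"))
xy <- c(X = "chrX", Y = "chrY")

if (interactive() && requireNamespace("sequenza", quietly = TRUE)) {
  library(sequenza)
  # Hypothetical input file; replace with a real seqz file
  ext <- sequenza.extract("sample.seqz.gz", breaks.method = "full",
                          gamma.pcf = 140, kmin.pcf = 40,
                          normalization.method = "median",
                          chromosome.list = chroms)
  # Male sample: female = FALSE, with matching X/Y labels
  cp <- sequenza.fit(ext, female = FALSE, XY = xy)
  sequenza.results(ext, cp.table = cp, sample.id = "sample",
                   out.dir = "sample_results", female = FALSE, XY = xy)
}
```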

sequenza documentation built on May 9, 2019, 5:04 p.m.