View source: R/runAbsoluteCN.R
runAbsoluteCN | R Documentation |
This function takes as input tumor and normal control coverage data and a VCF containing allelic fractions of germline variants and somatic mutations. Normal control does not need to be from the same patient. In case VCF does not contain somatic status, it should contain either dbSNP or population allele frequencies, and optionally COSMIC annotation. Returns purity and ploidy combinations, sorted by likelihood score. Provides copy number and LOH data, by both gene and genomic region.
runAbsoluteCN(
normal.coverage.file = NULL,
tumor.coverage.file = NULL,
log.ratio = NULL,
seg.file = NULL,
seg.file.sdev = 0.4,
vcf.file = NULL,
normalDB = NULL,
genome,
centromeres = NULL,
sex = c("?", "F", "M", "diploid"),
fun.filterVcf = filterVcfMuTect,
args.filterVcf = list(),
fun.setPriorVcf = setPriorVcf,
args.setPriorVcf = list(),
fun.setMappingBiasVcf = setMappingBiasVcf,
args.setMappingBiasVcf = list(),
fun.filterIntervals = filterIntervals,
args.filterIntervals = list(),
fun.segmentation = segmentationCBS,
args.segmentation = list(),
fun.focal = findFocal,
args.focal = list(),
sampleid = NULL,
min.ploidy = 1.4,
max.ploidy = 6,
test.num.copy = 0:7,
test.purity = seq(0.15, 0.95, by = 0.01),
prior.purity = NULL,
prior.K = 0.999,
prior.contamination = 0.01,
max.candidate.solutions = 20,
candidates = NULL,
min.coverage = 15,
max.coverage.vcf = 300,
max.non.clonal = 0.2,
max.homozygous.loss = c(0.05, 1e+07),
non.clonal.M = 1/3,
max.mapping.bias = 0.8,
max.pon = 3,
iterations = 30,
min.variants.segment = 5,
log.ratio.calibration = 0.1,
smooth.log.ratio = TRUE,
model.homozygous = FALSE,
error = 0.001,
interval.file = NULL,
max.dropout = c(0.95, 1.1),
min.logr.sdev = 0.15,
max.logr.sdev = 0.6,
max.segments = 300,
min.gof = 0.8,
min.variants = 20,
plot.cnv = TRUE,
vcf.field.prefix = "",
cosmic.vcf.file = NULL,
DB.info.flag = "DB",
POPAF.info.field = "POP_AF",
Cosmic.CNT.info.field = "Cosmic.CNT",
min.pop.af = 0.001,
model = c("beta", "betabin"),
post.optimize = FALSE,
speedup.heuristics = 2,
BPPARAM = NULL,
log.file = NULL,
verbose = TRUE
)
normal.coverage.file |
Coverage file of normal control (optional
if log.ratio is provided - then it will be only used to filter low coverage
exons). Should be already GC-normalized with
|
tumor.coverage.file |
Coverage file of tumor. If |
log.ratio |
Copy number log-ratios for all exons in the coverage files.
If |
seg.file |
Segmented data. Optional, to support third-pary
segmentation tools. If |
seg.file.sdev |
If |
vcf.file |
VCF file.
Optional, but typically needed to select between local optima of similar
likelihood. Can also be a |
normalDB |
Normal database, created with
|
genome |
Genome version, for example hg19. See |
centromeres |
A |
sex |
Sex of sample. If |
fun.filterVcf |
Function for filtering variants. Expected output is a
list with elements |
args.filterVcf |
Arguments for variant filtering function. Arguments
|
fun.setPriorVcf |
Function to set prior for somatic status for each
variant in the VCF. Defaults to |
args.setPriorVcf |
Arguments for somatic prior function. |
fun.setMappingBiasVcf |
Function to set mapping bias for each variant
in the VCF. Defaults to |
args.setMappingBiasVcf |
Arguments for mapping bias function. |
fun.filterIntervals |
Function for filtering low-quality intervals in the
coverage files. Needs to return a |
args.filterIntervals |
Arguments for target filtering function. Arguments
|
fun.segmentation |
Function for segmenting the copy number log-ratios.
Expected return value is a |
args.segmentation |
Arguments for segmentation function. Arguments
|
fun.focal |
Function for identifying focal amplifications. Defaults to
|
args.focal |
Arguments for focal amplification function. |
sampleid |
Sample id, provided in output files etc. |
min.ploidy |
Minimum ploidy to be considered. |
max.ploidy |
Maximum ploidy to be considered. |
test.num.copy |
Copy numbers tested in the grid search. Note that focal amplifications can have much higher copy numbers, but they will be labeled as subclonal (because they do not fit the integer copy numbers). |
test.purity |
Considered tumor purity values. |
prior.purity |
|
prior.K |
This defines the prior probability that the multiplicity of a SNV corresponds to either the maternal or the paternal copy number (for somatic variants additionally to a multiplicity of 1). For perfect segmentations, this value would be 1; values smaller than 1 thus may provide some robustness against segmentation errors. |
prior.contamination |
The prior probability that a known SNP is from a different individual. |
max.candidate.solutions |
Number of local optima considered in optimization and variant fitting steps. If there are too many local optima, it will use specified number of top candidate solutions, but will also include all optima close to diploid, because silent genomes have often lots of local optima. |
candidates |
Candidates to optimize from a previous run
( |
min.coverage |
Minimum coverage in both normal and tumor. Intervals and
variants with lower coverage are ignored. This value is provided to the
|
max.coverage.vcf |
This will set the maximum number of reads in the SNV
fitting. This is to avoid that small non-reference biases that come
apparent only at high coverages have a dramatic influence on likelihood
scores. Only relevant for |
max.non.clonal |
Maximum genomic fraction assigned to a subclonal copy number state. |
max.homozygous.loss |
|
non.clonal.M |
Average expected cellular fraction of sub-clonal somatic mutations. This is to calculate expected allelic fractions of a single sub-clonal bin for variants. For all somatic variants, more accurate cellular fractions are calculated. |
max.mapping.bias |
Exclude variants with high mapping bias from the likelihood score calculation. Note that bias is reported on an inverse scale; a variant with mapping bias of 1 has no bias. |
max.pon |
Exclude variants found more than |
iterations |
Maximum number of iterations in the Simulated Annealing copy number fit optimization. Note that this an integer optimization problem that should converge quickly. Allowed range is 10 to 250. |
min.variants.segment |
Flag segments with fewer variants. The minor copy number estimation is not reliable with insufficient variants. |
log.ratio.calibration |
Re-calibrate log-ratios in the window
|
smooth.log.ratio |
Smooth |
model.homozygous |
Homozygous germline SNPs are uninformative and by
default removed. In 100 percent pure samples such as cell lines, however,
heterozygous germline SNPs appear homozygous in case of LOH. Setting this
parameter to |
error |
Estimated sequencing error rate. Used to calculate minimum
number of supporting reads for variants using
|
interval.file |
A mapping file that assigns GC content and gene symbols
to each exon in the coverage files. Used for generating gene-level calls.
First column in format CHR:START-END. Second column GC content (0 to 1).
Third column gene symbol. This file is generated with the
|
max.dropout |
Measures GC bias as ratio of coverage in AT-rich (GC <
0.5) versus GC-rich on-target regions (GC >= 0.5). High coverage drop-out might
indicate that data was not GC-normalized (optional with larger pool of
normal samples). A warning pointing to a normalized log-ratio drop-out likely
indicates that the sample quality is insufficient. For log-ratio drop-out,
a warning is thrown when half the |
min.logr.sdev |
Minimum log-ratio standard deviation used in the model. Useful to make fitting more robust to outliers in very clean data. |
max.logr.sdev |
Flag noisy samples with segment log-ratio standard deviation larger than this. Assay specific and needs to be calibrated. |
max.segments |
Flag noisy samples with a large number of segments. Assay specific and needs to be calibrated. |
min.gof |
Flag purity/ploidy solutions with poor fit. |
min.variants |
Do not attempt to fit allelic fractions for samples with fewer variants passing all filters. |
plot.cnv |
Generate segmentation plots. |
vcf.field.prefix |
Prefix all newly created VCF field names with this string. |
cosmic.vcf.file |
Add a |
DB.info.flag |
Flag in INFO of VCF that marks presence in common
germline databases. Defaults to |
POPAF.info.field |
As alternative to a flag, use an info field that
contains population allele frequencies. The |
Cosmic.CNT.info.field |
Info field containing hits in the Cosmic database |
min.pop.af |
Minimum population allele frequency in
|
model |
Use either a beta or a beta-binomial distribution for fitting
observed to expected allelic fractions of alterations in |
post.optimize |
Optimize purity using final SCNA-fit and variants. This might take a long time when lots of variants need to be fitted, but will typically result in a slightly more accurate purity, especially for rather silent genomes or very low purities. Otherwise, it will just use the purity determined via the SCNA-fit. |
speedup.heuristics |
Tries to avoid spending computation time on local optima that are unlikely correct. Set to 0 to turn this off, to 1 to only apply heuristics that in worst case will decrease accuracy slightly or to 2 to turn on all heuristics. |
BPPARAM |
|
log.file |
If not |
verbose |
Verbose output. |
A list with elements
candidates |
Results of the grid search. |
results |
All local optima, sorted by final rank. |
input |
The input data. |
Markus Riester
Riester et al. (2016). PureCN: Copy number calling and SNV classification using targeted short read sequencing. Source Code for Biology and Medicine, 11, pp. 13.
Carter et al. (2012), Absolute quantification of somatic DNA alterations in human cancer. Nature Biotechnology.
correctCoverageBias
segmentationCBS
calculatePowerDetectSomatic
normal.coverage.file <- system.file('extdata', 'example_normal_tiny.txt',
package = 'PureCN')
tumor.coverage.file <- system.file('extdata', 'example_tumor_tiny.txt',
package = 'PureCN')
vcf.file <- system.file('extdata', 'example.vcf.gz',
package = 'PureCN')
interval.file <- system.file('extdata', 'example_intervals_tiny.txt',
package = 'PureCN')
# The max.candidate.solutions, max.ploidy and test.purity parameters are set to
# non-default values to speed-up this example. This is not a good idea for real
# samples.
ret <-runAbsoluteCN(normal.coverage.file = normal.coverage.file,
tumor.coverage.file = tumor.coverage.file, genome = 'hg19',
vcf.file = vcf.file, sampleid = 'Sample1',
interval.file = interval.file, max.ploidy = 4,
test.purity = seq(0.3, 0.7, by = 0.05), max.candidate.solutions = 1)
# If a high-quality segmentation was obtained with third-party tools:
seg.file <- system.file('extdata', 'example_seg.txt',
package = 'PureCN')
# By default, PureCN will re-segment the data, for example to identify
# regions of copy number neutral LOH. If this is not wanted, we can provide
# a minimal segmentation function which just returns the provided one:
funSeg <- function(seg, ...) return(seg)
res <- runAbsoluteCN(seg.file = seg.file, fun.segmentation = funSeg,
max.ploidy = 4, test.purity = seq(0.3, 0.7, by = 0.05),
max.candidate.solutions = 1,
genome='hg19', interval.file = interval.file)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.