Sequenza_CNAqc | R Documentation |
This pipeline implements optimised allele-specific copy number calling with Sequenza and CNAqc, optimising purity estimated from data, and allele-specific segments.
Requirements:
1. (Prerequisites) Run Sequenza utils steps outside R a. Process a FASTA file to produce a GC Wiggle track file b. Process BAM and Wiggle files to produce a seqz file: c. Post-process by binning the original seqz file:
Instructions to customize these steps are available at https://cran.r-project.org/web/packages/sequenza/vignettes/sequenza.html
2. This pipeline (Sequenza_CNAqc
) optimizes CNA calling and purity
estimations using peak detection from CNAqc. Starting from a broad set of initial conditions
for purity (5% to 100%) the pipeline fits cellularity and ploidy values and dumps results.
Results are then quality controlled by peak detection via CNAqc. The analysis with CNAqc
can be carried out using either somatic mutations called by Sequenza, or an external set
of input calls.
3. At every run the best solution by Sequenza is stored in a cache, and up to two alternative solutions are generated: one optional by Sequenza, which might propose different exact values for cellularity and ploidy; one by CNAqc, adjusting the current cellularity of the best Sequenza solution. The alternative solutions are enqueued in a list L of cellularity and ploidy values to be tested, based on their presence in the cache.
4. Until the list L is empty, the point values of cellularity and ploidy are
tested repeating step 3, using ranges around the proposed values specified by
input parameters. Further refinement steps will restrict the Sequenza purity
range based on the score computed by analyze_peaks
, which determines a purity
gradient.
5. Each run is saved in a folder named run-1
, run-2
, etc.
When the list L is empty the pipeline stops and a symbolic link final
pointing to the best Sequenza solution, based on CNAqc score is created.
Sequenza_CNAqc(
sample_id,
seqz_file,
mutations = NULL,
sex,
cellularity = c(0.05, 1),
ploidy = c(1.8, 5.4),
reference = "GRCh38",
normalization.method = "median",
window = 1e+05,
gamma = 280,
kmin = 300,
min.reads.baf = 50,
min.reads = 50,
min.reads.normal = 15,
max.mut.types = 1,
delta_cellularity = 0.05,
delta_ploidy = 0.25,
verbose = FALSE,
...
)
sample_id |
The id of the sample. |
seqz_file |
A binned |
mutations |
If not NULL (default), an external set of mutation calls, in tibble format. Must include colummns "chr", "from", "to", "NV", "NR", "VAF". |
sex |
Sex of the patient from whom the sample is drawn. |
cellularity |
Input to |
ploidy |
Input to |
reference |
Reference genome among supported by CNAqc (GRCh38 or hg19/GRCh37). |
normalization.method |
Input to |
window |
Input to |
gamma |
Input to |
kmin |
Input to |
min.reads.baf |
Input to |
min.reads |
Input to |
min.reads.normal |
Input to |
max.mut.types |
Input to |
delta_cellularity |
When a proposed solution of purity and ploidy is tested,
Sequenza is run on an interval of purity centered at the proposed value, of half
width |
delta_ploidy |
When a proposed solution of purity and ploidy is tested,
Sequenza is run on an interval of ploidy centered at the proposed value, of half
width |
verbose |
If TRUE (default FALSE) at each Sequenza fitting step, print the updated list of purity and ploidy pairs to test. |
... |
Optional parameters passed to the |
A tibble
containing for each run its number
run
, the solution values of purity
and ploidy
, the corresponding
CNAqc score
and quality control status QC
, a list sequenza
of the
input and output of the Sequenza fitting procedure, and the object of class CNAqc
created by the init()
and analyze_peaks()
functions.
If at any time a purity proposal made by CNAqc leads
to inconsistent values (i.e., outside $[0,1]$) the pipeline stops prompting to the user
to adjust manually the range of ploidy
.
## Not run:
# Install Sequenza package from https://cran.r-project.org/web/packages/sequenza/vignettes/sequenza.html
install.packages("sequenza")
# Outside R
pip install sequenza-utils
# Process a FASTA file to produce a GC Wiggle track file
sequenza−utils gc_wiggle −w 50 --fasta hg19.fa -o hg19.gc50Base.wig.gz
# Process BAM and Wiggle files to produce a seqz file
sequenza−utils bam2seqz -n normal.bam -t tumor.bam --fasta hg19.fa \
-gc hg19.gc50Base.wig.gz -o out.seqz.gz
# Post-process by binning the original seqz file:
sequenza−utils seqz_binning --seqz out.seqz.gz -w 50 -o out small.seqz.gz
# Run this pipeline
Sequenza_CNAqc(
sample_id = 'tumour',
seqz_file = 'small.seqz.gz', # Binned file
mutations = dataset$mutations, # If using an external set of mutations
sex = "F", # If female
cellularity = c(0.05, 1),
ploidy = c(1.8, 5.4),
reference = 'GRCh38',
normalization.method = 'median',
window = 1e5,
gamma = 280,
kmin = 300,
min.reads.baf = 50,
min.reads = 50,
min.reads.normal = 15,
max.mut.types = 1,
delta_cellularity = 0.05,
delta_ploidy = 0.25,
verbose = TRUE)
## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.