extractSigsIndel: Extract indel signatures

View source: R/extractSigsIndel.R

Description

Extract indel context counts or indel signature contributions from a VCF file or a dataframe of variants.

Usage

extractSigsIndel(..., method = "CHORD")

extractSigsIndelPcawg(
  vcf.file = NULL,
  df = NULL,
  output = "contexts",
  sample.name = NULL,
  ref.genome = DEFAULT_GENOME,
  signature.profiles = INDEL_SIGNATURE_PROFILES,
  verbose = F,
  ...
)

extractSigsIndelChord(
  vcf.file = NULL,
  df = NULL,
  sample.name = NULL,
  ref.genome = DEFAULT_GENOME,
  output = "contexts",
  indel.len.cap = 5,
  n.bases.mh.cap = 5,
  get.other.indel.allele = F,
  keep.indel.types = c("del", "ins"),
  verbose = F,
  ...
)
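A minimal usage sketch, assuming the mutSigExtractor package and the default BSgenome reference are installed. The variant coordinates below are placeholders rather than real data, and the "chr1"-style chromosome naming is an assumption. The data frame only follows the column layout described for the df argument. Because the call needs the package and the genome, it is shown as a comment.

```r
# Placeholder variants (not real data) in the layout the df argument expects:
# one row per variant with columns chrom, pos, ref and alt.
indels <- data.frame(
  chrom = c("chr1", "chr2", "chr3"),
  pos   = c(1000000L, 2500000L, 3750000L),
  ref   = c("ACT", "G", "TTTG"),
  alt   = c("A", "GCA", "T"),
  stringsAsFactors = FALSE
)
str(indels)

# Requires mutSigExtractor plus the BSgenome reference to be installed,
# so the call is shown rather than run here:
# contexts <- extractSigsIndel(df = indels, method = "CHORD",
#                              output = "contexts", sample.name = "sample_1")
```

Passing df instead of vcf.file skips VCF parsing. In that case sample.name is worth setting explicitly, because there is no VCF basename to fall back on.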

Arguments

...

Other arguments that can be passed to variantsFromVcf()

method

Can be 'CHORD' or 'PCAWG'. Indicates the indel context type to extract.

vcf.file

Path to the VCF file.

df

A dataframe with the columns chrom, pos, ref and alt. Alternative input to vcf.file.

output

The type of output: indel context counts ('contexts', the default), absolute signature contributions ('signatures'), or an annotated BED-like dataframe ('df').

sample.name

A character string used as the column name of the output matrix. If not provided, the basename of the VCF file is used.

ref.genome

A BSgenome reference genome. The default is BSgenome.Hsapiens.UCSC.hg19. Any other reference genome must also be installed.

verbose

Print progress messages?

indel.len.cap

The maximum indel sequence length to distinguish when counting 'repeat' and 'none' contexts. Longer indels are counted in the context bin for the maximum length.

n.bases.mh.cap

The maximum number of microhomology bases to distinguish when counting repeat and microhomology contexts. Indels exceeding this cap are counted in the context bin for the maximum number of bases.

get.other.indel.allele

Only applies when mode == 'indel'. Some VCFs report the sequence of only one allele (REF for deletions and ALT for insertions). If TRUE, the unreported allele is retrieved from the reference genome as the base immediately 5' of the indel sequence. This base is also prepended to the indel sequence, and POS is adjusted accordingly (POS = POS - 1).

keep.indel.types

A character vector of indel types to keep. The default, c('del', 'ins'), filters out MNVs (variants where both REF and ALT length >= 2). MNV types are 'mnv_neutral' (REF length == ALT length), 'mnv_del' (REF length > ALT length) and 'mnv_ins' (REF length < ALT length).
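The two preprocessing steps described for get.other.indel.allele and keep.indel.types can be illustrated with a short base R sketch. The helpers pad_indel and classify_indel and the toy genome string are hypothetical and are not the package's internal code. The sketch also assumes that an unreported allele is stored as an empty string and that labels depend only on REF and ALT lengths, as described above.

```r
# Hypothetical sketches (not mutSigExtractor internals).
# A toy genome string stands in for the BSgenome reference; position 1 = 'G'.
toy_genome <- "GATTACAGGCT"

# get.other.indel.allele: complete a one-allele record with the base
# immediately 5' of POS and shift POS one base left.
# Assumes the unreported allele is given as an empty string ("").
pad_indel <- function(pos, ref, alt, genome) {
  prev_base <- substr(genome, pos - 1, pos - 1)
  data.frame(pos = pos - 1,
             ref = paste0(prev_base, ref),
             alt = paste0(prev_base, alt),
             stringsAsFactors = FALSE)
}

# keep.indel.types: label variants from REF/ALT lengths alone.
# Returns NA when REF and ALT have equal length below 2 (e.g. SNVs).
classify_indel <- function(ref, alt) {
  ref_len <- nchar(ref)
  alt_len <- nchar(alt)
  type <- ifelse(ref_len > alt_len, "del",
                 ifelse(ref_len < alt_len, "ins", NA_character_))
  is_mnv <- ref_len >= 2 & alt_len >= 2
  type[is_mnv & ref_len == alt_len] <- "mnv_neutral"
  type[is_mnv & ref_len > alt_len]  <- "mnv_del"
  type[is_mnv & ref_len < alt_len]  <- "mnv_ins"
  type
}

# Deletion of "TA" at positions 4-5, reported with an empty ALT allele
print(pad_indel(pos = 4, ref = "TA", alt = "", genome = toy_genome))

ref <- c("ACT", "G", "AT", "ACG", "AT")
alt <- c("A", "GCA", "GC", "TC", "GCA")
types <- classify_indel(ref, alt)
# Mimic the default keep.indel.types = c("del", "ins"): MNVs are dropped
all_vars <- data.frame(ref, alt, type = types, stringsAsFactors = FALSE)
kept <- all_vars[types %in% c("del", "ins"), ]
print(kept)
```

In this example the deletion becomes POS 3, REF "TTA", ALT "T". Only the first two variants survive the default filter, because the last three have both alleles of length 2 or more.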

Details

When output = 'signatures', a 1-column matrix is returned containing the absolute indel signature contributions (i.e. the number of mutations contributing to each mutational signature).

Two sets of indel contexts can be used: CHORD and PCAWG.

For CHORD indel contexts, the signatures used are insertions/deletions within repeat regions (ins.rep, del.rep), insertions/deletions with flanking microhomology (ins.mh, del.mh), and insertions/deletions that fall under neither category (ins.none, del.none). Each category is further stratified by indel length.
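The repeat / microhomology / none split can be illustrated with a simplified sketch. The function classify_context and its arguments are hypothetical, and its rules are assumptions rather than the package's exact logic. It inspects only the 3' flank. An indel is called 'rep' when its sequence is repeated directly 3' of it, and 'mh' when the 3' flank matches only its first few bases. Length and microhomology bins are capped in the manner described for indel.len.cap and n.bases.mh.cap.

```r
# Simplified, hypothetical CHORD-style context assignment (not package code).
# Real implementations also check the 5' flank; this sketch does not.
classify_context <- function(indel_seq, flank_3p, len_cap = 5, mh_cap = 5) {
  n <- nchar(indel_seq)
  stopifnot(n >= 1)  # an empty indel sequence would loop forever below
  len_bin <- min(n, len_cap)  # longer indels share the bin at the cap

  # Whole copies of the indel sequence directly 3' of the indel -> repeat
  copies <- 0
  while (substr(flank_3p, copies * n + 1, (copies + 1) * n) == indel_seq) {
    copies <- copies + 1
  }
  if (copies >= 1) {
    return(list(type = "rep", len_bin = len_bin, mh_bases = 0))
  }

  # Leading bases shared by the indel sequence and the 3' flank -> microhomology
  mh <- 0
  while (mh < n &&
         substr(indel_seq, mh + 1, mh + 1) == substr(flank_3p, mh + 1, mh + 1)) {
    mh <- mh + 1
  }
  if (mh >= 1) {
    return(list(type = "mh", len_bin = len_bin, mh_bases = min(mh, mh_cap)))
  }

  list(type = "none", len_bin = len_bin, mh_bases = 0)
}

str(classify_context("AT", "ATATGC"))        # repeated twice 3' of the indel
str(classify_context("ACG", "AGT"))          # 1 base of microhomology
str(classify_context("ACGTACGT", "TTT"))     # no repeat, no microhomology
str(classify_context("ACGTACG", "ACGTACT"))  # 6 shared bases, capped at 5
```

The last call shows both caps at once. A 7-base indel lands in length bin 5, and its 6 shared bases are binned at 5.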

PCAWG indel contexts are described at: https://cancer.sanger.ac.uk/cosmic/signatures/ID/index.tt

Value

A 1-column matrix of context counts (output = 'contexts') or signature contributions (output = 'signatures'), or an annotated dataframe (output = 'df').


UMCUGenetics/mutSigExtractor documentation built on Aug. 30, 2024, 2:12 p.m.