readGenomesFromVCF: Read tumor genomes from a VCF file (Variant Call Format).


readGenomesFromVCF {decompTumor2Sig}    R Documentation

Read tumor genomes from a VCF file (Variant Call Format).

Description

'readGenomesFromVCF()' reads the somatic mutations of a single tumor genome (sample) or of a set of genomes from a VCF file and determines the mutation frequencies according to the chosen model of mutational signatures (Alexandrov or Shiraishi).

Usage

readGenomesFromVCF(file, numBases=5, type="Shiraishi", trDir=TRUE,
enforceUniqueTrDir=TRUE,
refGenome=BSgenome.Hsapiens.UCSC.hg19::BSgenome.Hsapiens.UCSC.hg19,
transcriptAnno=
TxDb.Hsapiens.UCSC.hg19.knownGene::TxDb.Hsapiens.UCSC.hg19.knownGene,
verbose=TRUE)

Arguments

file

(Mandatory) The name of the VCF file (can be compressed with gzip).

numBases

(Mandatory) Total number of bases (the mutated base plus its flanking bases) used for the sequence patterns. Must be odd so that the mutated base sits in the center; e.g., 5 means two flanking bases on each side. Default: 5
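Because the pattern is centered on the mutated base, an odd numBases splits evenly into flanking bases on either side. The base-R sketch below checks this constraint and returns the number of flanking bases per side. The helper name flanksPerSide is hypothetical and is not part of decompTumor2Sig.

```r
# Hypothetical helper (not part of decompTumor2Sig): validate numBases and
# return how many flanking bases lie on each side of the mutated base.
flanksPerSide <- function(numBases) {
  if (length(numBases) != 1 || !is.numeric(numBases) ||
      numBases < 1 || numBases %% 2 != 1) {
    stop("numBases must be a single odd positive number")
  }
  (numBases - 1) / 2
}

flanksPerSide(5)   # 2 flanking bases on each side
flanksPerSide(3)   # 1 (trinucleotide context)
```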

type

(Mandatory) Signature model or type ("Alexandrov" or "Shiraishi"). Default: "Shiraishi"

trDir

(Mandatory) Specifies whether the transcription direction is taken into account in the signature model. If so, only mutations within genomic regions with a defined transcription direction can be considered. Default: TRUE
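The choice of type, numBases and trDir determines how many mutation categories are counted. The following is a sketch of the arithmetic only. It assumes the common pyrimidine-centered convention (6 substitution types, 4 possible bases per flanking position) and assumes that transcription direction adds a 2-way strand feature. Under these assumptions, the Alexandrov model counts every full pattern as one category, while the Shiraishi model treats each position as an independent feature (as in pmsignature). The function featureCount is hypothetical and is not the package's code.

```r
# Illustrative arithmetic only (hypothetical helper, not decompTumor2Sig code).
# Assumes 6 substitution types, 4 bases per flanking position and, if trDir
# is TRUE, 2 transcription directions.
featureCount <- function(numBases, type = c("Shiraishi", "Alexandrov"),
                         trDir = TRUE) {
  type <- match.arg(type)
  nFlank <- numBases - 1                  # flanking positions in total
  if (type == "Alexandrov") {
    # one category per full pattern: substitution x flanking context x strand
    6 * 4^nFlank * (if (trDir) 2 else 1)
  } else {
    # Shiraishi: substitution, each flanking base and strand are separate
    # features, so their category counts add up instead of multiplying
    6 + 4 * nFlank + (if (trDir) 2 else 0)
  }
}

featureCount(3, "Alexandrov", trDir = FALSE)  # 96
featureCount(3, "Alexandrov", trDir = TRUE)   # 192
featureCount(5, "Shiraishi",  trDir = TRUE)   # 24
```

The contrast explains why the Shiraishi model scales well to larger numBases: its number of categories grows linearly with the flanking positions, whereas the Alexandrov model grows exponentially.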

enforceUniqueTrDir

(Optional) Used only if trDir is TRUE. If enforceUniqueTrDir is TRUE (default), mutations that map to a region where overlapping genes have opposing transcription directions are excluded from the analysis. If FALSE, the transcription direction encountered first in the transcript database (see transcriptAnno) is assigned to the mutation; this was the behavior of decompTumor2Sig until version 1.3.5 and is also the behavior of pmsignature. Excluding these mutations from the count (the default) is preferable because mutation data alone cannot reveal which of the two genes has the higher transcriptional activity, which might be linked to the occurrence of the mutation. (If you are unsure, use the default setting; this option exists mostly for backward compatibility with older versions.)
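The rule described above can be illustrated for a single mutation, given the strands of all genes it overlaps. This is a minimal sketch of the documented behavior under the assumption that strands are encoded as "+"/"-" in transcript-database order. The helper assignTrDir is hypothetical and does not reproduce the package's internal implementation.

```r
# Hypothetical helper illustrating the documented rule, not the package code.
# strands: transcription directions ("+" or "-") of all genes overlapping a
# mutation, in the order they appear in the transcript database.
assignTrDir <- function(strands, enforceUniqueTrDir = TRUE) {
  if (length(strands) == 0) return(NA_character_)   # no direction defined
  if (enforceUniqueTrDir) {
    u <- unique(strands)
    if (length(u) == 1) u else NA_character_        # conflicting: exclude
  } else {
    strands[[1]]                                    # first one encountered
  }
}

assignTrDir(c("+", "+"))                              # "+"
assignTrDir(c("+", "-"))                              # NA (mutation excluded)
assignTrDir(c("+", "-"), enforceUniqueTrDir = FALSE)  # "+"
```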

refGenome

(Mandatory) The reference genome (BSgenome) needed to extract sequence patterns. Default: BSgenome object for hg19.

transcriptAnno

(Optional) Transcript annotation (TxDb object) used to determine the transcription direction. This is required only if trDir is TRUE. Default: TxDb object for hg19.

verbose

(Optional) Print information about reading and processing the mutation data. Default: TRUE

Value

A list containing the genomes in terms of frequencies of the mutated sequence patterns. This list of genomes can be passed to decomposeTumorGenomes.
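For the Alexandrov model, "frequencies of the mutated sequence patterns" conceptually means each pattern's count divided by the genome's total number of counted mutations. The base-R sketch below uses a made-up pattern vector in bracket notation. Both the data and the notation are illustrative assumptions, and the actual objects returned by readGenomesFromVCF are structured differently.

```r
# Hypothetical mutation patterns for one genome (Alexandrov-style notation);
# the real return value of readGenomesFromVCF is structured differently.
patterns <- c("A[C>T]G", "A[C>T]G", "T[C>A]T", "G[T>C]A")

counts <- table(patterns)              # occurrences per pattern
freqs  <- counts / sum(counts)         # relative frequencies, summing to 1
as.numeric(freqs["A[C>T]G"])           # 0.5
```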

Author(s)

Rosario M. Piro
Politecnico di Milano
Maintainer: Rosario M. Piro
E-Mail: <rmpiro@gmail.com> or <rosariomichael.piro@polimi.it>

References

http://rmpiro.net/decompTumor2Sig/
Krueger, Piro (2019) decompTumor2Sig: Identification of mutational signatures active in individual tumors. BMC Bioinformatics 20(Suppl 4):152.

See Also

decompTumor2Sig
decomposeTumorGenomes
readGenomesFromMPF
getGenomesFromMutFeatData

Examples


### load reference genome and transcript annotation (if direction is needed)
refGenome <- BSgenome.Hsapiens.UCSC.hg19::BSgenome.Hsapiens.UCSC.hg19
transcriptAnno <-
  TxDb.Hsapiens.UCSC.hg19.knownGene::TxDb.Hsapiens.UCSC.hg19.knownGene

### read breast cancer genomes from Nik-Zainal et al. (PMID: 22608084)
gfile <- system.file("extdata",
         "Nik-Zainal_PMID_22608084-VCF-convertedfromMPF.vcf.gz", 
         package="decompTumor2Sig")
genomes <- readGenomesFromVCF(gfile, numBases=5, type="Shiraishi",
         trDir=TRUE, enforceUniqueTrDir=TRUE, refGenome=refGenome,
         transcriptAnno=transcriptAnno, verbose=FALSE)


rmpiro/decompTumor2Sig documentation built on May 15, 2022, 3:27 a.m.