Converts a VCF file to the correct input format

Description

Given a VCF file, outputs a data frame with counts of how frequently a mutation is found within each trinucleotide context per sample ID. Output can be used as input into getTriContextFraction.

Usage

vcf.to.sigs.input(vcf, bsg = NULL)

Arguments

vcf

Location of the VCF file that is to be converted

bsg

Only set if another genome build is required. Must be a BSgenome object.

Details

By default, the context sequence is taken from the BSgenome.Hsapiens.UCSC.hg19::Hsapiens object, so the coordinates must correspond to the human hg19 assembly (the UCSC version of the GRCh37 Homo sapiens assembly) unless another genome is supplied through bsg. This method will do its best to translate chromosome names from other versions of the assembly, such as NCBI or Ensembl. For instance, the following transformations will be done: "1" -> "chr1"; "MT" -> "chrM"; "GL000245.1" -> "chrUn_gl000245"; etc.
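The renaming described above can be illustrated with a minimal base R sketch. The helper to_ucsc_names is hypothetical and is not the package's implementation; it covers only the three kinds of name listed in the examples and leaves anything else unchanged. In particular, contigs that UCSC assigns to a specific chromosome under a different naming pattern are not handled here.

```r
# Hypothetical helper for illustration only; not part of the package.
# Maps a few NCBI/Ensembl-style sequence names to UCSC hg19-style names.
to_ucsc_names <- function(chrom) {
  out <- chrom

  # Mitochondrial genome: "MT" -> "chrM"
  out[chrom == "MT"] <- "chrM"

  # Autosomes and sex chromosomes: "1" -> "chr1", "X" -> "chrX"
  simple <- grepl("^([0-9]+|X|Y)$", chrom)
  out[simple] <- paste0("chr", chrom[simple])

  # Unplaced contigs: drop the version suffix, lower-case, add "chrUn_"
  gl <- grepl("^GL[0-9]+\\.[0-9]+$", chrom)
  out[gl] <- paste0("chrUn_", tolower(sub("\\.[0-9]+$", "", chrom[gl])))

  # Names that match none of the patterns (e.g. "chr2") pass through as-is
  out
}

to_ucsc_names(c("1", "X", "MT", "GL000245.1", "chr2"))
```

Names already in UCSC style fall through every pattern, so applying the helper twice gives the same result as applying it once.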

This method relies on the VariantAnnotation package to read the VCF file.

Value

A data frame that contains sample IDs for the rows and trinucleotide contexts for the columns. Each entry is the count of how many times a mutation with that trinucleotide context is seen in the sample.
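To make the shape of the returned value concrete, the following base R sketch builds a mock data frame with the same layout: one row per sample and one column per trinucleotide context. The sample names and counts are made up, and the column label format (5' base, substitution in brackets, 3' base) is an assumption for illustration. The last step converts counts to per-sample fractions with plain arithmetic; it stands in for the idea behind a downstream normalization step and is not a call to getTriContextFraction.

```r
# Assumed label format for the 96 contexts, e.g. "A[C>A]A" (illustrative)
bases <- c("A", "C", "G", "T")
subs  <- c("C>A", "C>G", "C>T", "T>A", "T>C", "T>G")
grid  <- expand.grid(three = bases, five = bases, sub = subs,
                     stringsAsFactors = FALSE)
contexts <- paste0(grid$five, "[", grid$sub, "]", grid$three)

# Made-up counts for two hypothetical samples (rows) across all contexts
counts <- as.data.frame(
  matrix(seq_len(2 * 96), nrow = 2,
         dimnames = list(c("sample_1", "sample_2"), contexts))
)

# Per-sample fractions: divide each row by its total number of mutations
fractions <- counts / rowSums(counts)

dim(counts)
rowSums(fractions)
```

Each row of fractions sums to 1, so samples with different mutation totals become comparable.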

Examples

## Not run: 
sigs.input = vcf.to.sigs.input(vcf = "variants.vcf")

## End(Not run)