read_vcfs_as_granges: Read VCF files into a GRangesList

View source: R/read_vcfs_as_granges.R

read_vcfs_as_grangesR Documentation

Read VCF files into a GRangesList

Description

This function reads Variant Call Format (VCF) files into a GRanges object and combines them in a GRangesList. In addition to loading the files, this function applies the same seqlevel style to the GRanges objects as the reference genome passed in the 'genome' parameter. By default only reads in snv variants.

Usage

read_vcfs_as_granges(
  vcf_files,
  sample_names,
  genome,
  group = c("auto+sex", "auto", "sex", "circular", "all", "none"),
  type = c("snv", "indel", "dbs", "mbs", "all"),
  change_seqnames = TRUE,
  predefined_dbs_mbs = FALSE,
  remove_duplicate_variants = TRUE
)

Arguments

vcf_files

Character vector of VCF file names

sample_names

Character vector of sample names

genome

BSgenome reference genome object

group

Selector for a seqlevel group. All seqlevels outside of this group will be removed. Possible values: * 'all' for all chromosomes; * 'auto' for autosomal chromosomes; * 'sex' for sex chromosomes; * 'auto+sex' for autosomal + sex chromosomes (default); * 'circular' for circular chromosomes; * 'none' for no filtering, which results in keeping all seqlevels from the VCF file.

type

The mutation type that will be loaded. All other variants will be filtered out. Possible values: * 'snv' (default) * 'indel' * 'dbs' * 'mbs' * 'all' When you use 'all', no filtering or merging of variants is performed.

change_seqnames

Boolean. Whether to change the seqlevelsStyle of the vcf to that of the BSgenome object. (default = TRUE)

predefined_dbs_mbs

Boolean. Whether DBS and MBS variants have been predefined in your vcf. This function by default assumes that DBS and MBS variants are present in the vcf as SNVs, which are positioned next to each other. If your DBS/MBS variants are called separately you should set this argument to TRUE. (default = FALSE)

remove_duplicate_variants

Boolean. Whether duplicate variants are removed. This is based on genomic coordinates and does not take the alternative bases into account. It is generally recommended to keep this on. Turning this off can result in warnings in plot_rainfall. When a duplicate SNV is identified as part of a DBS, only a single DBS, instead of a duplicate DBS will be formed. (default = TRUE)

Value

A GRangesList containing the GRanges obtained from 'vcf_files'

Examples

## The example data set consists of three colon samples, three intestine
## samples and three liver samples.  So, to map each file to its appropriate
## sample name, we create a vector containing the sample names:
sample_names <- c(
  "colon1", "colon2", "colon3",
  "intestine1", "intestine2", "intestine3",
  "liver1", "liver2", "liver3"
)

## We assemble a list of files we want to load.  These files match the
## sample names defined above.
vcf_files <- list.files(system.file("extdata",
  package = "MutationalPatterns"
),
pattern = "sample.vcf", full.names = TRUE
)

## Get a reference genome BSgenome object.
ref_genome <- "BSgenome.Hsapiens.UCSC.hg19"
library("BSgenome")
library(ref_genome, character.only = TRUE)

## This function loads the files as GRanges objects.
## For backwards compatability reasons it only loads SNVs by default
vcfs <- read_vcfs_as_granges(vcf_files, sample_names, ref_genome)

## To load all variant types use:
vcfs <- read_vcfs_as_granges(vcf_files, sample_names, ref_genome, type = "all")

## Loading only indels can be done like this.

## Select data containing indels.
vcf_fnames <- list.files(system.file("extdata", package = "MutationalPatterns"),
  pattern = "blood.*vcf", full.names = TRUE
)
sample_names <- c("AC", "ACC55", "BCH")

## Read data and select only the indels.
## Other mutation types can be read in the same way.
read_vcfs_as_granges(vcf_fnames, sample_names, ref_genome, type = "indel")

CuppenResearch/MutationalPatterns documentation built on Nov. 23, 2022, 4:13 a.m.