View source: R/read_vcfs_as_granges.R
read_vcfs_as_granges | R Documentation |
This function reads Variant Call Format (VCF) files into a GRanges object and combines them in a GRangesList. In addition to loading the files, this function applies the same seqlevel style to the GRanges objects as the reference genome passed in the 'genome' parameter. By default only reads in snv variants.
read_vcfs_as_granges( vcf_files, sample_names, genome, group = c("auto+sex", "auto", "sex", "circular", "all", "none"), type = c("snv", "indel", "dbs", "mbs", "all"), change_seqnames = TRUE, predefined_dbs_mbs = FALSE, remove_duplicate_variants = TRUE )
vcf_files |
Character vector of VCF file names |
sample_names |
Character vector of sample names |
genome |
BSgenome reference genome object |
group |
Selector for a seqlevel group. All seqlevels outside of this group will be removed. Possible values: * 'all' for all chromosomes; * 'auto' for autosomal chromosomes; * 'sex' for sex chromosomes; * 'auto+sex' for autosomal + sex chromosomes (default); * 'circular' for circular chromosomes; * 'none' for no filtering, which results in keeping all seqlevels from the VCF file. |
type |
The mutation type that will be loaded. All other variants will be filtered out. Possible values: * 'snv' (default) * 'indel' * 'dbs' * 'mbs' * 'all' When you use 'all', no filtering or merging of variants is performed. |
change_seqnames |
Boolean. Whether to change the seqlevelsStyle of the vcf to that of the BSgenome object. (default = TRUE) |
predefined_dbs_mbs |
Boolean. Whether DBS and MBS variants have been predefined in your vcf. This function by default assumes that DBS and MBS variants are present in the vcf as SNVs, which are positioned next to each other. If your DBS/MBS variants are called separately you should set this argument to TRUE. (default = FALSE) |
remove_duplicate_variants |
Boolean. Whether duplicate variants are removed. This is based on genomic coordinates and does not take the alternative bases into account. It is generally recommended to keep this on. Turning this off can result in warnings in plot_rainfall. When a duplicate SNV is identified as part of a DBS, only a single DBS, instead of a duplicate DBS will be formed. (default = TRUE) |
A GRangesList containing the GRanges obtained from 'vcf_files'
## The example data set consists of three colon samples, three intestine ## samples and three liver samples. So, to map each file to its appropriate ## sample name, we create a vector containing the sample names: sample_names <- c( "colon1", "colon2", "colon3", "intestine1", "intestine2", "intestine3", "liver1", "liver2", "liver3" ) ## We assemble a list of files we want to load. These files match the ## sample names defined above. vcf_files <- list.files(system.file("extdata", package = "MutationalPatterns" ), pattern = "sample.vcf", full.names = TRUE ) ## Get a reference genome BSgenome object. ref_genome <- "BSgenome.Hsapiens.UCSC.hg19" library("BSgenome") library(ref_genome, character.only = TRUE) ## This function loads the files as GRanges objects. ## For backwards compatability reasons it only loads SNVs by default vcfs <- read_vcfs_as_granges(vcf_files, sample_names, ref_genome) ## To load all variant types use: vcfs <- read_vcfs_as_granges(vcf_files, sample_names, ref_genome, type = "all") ## Loading only indels can be done like this. ## Select data containing indels. vcf_fnames <- list.files(system.file("extdata", package = "MutationalPatterns"), pattern = "blood.*vcf", full.names = TRUE ) sample_names <- c("AC", "ACC55", "BCH") ## Read data and select only the indels. ## Other mutation types can be read in the same way. read_vcfs_as_granges(vcf_fnames, sample_names, ref_genome, type = "indel")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.