knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
While the most useful analysis of VCF variants comes from using GEMINI, often-times the analyst just wants to quickly check a VCF. SeeGEM can use the vcfR package to import a VCF and produce a SeeGEM report
This process will load the entire VCF into memory. I strongly recommend you check the file size to make sure that the VCF is a reasonable size, so as not to take down your computer. I am being vague about what reasonable is, as that depends on how much memory you have and whether the VCF is compressed (ends in *gz
) or not. As a rough rule of thumb, if the VCF is larger than 100mb, then I suggest you filter it before running through SeeGEM.
library(vcfR) library(dplyr) library(SeeGEM)
vcf <- read.vcfR(system.file("extdata/example.vcf.gz", package="SeeGEM"))
vcfR contains a handy function to convert the vcf into a tidy dataframe.
vcf_df <- vcfR2tidy(vcf)
vcfR2tidy
actually returns three tibbles (data frames):
fix
which holds all of the informationgt
which contains the genotype informationmeta
which has the information for the INFO fields contained in the VCF headerSince we want the information and the genotype, we are going to need to glue these together.
The fix
and gt
tibbles both have ChromKey
and POS
columns, so I will drop them from one when we column bind them
for_see_gem <- cbind(vcf_df$fix, vcf_df$gt %>% select(-`ChromKey`, -`POS`))
We could just dump for_see_gem
into a SeeGEM report, but that would be a bad idea. Why?
Well, let's look at how many columns and rows this has.
for_see_gem %>% dim()
r vcf_df$fix %>% nrow()
rows and r vcf_df$fix %>% ncol()
columns. This will be unpleasant to look it.
I do some light filtering (removing variants which fail the GATK filter and have a gnomAD allele frequency over 0.01) and only select the columns you are interested in before passing the data frame to knit_see_gem_from_table
to create the document.
knit_see_gem_from_table(table_data = for_see_gem %>% filter(FILTER == 'PASS', gno_af_all < 0.01) %>% select(CHROM, POS, REF, ALT, AC, Indiv, gt_GT, gt_DP, gno_af_all, ccr_pct_v1, REVEL), sample_name = 'EXAMPLE', title = 'Vignette')
To make this process a touch easier, I have written a function which wraps the vcfR2tidy
and cbind
steps into one call called vcf_to_tibble
You skip most the steps and above and get your tibble by calling this
ready_for_see_gem <- vcf_to_tibble(system.file("extdata/example.vcf.gz", package="SeeGEM"))
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.