chromosomeVis: Visualization of all genomic variants on the chromosome

Description

Reads files containing single nucleotide variants (SNV) and structural variants (SV): vcf.gz files generated by the speedseq aligner and variant caller. The function outputs visualization png figures. Each figure shows variants (blue dots) at their genomic coordinates (x axis). The ratio of alternative reads to depth (y axis) indicates the type of variant: homozygous alternative (expected ratio 1) or heterozygous (expected ratio 0.5). Green dots represent rare variants that pass the filters: coding/UTR, nonsynonymous variant with dbSNP frequency < 0.01 and ExAC frequency < 0.01. Orange vertical lines mark the position of the centromere. Orange dots depict structural and copy number variants that overlap a coding region and are of relatively good quality (QUAL > 0). The red curve is the moving average of the alternative reads/depth ratio. High values of this curve (exceeding 0.75) can suggest potential homozygous/deleterious regions.

In addition, files containing tables with only the rare SNV and SV variants are generated. The tables include the variants that passed the filters specified above, with annotations (UniProt, RefSeq and others). The function analyzes a whole genome in about 30 minutes on a desktop computer.
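
The y-axis ratio and the red moving-average curve can be illustrated with a minimal sketch on made-up data. This is not the package's internal code: the toy read counts, the helper name moving_average, and the choice of a centered window counted in variants are assumptions made for illustration only.

```r
# Toy data: alternative-read counts and total depth per variant (made-up values).
alt_reads <- c(10, 20, 19, 21, 9, 11, 20, 22, 18, 10)
depth     <- c(20, 20, 20, 22, 20, 20, 20, 22, 20, 20)

# Ratio shown on the y axis: close to 1 for homozygous alternative,
# close to 0.5 for heterozygous variants.
ratio <- alt_reads / depth

# Hypothetical helper: centered moving average over `window` consecutive
# variants. Positions without a full window come back as NA.
moving_average <- function(x, window) {
  as.numeric(stats::filter(x, rep(1 / window, window), sides = 2))
}

ma <- moving_average(ratio, window = 3)

# Indices where the smoothed ratio exceeds 0.75, the threshold the
# description associates with potential homozygous regions.
flagged <- which(ma > 0.75)
print(flagged)
```

A window of 3 is used here only so the toy vector produces a visible curve; the MA_Window argument described below is the package's actual control for the window size.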

Usage

chromosomeVis(sample, sv_sample, dbSNP_file, Exac_file, chromosomes, pngWidth, pngHeight, caller, MA_Window, coding_regions_file, annotation_file, uniprot_file)

Arguments

sample

Name of the SNV sample file to be analyzed.

sv_sample

Name of an additional SV sample file. If not specified, structural variants are discarded.

dbSNP_file

A file with the SNP database (dbSNP). If not specified, chromosome 19 dbSNP is used.

Exac_file

ExAC database file. If not specified, chromosome 19 ExAC is used.

chromosomes

A vector of strings indicating chromosomes to be analyzed.

pngWidth

A number indicating pixel width of output png files. Default is 1600.

pngHeight

A number indicating pixel height of output png files. Default is 1200.

caller

A string indicating the vcf caller. Default is "speedseq"; "GATK" is also supported.

MA_Window

A number indicating the window size for the moving average function. The recommended value is 2000 for a genome and 20 for an exome. Default is 1000.

coding_regions_file

A bed file indicating coding regions.

annotation_file

Text file indicating positions of the genes (from UCSC).

uniprot_file

Text file indicating gene functions and related diseases (from UniProt).

Value

comp1

The function plots a static visualization of genomic variants on all chromosomes, annotates and filters the variants, and reports the output variants in tables.
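
The filtering step that produces the rare-variant tables can be sketched as follows. This is one reading of the filter stated in the Description, not the package's implementation: the data frame, its column names (region, effect, dbsnp_freq, exac_freq), the helper is_rare, and the treatment of a missing frequency as rare are all hypothetical.

```r
# Hypothetical annotated variant table; values and column names are
# illustrative only and do not mirror the package's output format.
variants <- data.frame(
  pos        = c(9001000, 9002500, 9004200, 9007800),
  region     = c("coding", "intronic", "UTR", "coding"),
  effect     = c("nonsynonymous", "nonsynonymous", "nonsynonymous", "synonymous"),
  dbsnp_freq = c(0.002, 0.001, 0.2, NA),
  exac_freq  = c(NA, 0.003, 0.0005, 0.001),
  stringsAsFactors = FALSE
)

# Assumption of this sketch: a variant absent from a database (NA frequency)
# counts as rare in that database.
is_rare <- function(freq, cutoff = 0.01) is.na(freq) | freq < cutoff

# Keep coding/UTR, nonsynonymous variants that are rare in both databases.
keep <- variants$region %in% c("coding", "UTR") &
  variants$effect == "nonsynonymous" &
  is_rare(variants$dbsnp_freq) &
  is_rare(variants$exac_freq)

rare_variants <- variants[keep, ]
print(rare_variants)
```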

Author(s)

Adam Gudys and Tomasz Stokowy

Examples

# analyze chromosome 19 from example genome
sample = system.file("extdata", "CoriellIndex_S1_chr19_9-10_S1.vcf.recode.vcf.gz",
  package = "RareVariantVis")
sv_sample = system.file("extdata", "CoriellIndex_S1.sv.vcf.gz",
  package = "RareVariantVis")
chromosomeVis(sample=sample, sv_sample=sv_sample, chromosomes=c("19"))

# without sv data
# sample = system.file("extdata", "CoriellIndex_S1_chr19_9-10_S1.vcf.recode.vcf.gz",
#    package = "RareVariantVis")
# chromosomeVis(sample=sample, chromosomes=c("19"))

# analyze entire genome (use external full-genome dbSNP and ExAC)
# this takes approximately 30 minutes on a desktop computer
# large example data and all necessary hg19 references can be downloaded from:
# https://github.com/agudys/DataRareVariantVis
# dbSNP_file = "All_20160601.vcf.gz"
# Exac_file = "ExAC.r0.3.1.sites.vep.vcf.gz"
# chromosomeVis(sample=sample, sv_sample=sv_sample,
#     dbSNP_file=dbSNP_file, Exac_file=Exac_file,
#     chromosomes=c(as.character(1:22), "X", "Y"), MA_Window = 2000,
#     coding_regions_file = "nexterarapidcapture_exome_targetedregions_v1.2.bed",
#     annotation_file = "UCSC_hg19_refSeq_160702.txt",
#     uniprot_file = "uniprot-all.txt")

tstokowy/RareVariantVis documentation built on May 17, 2019, 8:46 p.m.