hmfGeneAnnotation is an R package designed to determine the amplification/biallelic loss status of a set of genes (provided as a bed file) based on copy number and SNV/indel data generated by the HMF variant calling pipeline.
The bed file included with the package (COSMIC Cancer Gene Census) can be loaded and inspected as follows:
default_bed <- read.delim(
   file=system.file('misc/cosmic_cancer_gene_census_20200225.bed',package='hmfGeneAnnotation'),
   check.names=FALSE ## keep the '#chrom' column name as is
)
head(default_bed)
## #chrom start end hgnc_id hgnc_symbol ensembl_gene_id
## 1 1 2160134 2241558 10896 SKI ENSG00000157933
## 2 1 2487078 2496821 11912 TNFRSF14 ENSG00000157873
## 3 1 2985732 3355185 14000 PRDM16 ENSG00000142611
## 4 1 6241329 6269449 10315 RPL22 ENSG00000116251
## 5 1 6845384 7829766 18806 CAMTA1 ENSG00000171735
## 6 1 11166592 11322564 3942 MTOR ENSG00000198793
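A custom bed file can be prepared along the same lines. The sketch below is only an illustration: it assumes that a custom bed file should have the same six columns as the default one shown above, which this document does not state explicitly. The two genes and their coordinates are copied from the output above; the object names are hypothetical.

```r
## Hypothetical two-gene subset, with values copied from the default bed file above
custom_bed <- data.frame(
   chrom=c('1','1'),
   start=c(2160134L, 11166592L),
   end=c(2241558L, 11322564L),
   hgnc_id=c(10896L, 3942L),
   hgnc_symbol=c('SKI','MTOR'),
   ensembl_gene_id=c('ENSG00000157933','ENSG00000198793'),
   stringsAsFactors=FALSE
)

## Assumption: the first column is named '#chrom', as in the default bed file
colnames(custom_bed)[1] <- '#chrom'

## Write as a tab-delimited file (a temporary file is used here)
custom_bed_path <- tempfile(fileext='.bed')
write.table(custom_bed, custom_bed_path, sep='\t', quote=FALSE, row.names=FALSE)

## Read the file back in the same way as the default bed file
custom_bed_check <- read.delim(custom_bed_path, check.names=FALSE)
```

The resulting path could then be passed as the bed.file argument of detGeneStatuses().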
First install the package and its dependencies.
## Install dependencies
install.packages('seqminer')
## Install hmfGeneAnnotation
install.packages('devtools'); library(devtools)
install_github('https://github.com/UMCUGenetics/hmfGeneAnnotation/')
detGeneStatuses() is the main function of the package. The user may specify the path to a bed.file, but if unspecified, the one included in this package will be used. The user may also optionally specify the path to the java binary (java.path; default is the one installed on the system), as well as the path to the SnpSift jar (snpsift.path; default is the jar included at inst/dep/SnpSift.jar).
detGeneStatuses(
   out.dir='/path/to/write/output/files/',
   ## Output files of the HMF pipeline
   hmf.pl.output.paths=c(
      germ_vcf='/path/to/annotated.vcf.gz',
      som_vcf='/path/to/purple.somatic.vcf.gz',
      gene_cnv='/path/to/purple.cnv.gene.tsv',
      cnv='/path/to/purple.cnv.somatic.tsv'
   ),
   sample.name='sample_name',
   ## Optional arguments
   bed.file='/path/to/bed/file',
   java.path='/path/to/java/binary',
   snpsift.path='/path/to/snpsift/jar',
   verbose=TRUE
)
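Since detGeneStatuses() takes several input paths, it can be convenient to check that they exist before starting a run. The following is a minimal sketch: findMissingInputs() is a hypothetical helper that is not part of hmfGeneAnnotation, and the paths are the same placeholders as in the call above.

```r
## Hypothetical helper (not part of hmfGeneAnnotation):
## returns the names of the inputs whose files cannot be found
findMissingInputs <- function(paths){
   names(paths)[!file.exists(paths)]
}

hmf_paths <- c(
   germ_vcf='/path/to/annotated.vcf.gz',
   som_vcf='/path/to/purple.somatic.vcf.gz',
   gene_cnv='/path/to/purple.cnv.gene.tsv',
   cnv='/path/to/purple.cnv.somatic.tsv'
)

missing_inputs <- findMissingInputs(hmf_paths)
if(length(missing_inputs) > 0){
   message('Missing input files: ', paste(missing_inputs, collapse=', '))
}
```

If no files are missing, hmf_paths can be passed directly as the hmf.pl.output.paths argument.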
The output is a table where each row corresponds to a gene and contains:
(1) data about copy number gains at the chromosome arm level relative to the genome ploidy, and local copy number gains relative to the chromosome arm ploidy;
(2) data about losses/mutations of allele 1 and allele 2, with each variant being given an impact score from 0 to 5 based on ClinVar annotations (which have priority) or SnpEff variant type annotations.
Below is a schematic overview of the output table.
      || gene_metadata || CN_gain_info || allele_1_losses             || allele_2_losses             ||
      ||               ||              || variant_type | impact_score || variant_type | impact_score ||
------------------------------------------------------------------------------------------------------
gene1 ||               ||              ||              |              ||              |              ||
gene2 ||               ||              ||              |              ||              |              ||
...
Impact scores are assigned based on the ClinVar or SnpEff annotation as follows:
score | ClinVar           | SnpEff
-----------------------------------------------------------
5     | pathogenic        | frameshift
4     | likely_pathogenic | nonsense
3     | VUS               | missense, splice, inframe indel
2     | likely_benign     | other variants
1     | benign            | other variants
0     | no data available | other variants
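The scoring scheme in the table above can be sketched as a pair of lookup vectors. This is an illustration under stated assumptions, not the package's actual implementation: getImpactScore() and the label spellings are hypothetical, "has priority" is interpreted as using the ClinVar score whenever a ClinVar annotation is available, and "other variants" are given a score of 0 because the table does not pin down a single value for them.

```r
## Scores taken from the table above; label spellings are assumptions
clinvar_scores <- c(pathogenic=5, likely_pathogenic=4, VUS=3, likely_benign=2, benign=1)
snpeff_scores <- c(frameshift=5, nonsense=4, missense=3, splice=3, inframe_indel=3)

## Hypothetical helper (not part of hmfGeneAnnotation)
## clinvar, snpeff: character vectors of equal length; NA means no annotation
getImpactScore <- function(clinvar, snpeff){
   clinvar_score <- unname(clinvar_scores[clinvar])
   snpeff_score <- unname(snpeff_scores[snpeff])

   ## Assumption: variant types not listed in the table ('other variants') score 0
   snpeff_score[is.na(snpeff_score)] <- 0

   ## Assumption: ClinVar is used when available, otherwise fall back on SnpEff
   ifelse(is.na(clinvar_score), snpeff_score, clinvar_score)
}

scores <- getImpactScore(
   clinvar=c('benign', NA, NA),
   snpeff=c('frameshift', 'nonsense', 'synonymous')
)
```

Here the first variant takes its score from ClinVar even though its SnpEff type would score higher, while the other two fall back on the SnpEff type.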
Biallelic loss events are defined by the following combinations of allele 1 and allele 2 events:
biallel_event | allele1_event | allele2_event
---------------------------------------------
CN loss       | deep deletion | deep deletion
CN loss       | truncation    | truncation
LOH+som       | LOH           | SNV/indel
LOH+germ      | LOH           | SNV/indel
som+som       | SNV/indel     | SNV/indel
germ+som      | SNV/indel     | SNV/indel
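The table above can also be encoded as a data frame to look up which biallelic event types are compatible with a given pair of allele events. This is a sketch only: candidateBiallelEvents() is a hypothetical helper, not a function of the package. Because the table does not record whether an SNV/indel is somatic or germline, some pairs match more than one event type.

```r
## The biallelic event table from above, encoded as a data frame
biallel_events <- data.frame(
   biallel_event=c('CN loss','CN loss','LOH+som','LOH+germ','som+som','germ+som'),
   allele1_event=c('deep deletion','truncation','LOH','LOH','SNV/indel','SNV/indel'),
   allele2_event=c('deep deletion','truncation','SNV/indel','SNV/indel','SNV/indel','SNV/indel'),
   stringsAsFactors=FALSE
)

## Hypothetical helper (not part of hmfGeneAnnotation):
## returns all event types matching the given allele 1 and allele 2 events
candidateBiallelEvents <- function(allele1, allele2){
   hits <- biallel_events$allele1_event == allele1 & biallel_events$allele2_event == allele2
   unique(biallel_events$biallel_event[hits])
}

candidateBiallelEvents('LOH', 'SNV/indel')
candidateBiallelEvents('deep deletion', 'deep deletion')
```

Telling LOH+som apart from LOH+germ (or som+som from germ+som) would additionally require the origin of each SNV/indel, which this lookup does not model.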