hmfGeneAnnotation: Determines gene amplifications and biallelic losses from HMF pipeline output

Identifying gene amplification and biallelic losses

hmfGeneAnnotation is an R package designed to determine the amplification/biallelic loss status of a set of genes (provided as a bed file) based on copy number and SNV/indel data generated by the HMF variant calling pipeline.

Getting started

Generated by HMF pipeline

Germline SNV/indel VCF (*.annotated.vcf.gz)
Somatic SNV/indel VCF (*.purple.somatic.vcf.gz)
Copy number info per gene (*.purple.cnv.gene.tsv)
Copy number info (*.purple.cnv.somatic.tsv)

Gene list

BED file with the chromosome, start/end genome coordinates, and ENSEMBL gene IDs of the desired genes. Below are the first few lines of the default BED file.

default_bed <- read.delim(
   file=system.file('misc/cosmic_cancer_gene_census_20200225.bed', package='hmfGeneAnnotation'),
   check.names=FALSE ## keep the '#chrom' column name as-is
)
head(default_bed)

##   #chrom    start      end hgnc_id hgnc_symbol ensembl_gene_id
## 1      1  2160134  2241558   10896         SKI ENSG00000157933
## 2      1  2487078  2496821   11912    TNFRSF14 ENSG00000157873
## 3      1  2985732  3355185   14000      PRDM16 ENSG00000142611
## 4      1  6241329  6269449   10315       RPL22 ENSG00000116251
## 5      1  6845384  7829766   18806      CAMTA1 ENSG00000171735
## 6      1 11166592 11322564    3942        MTOR ENSG00000198793
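To annotate a different gene set, a BED file with the same columns can be written from R. The snippet below is a minimal sketch, assuming the package only needs the column layout shown above (this is not confirmed here). The object names and the temporary output path are illustrative, and the two rows are copied from the default file.

```r
# Hypothetical custom gene list using the same columns as the default BED file.
custom_bed <- data.frame(
   chrom=c('1', '1'),
   start=c(2160134L, 11166592L),
   end=c(2241558L, 11322564L),
   hgnc_id=c(10896L, 3942L),
   hgnc_symbol=c('SKI', 'MTOR'),
   ensembl_gene_id=c('ENSG00000157933', 'ENSG00000198793'),
   stringsAsFactors=FALSE
)
colnames(custom_bed)[1] <- '#chrom' # match the header of the default file

# Write a tab-separated file; the path is a placeholder
bed_path <- tempfile(fileext='.bed')
write.table(custom_bed, bed_path, sep='\t', quote=FALSE, row.names=FALSE)

# Read it back the same way the default file is read
reread <- read.delim(bed_path, check.names=FALSE)
```

The resulting path could then be passed as the bed.file argument of detGeneStatuses().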

Usage

First install the package and its dependencies.

## Install dependencies
install.packages('seqminer')

## Install hmfGeneAnnotation
install.packages('devtools')
devtools::install_github('UMCUGenetics/hmfGeneAnnotation')

detGeneStatuses() is the main function of the package. The user may specify the path to a bed.file, but if unspecified, the one included in this package will be used. The user may also optionally specify the path to the java binary (java.path; default is the one installed on the system), as well as the path to the SnpSift jar (snpsift.path; default is the jar included at inst/dep/SnpSift.jar).

detGeneStatuses(
   ## Required arguments
   out.dir='/path/to/write/output/files/',
   hmf.pl.output.paths=c(
      germ_vcf='/path/to/annotated.vcf.gz',
      som_vcf='/path/to/purple.somatic.vcf.gz',
      gene_cnv='/path/to/purple.cnv.gene.tsv',
      cnv='/path/to/purple.cnv.somatic.tsv'
   ),
   sample.name='sample_name',

   ## Optional arguments
   bed.file='/path/to/bed/file',
   java.path='/path/to/java/binary',
   snpsift.path='/path/to/snpsift/jar',

   verbose=TRUE
)
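Because the call above depends on four named input paths, it can help to check them before running the function. The helper below is a hypothetical sketch and not part of hmfGeneAnnotation; it only assumes the four element names used in hmf.pl.output.paths above.

```r
# Hypothetical helper (not part of hmfGeneAnnotation): returns the names of
# expected inputs that are absent from the vector or do not exist on disk.
findMissingInputs <- function(paths){
   required <- c('germ_vcf', 'som_vcf', 'gene_cnv', 'cnv')
   absent <- setdiff(required, names(paths))
   present <- paths[intersect(required, names(paths))]
   not_found <- names(present)[!file.exists(present)]
   c(absent, not_found)
}

hmf_paths <- c(
   germ_vcf='/path/to/annotated.vcf.gz',
   som_vcf='/path/to/purple.somatic.vcf.gz',
   gene_cnv='/path/to/purple.cnv.gene.tsv',
   cnv='/path/to/purple.cnv.somatic.tsv'
)

missing_inputs <- findMissingInputs(hmf_paths)
if(length(missing_inputs) > 0){
   message('Missing input files: ', paste(missing_inputs, collapse=', '))
}
```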

The output is a table with one row per gene. Each row contains (1) data about copy number gains at the chromosome arm level relative to the genome ploidy, and local copy number gains relative to the chromosome arm ploidy; and (2) data about losses/mutations of allele 1 and allele 2, with each variant given an impact score from 0 to 5 based on ClinVar annotations (which take priority) or SnpEff variant type annotations. Below is a schematic overview of the output table.

      || gene_metadata || CN_gain_info || allele_1_losses             || allele_2_losses             ||
      ||               ||              || variant_type | impact_score || variant_type | impact_score ||
------------------------------------------------------------------------------------------------------
gene1 ||               ||              ||              |              ||              |              ||
gene2 ||               ||              ||              |              ||              |              ||
 ...

Package workflow

Pre-processing HMF pipeline outputs

Calculate ploidy for each chromosome arm
Subset gene CNV table for genes of interest
Subset germline and somatic VCFs using SnpSift for regions of genes of interest

Assign scores for CN loss events

If max copy number < 0.3: flag as deep deletion. Assign a score of 5 to both alleles (5+5)
Else if min copy number < 0.3: flag as truncation. Assign a score of 5 to both alleles (5+5)
Else if min minor allele ploidy < 0.2: flag as LOH. Assign a score of 5 to allele 1
Else flag as no copy number variant
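The decision rules for copy number losses can be sketched as a small function. This is an illustration rather than the package's own code: the function and field names are hypothetical, and the score of 0 for an unaffected allele is an assumption. It also assumes that the deep deletion test applies to the maximum copy number across the gene (the whole gene is lost) and the truncation test to the minimum (part of the gene is lost), which keeps every branch reachable.

```r
# Hypothetical helper; the 0.3 and 0.2 thresholds come from the rules above.
classifyCnLoss <- function(min_cn, max_cn, min_minor_allele_ploidy){
   if(max_cn < 0.3){
      # Whole gene below threshold: both alleles lost
      list(event='deep_deletion', score_a1=5, score_a2=5)
   } else if(min_cn < 0.3){
      # Part of the gene below threshold: both alleles hit
      list(event='truncation', score_a1=5, score_a2=5)
   } else if(min_minor_allele_ploidy < 0.2){
      # One allele lost; allele 2 score of 0 is an assumption
      list(event='loh', score_a1=5, score_a2=0)
   } else {
      list(event='none', score_a1=0, score_a2=0)
   }
}

# Example: a gene with normal total copy number but a lost minor allele
cn_status <- classifyCnLoss(min_cn=2, max_cn=2, min_minor_allele_ploidy=0.1)
```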

Assign scores to SNV/indels

Flag origin of variant (i.e. germline or somatic)
Assign score to mutations in each allele based on ClinVar or SnpEff annotations (ClinVar takes priority):

score | ClinVar           | SnpEff
-----------------------------------------------------------
  5   | pathogenic        | frameshift
  4   | likely_pathogenic | nonsense
  3   | VUS               | missense, splice, inframe indel
  2   | likely_benign     | other variants
  1   | benign            | other variants
  0   | no data available | other variants
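The scoring table can be expressed as two lookup vectors with ClinVar taking priority. The sketch below is illustrative only: the function name is hypothetical, the labels are the ones used in the table rather than the exact strings found in ClinVar or SnpEff annotations, and it assumes that any variant type not listed for scores 3 to 5 scores 0 when no ClinVar annotation is available.

```r
# Lookup vectors built from the table above
clinvar_scores <- c(pathogenic=5, likely_pathogenic=4, VUS=3, likely_benign=2, benign=1)
snpeff_scores <- c(frameshift=5, nonsense=4, missense=3, splice=3, inframe_indel=3)

# Hypothetical helper: ClinVar is used when available, otherwise SnpEff
scoreVariant <- function(clinvar_sig=NA, snpeff_type=NA){
   if(!is.na(clinvar_sig) && clinvar_sig %in% names(clinvar_scores)){
      return(unname(clinvar_scores[clinvar_sig]))
   }
   if(!is.na(snpeff_type) && snpeff_type %in% names(snpeff_scores)){
      return(unname(snpeff_scores[snpeff_type]))
   }
   0 # assumption: no usable annotation
}

# A frameshift with a benign ClinVar annotation is scored by ClinVar
score_with_clinvar <- scoreVariant(clinvar_sig='benign', snpeff_type='frameshift')
# Without ClinVar data the SnpEff variant type decides
score_without_clinvar <- scoreVariant(snpeff_type='frameshift')
```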

Combine monoallelic events

If deep deletion or truncation, assign gene CNV output to both allele 1 and 2
Else make pairs of the following events: LOH, germline mut, somatic mut
Determine the variant pair with the highest hit score (i.e. combined score). The order in which biallelic event types are prioritized is shown below.

biallel_event | allele1_event | allele2_event 
---------------------------------------------
CN loss       | deep deletion | deep deletion
CN loss       | truncation    | truncation
LOH+som       | LOH           | SNV/indel
LOH+germ      | LOH           | SNV/indel
som+som       | SNV/indel     | SNV/indel
germ+som      | SNV/indel     | SNV/indel
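Selecting the best pair can be sketched as a sort on the combined score. The example below is a hedged illustration: the function, column names, and candidate scores are hypothetical, and it assumes the priority order in the table is used only to break ties between pairs with the same hit score.

```r
# Priority order taken from the table above (earlier = higher priority)
event_priority <- c('CN loss', 'LOH+som', 'LOH+germ', 'som+som', 'germ+som')

# Hypothetical helper: returns the row with the highest combined score,
# using the event type priority as a tie-breaker (an assumption).
pickBestPair <- function(pairs){
   pairs$hit_score <- pairs$allele1_score + pairs$allele2_score
   pairs$priority <- match(pairs$biallel_event, event_priority)
   ord <- order(-pairs$hit_score, pairs$priority)
   pairs[ord[1], , drop=FALSE]
}

# Made-up candidate pairs for one gene
candidate_pairs <- data.frame(
   biallel_event=c('som+som', 'LOH+germ', 'LOH+som'),
   allele1_score=c(5, 5, 5),
   allele2_score=c(3, 4, 4),
   stringsAsFactors=FALSE
)

best_pair <- pickBestPair(candidate_pairs)
```

Here the two LOH pairs tie on hit score, so the one listed earlier in the priority table is kept.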

Output

A table containing, for each gene: (1) the maximum impact variant pair; and (2) the amplification data

luannnguyen/hmfGeneAnnotation documentation built on May 6, 2020, 1:07 p.m.
