hmfGeneAnnotation is an R package that determines the amplification and biallelic loss status of a set of genes (provided as a BED file), based on copy number and SNV/indel data generated by the HMF variant calling pipeline.

Getting started

Generated by HMF pipeline

Gene list

## Load the default gene list (COSMIC Cancer Gene Census) shipped with the package
default_bed <- read.delim(
   file=system.file('misc/cosmic_cancer_gene_census_20200225.bed',package='hmfGeneAnnotation'),
   check.names=FALSE
)
head(default_bed)

Usage

First install the package and its dependencies.

## Install dependencies
install.packages('seqminer')

## Install hmfGeneAnnotation from GitHub
install.packages('devtools')
devtools::install_github('UMCUGenetics/hmfGeneAnnotation')

detGeneStatuses() is the main function of the package. If bed.file is not specified, the BED file included in the package is used. The user may also specify java.path (path to the java binary; defaults to the one installed on the system) and snpsift.path (path to the SnpSift jar; defaults to the jar included at inst/dep/SnpSift.jar).

detGeneStatuses(
   out.dir='/path/to/write/output/files/', 
   hmf.pl.output.paths=c(
     germ_vcf='/path/to/annotated.vcf.gz', 
     som_vcf='/path/to/purple.somatic.vcf.gz', 
     gene_cnv='/path/to/purple.cnv.gene.tsv', 
     cnv='/path/to/purple.cnv.somatic.tsv'
   ), 
   sample.name='sample_name',

   ## Optional arguments
   bed.file='/path/to/bed/file', 
   java.path='/path/to/java/binary', 
   snpsift.path='/path/to/snpsift/jar',

   verbose=TRUE
)
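
Before running detGeneStatuses(), it can help to confirm that all four input files are present. The following is a minimal sketch; checkInputPaths() is a hypothetical helper that is not part of hmfGeneAnnotation, and it assumes the four names shown in the call above are the ones that are required.

```r
## Hypothetical helper (not part of hmfGeneAnnotation): checks that the named
## vector intended for hmf.pl.output.paths has the expected names and that
## every file exists on disk.
checkInputPaths <- function(paths, required=c('germ_vcf','som_vcf','gene_cnv','cnv')){
   ## Names that are expected but absent from the vector
   missing_names <- setdiff(required, names(paths))
   if(length(missing_names) > 0){
      stop('Missing entries in hmf.pl.output.paths: ', paste(missing_names, collapse=', '))
   }

   ## Paths that do not point to an existing file
   not_found <- paths[required][ !file.exists(paths[required]) ]
   if(length(not_found) > 0){
      stop('Files not found: ', paste(not_found, collapse=', '))
   }

   invisible(TRUE)
}
```

Calling checkInputPaths() on the same named vector that is passed to hmf.pl.output.paths stops with an informative error if an entry is missing or a file cannot be found.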

The output is a table with one row per gene. Each row contains (1) copy number gain data: gains at the chromosome arm level relative to the genome ploidy, and local gains relative to the chromosome arm ploidy; and (2) loss/mutation data for allele 1 and allele 2, where each variant is given an impact score from 0 to 5 based on ClinVar annotations (which take priority) or SnpEff variant type annotations. Below is a schematic overview of the output table.

      || gene_metadata || CN_gain_info || allele_1_losses             || allele_2_losses             ||
      ||               ||              || variant_type | impact_score || variant_type | impact_score ||
------------------------------------------------------------------------------------------------------
gene1 ||               ||              ||              |              ||              |              ||
gene2 ||               ||              ||              |              ||              |              ||
 ...

Package workflow

Pre-processing HMF pipeline outputs

Assign scores for CN loss events

Assign scores to SNV/indels

score | ClinVar           | SnpEff
-----------------------------------------------------------
  5   | pathogenic        | frameshift
  4   | likely_pathogenic | nonsense
  3   | VUS               | missense, splice, inframe indel
  2   | likely_benign     | other variants
  1   | benign            | other variants
  0   | no data available | other variants
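
The scoring table above can be sketched as a pair of lookup vectors. This is an illustration only, not the package's actual implementation: scoreVariant() and both lookup vectors are hypothetical, the SnpEff labels are the simplified ones from the table rather than real SnpEff effect names, and SnpEff variant types not listed in the table are assumed to score 0.

```r
## Hypothetical lookup vectors mirroring the table above
clinvar_scores <- c(pathogenic=5, likely_pathogenic=4, VUS=3, likely_benign=2, benign=1)
snpeff_scores <- c(frameshift=5, nonsense=4, missense=3, splice=3, 'inframe indel'=3)

## Hypothetical helper: the ClinVar score is used when a ClinVar annotation is
## available; otherwise the SnpEff score is used.
scoreVariant <- function(clinvar, snpeff){
   ## Indexing a named vector by NA or an unknown name yields NA
   clinvar_score <- unname(clinvar_scores[as.character(clinvar)])
   snpeff_score <- unname(snpeff_scores[as.character(snpeff)])

   ## Assumption: unlisted SnpEff variant types ('other variants') score 0
   snpeff_score[is.na(snpeff_score)] <- 0

   ifelse(is.na(clinvar_score), snpeff_score, clinvar_score)
}

scoreVariant(
   clinvar=c('pathogenic', NA, NA, 'benign'),
   snpeff=c('missense', 'frameshift', 'synonymous', 'frameshift')
)
## 5 5 0 1
```

In the last variant the ClinVar annotation (benign, score 1) overrides the SnpEff frameshift annotation, which is how the sketch expresses ClinVar taking priority.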

Combine monoallelic events:

biallel_event | allele1_event | allele2_event 
---------------------------------------------
CN loss       | deep deletion | deep deletion
CN loss       | truncation    | truncation
LOH+som       | LOH           | SNV/indel
LOH+germ      | LOH           | SNV/indel
som+som       | SNV/indel     | SNV/indel
germ+som      | SNV/indel     | SNV/indel
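
The combinations in the table above can be expressed as a lookup keyed on the pair of monoallelic events. This is a sketch under stated assumptions rather than the package's own code: classifyBiallelicEvent() is hypothetical, and the labels 'som' and 'germ' are assumed stand-ins for a somatic or germline SNV/indel, since the table shows both simply as SNV/indel.

```r
## Hypothetical lookup: 'allele1_event|allele2_event' -> biallelic event
biallel_lookup <- c(
   'deep deletion|deep deletion'='CN loss',
   'truncation|truncation'='CN loss',
   'LOH|som'='LOH+som',
   'LOH|germ'='LOH+germ',
   'som|som'='som+som',
   'germ|som'='germ+som'
)

## Hypothetical helper: returns NA for combinations not listed in the table
classifyBiallelicEvent <- function(allele1, allele2){
   unname(biallel_lookup[ paste(allele1, allele2, sep='|') ])
}

classifyBiallelicEvent('LOH', 'germ')
## "LOH+germ"
```

The function is vectorised, so it can be applied to whole columns of allele 1 and allele 2 events at once.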

Output



luannnguyen/hmfGeneAnnotation documentation built on May 6, 2020, 1:07 p.m.