oldRuntimeOptions: VEPParam runtime options

oldRuntimeOptions R Documentation

VEPParam runtime options

Description

Runtime options for the most current API version of the Ensembl Variant Effect Predictor.

Details

VEPParam objects store the runtime options for querying the Ensembl Variant Effect Predictor (VEP). This page describes only the most current runtime options and is a condensed version of what is listed on the Ensembl web site:

http://uswest.ensembl.org/info/docs/tools/vep/script/vep_options.html

Runtime options for archived versions can be found on the corresponding archive page:

http://useast.ensembl.org/info/website/archives/index.html

Runtime options:

Data in the VEPParam are organized into the following categories: ‘basic’, ‘input’, ‘cache’, ‘output’, ‘identifier’, ‘colocatedVariants’, ‘dataformat’, ‘filterqc’, ‘database’ and ‘advanced’. Each category is a list of runtime options. Logical options are turned on/off with TRUE/FALSE. Character and numeric options are ‘on’ when a value is provided and ‘off’ when they contain an empty value (i.e., character() or numeric()).

The ‘identifier’, ‘colocatedVariants’ and ‘dataformat’ categories are supported for VEPParam73 and later.
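
The on/off rule described above can be sketched in base R. This is a minimal illustration that uses a plain list rather than a VEPParam object; the helper name `is_on` and the list `basic_opts` are hypothetical and are not part of ensemblVEP.

```r
# Hypothetical helper: decide whether a single option is 'on'.
# Logical options are on when TRUE; character and numeric options
# are on when they hold at least one value.
is_on <- function(x) {
  if (is.logical(x)) isTRUE(x) else length(x) > 0
}

# A plain list mimicking a few 'basic' options (not a VEPParam object).
basic_opts <- list(
  verbose = TRUE,
  quiet   = FALSE,
  config  = character(),   # empty, so 'off'
  fork    = 4              # non-empty numeric, so 'on'
)

# Names of the options that are switched on.
on_opts <- names(Filter(is_on, basic_opts))
print(on_opts)
```

Only `verbose` and `fork` are reported here, because `quiet` is FALSE and `config` is empty.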

basic

list of the following options:

  • verbose: logical, default FALSE; output status messages

  • quiet: logical, default FALSE; suppress status/warnings

  • no_progress: logical, default FALSE; don't show progress bars

  • config: character, default character(); name of config file

  • everything: logical, default FALSE; shortcut to switch on 12 options (sift, polyphen, ccds, hgvs, hgnc, numbers, domains, regulatory, cell_type, canonical, protein and gmaf).

  • fork: numeric, default numeric(); enable forking
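
As a rough illustration of the ‘everything’ shortcut, the sketch below checks which requested options fall under the 12 names listed for it on this page. The helper `covered_by_everything` and the vector `requested` are hypothetical; the code works on plain character vectors and does not call ensemblVEP.

```r
# The 12 option names listed for 'everything' on this page.
everything_opts <- c("sift", "polyphen", "ccds", "hgvs", "hgnc", "numbers",
                     "domains", "regulatory", "cell_type", "canonical",
                     "protein", "gmaf")

# Hypothetical helper: TRUE for each option name that the 'everything'
# shortcut would switch on, FALSE otherwise.
covered_by_everything <- function(opts) opts %in% everything_opts

# Example request: two options are covered, one must be set separately.
requested <- c("sift", "canonical", "pubmed")
covered <- covered_by_everything(requested)
print(setNames(covered, requested))
```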

input

list of the following options:

  • species: character, default 'homo_sapiens'; species for the data

  • assembly: character, default character(); select assembly version if more than one available

  • format: character, default character(); one of the following input file formats, 'ensembl', 'vcf', 'pileup', 'hgvs', 'id' or 'vep'. By default the script auto-detects the input file format.

  • output_file: character, default writes to temp file; path and file name of output file

  • force_overwrite: logical, default FALSE; overwrite the output file if it currently exists

  • stats_file: character, default character(); summary stats file name

  • no_stats: logical, default FALSE; do not generate a stats file

  • stats_text: logical, default FALSE; generate a plain text stats file instead of html

  • html: logical, default FALSE; generate html version of the output file

cache

list of the following options:

  • cache: logical, default FALSE; enable use of cache

  • dir: character, default '$HOME/.vep/'; cache/plugin directory to be used

  • dir_cache: character, default '$HOME/.vep/'; cache directory to be used

  • dir_plugins: character, default '$HOME/.vep/'; plugin directory to be used

  • offline: logical, default FALSE; enable offline mode, no database connections will be made

  • fasta: character, default character(); FASTA file name, or directory containing FASTA files, to use for reference sequences

  • cache_version: character, default character(); use a different cache version than the assumed default

  • show_cache_info: logical, default FALSE; show source version information for selected cache and quit

output

list of the following options:

  • variant_class: logical, default FALSE; output the sequence ontology variant class

  • sift: character, default character(); output prediction, score or both, valid strings are 'p', 's' or 'b'

  • polyphen: character, default character(); output prediction, score or both, valid strings are 'p', 's' or 'b'

  • humdiv: logical, default FALSE; retrieve the humDiv PolyPhen prediction instead of humVar

  • gene_phenotype: logical, default FALSE; indicates if overlapped gene is associated with a phenotype, disease or trait

  • regulatory: logical, default FALSE; identify overlaps with regulatory regions

  • cell_type: character, default character(); only report regulatory regions found in the given cell type(s)

  • custom: character, default character(); name of custom annotation file to add to output. Currently only a single annotation is supported.

  • plugin: character, default character(); name of plugin module. Currently only a single module is supported.

  • individual: character, default character(); consider only alternate alleles present in the genotypes of 'all' or a character vector of specified individuals

  • phased: logical, default FALSE; force VCF genotypes to be interpreted as phased

  • allele_number: logical, default FALSE; identify allele number from VCF input (1=first ALT, 2=second ALT, etc.)

  • total_length: character, default character(); cDNA, CDS and protein positions as position/length

  • numbers: logical, default FALSE; output affected exon and intron numbering, format is Number/Total

  • domains: logical, default FALSE; output names of overlapping protein domains

  • no_escape: logical, default FALSE; don't URI escape HGVS string

  • keep_csq: logical, default FALSE; don't overwrite existing CSQ entry in VCF INFO field

  • vcf_info_field: character, default 'CSQ'; change the name of the INFO key that VEP writes the consequences to in the VCF output.

  • terms: character, default 'so'; type of consequence terms to output, valid strings are 'ensembl' or 'so'
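
Several ‘output’ options accept only a fixed set of strings. The following base R sketch checks a plain list of options against the values stated on this page; the helper `invalid_output_opts` and the example list are hypothetical, and ensemblVEP may validate these values differently or not at all.

```r
# Allowed values taken from the option descriptions on this page.
allowed <- list(
  sift     = c("p", "s", "b"),
  polyphen = c("p", "s", "b"),
  terms    = c("ensembl", "so")
)

# Hypothetical helper: return the names of options whose values are invalid.
# Empty values (option 'off') and options without a restricted set are skipped.
invalid_output_opts <- function(opts) {
  bad <- vapply(names(opts), function(nm) {
    val <- opts[[nm]]
    nm %in% names(allowed) && length(val) > 0 && !all(val %in% allowed[[nm]])
  }, logical(1))
  names(opts)[bad]
}

# 'polyphen' holds a value outside 'p', 's', 'b', so it is flagged.
output_opts <- list(sift = "b", polyphen = "x", terms = "so", domains = TRUE)
print(invalid_output_opts(output_opts))
```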

identifier

list of the following options:

  • hgvs: logical, default FALSE; add HGVS IDs

  • shift_hgvs: [0/1], default 1 (shift); enable or disable 3' shifting of HGVS notations

  • protein: logical, default FALSE; add Ensembl protein IDs

  • symbol: logical, default FALSE; add gene symbol (e.g. HGNC) (where available) to the output

  • ccds: logical, default FALSE; add CCDS transcript IDs

  • uniprot: logical, default FALSE; adds identifiers for translated protein products from three UniProt-related databases

  • tsl: logical, default FALSE; adds the transcript support level for this transcript

  • canonical: logical, default FALSE; indicate if the transcript is the canonical transcript for the gene

  • biotype: logical, default FALSE; add biotype of transcript

  • xref_seq: logical, default FALSE; output aligned RefSeq mRNA ID

colocatedVariants

list of the following options:

  • check_existing: logical, default FALSE; check for co-located variants

  • check_alleles: logical, default FALSE; when checking for co-located variants only report them if none of the alleles supplied are novel

  • check_svs: logical, default FALSE; check for structural variants that overlap the input variants

  • gmaf: logical, default FALSE; add global minor allele frequency (MAF) from 1000 Genomes Phase 1 data

  • maf_1kg: logical, default FALSE; add MAF from continental populations of 1000 Genomes Phase 1 data; must be used with --cache

  • maf_esp: logical, default FALSE; add MAF from NHLBI-ESP populations; must be used with --cache

  • old_maf: logical, default FALSE; for maf_1kg and maf_esp report only the frequency (no allele) and convert so it is always a minor frequency, i.e. < 0.5

  • pubmed: logical, default FALSE; report PubMed IDs for publications that cite existing variant; must be used with --cache

  • failed: logical, default FALSE; when checking for co-located variants include or exclude variants that have been flagged as failed

dataformat

list of the following options:

  • vcf: logical, default FALSE; write output in vcf format

  • json: logical, default FALSE; write output in json format

  • gvf: logical, default FALSE; write output in gvf format

  • fields: character, default fields are 'Uploaded_variation', 'Location', 'Allele', 'Gene', 'Feature', 'Feature_type', 'Consequence', 'cDNA_position', 'CDS_position', 'Protein_position', 'Amino_acids', 'Codons' and 'Extra'. See http://www.ensembl.org/info/docs/variation/vep/vep_formats.html#sv for details.

  • convert: character, default character(); converts input file to one of 'ensembl', 'vcf', or 'pileup'

  • minimal: logical, default FALSE; convert alleles to their most minimal representation before consequence calculation

filterqc

list of the following options:

  • check_ref: logical, default FALSE; force check of supplied reference allele against the sequence stored in Ensembl Core database

  • coding_only: logical, default FALSE; return consequences in coding regions only

  • chr: character, default character(); select a subset of chromosomes to be analyzed

  • no_intergenic: logical, default FALSE; do not include intergenic consequences

  • pick: logical, default FALSE; pick one line of consequence data per variant

  • pick_allele: logical, default FALSE; pick one line of consequence data per variant allele

  • flag_pick: logical, default FALSE; as per --pick, but adds the PICK flag to the chosen block of consequence data and retains others.

  • flag_pick_allele: logical, default FALSE; as per --pick_allele, but adds the PICK flag to the chosen block of consequence data and retains others.

  • per_gene: logical, default FALSE; output only the most severe consequence per gene

  • pick_order: character, see the Ensembl web page for the default order; customise the order of criteria applied when choosing a block of annotation data with, e.g., --pick.

  • most_severe: logical, default FALSE; output only most severe consequence per variation

  • summary: logical, default FALSE; output a comma-separated list of all observed consequences per variation, transcript-specific columns will be left blank

  • filter_common: logical, default FALSE; shortcut flag to turn on filters. See web page for details.

  • check_frequency: logical, default FALSE; turn on frequency filtering; all of the --freq_* flags must also be specified. See web page for details.

  • freq_pop: character, default character(); population to use in frequency filter

  • freq_freq: numeric, default numeric(); MAF to use in frequency filter

  • freq_gt_lt: character, default character(); specify whether the frequency of the co-located variant must be greater than or less than the value specified in the freq_freq option. Values are 'gt' or 'lt'.

  • freq_filter: character, default character(); specify whether to exclude or include variants that pass the frequency filter. Values are 'exclude' or 'include'.

  • allow_non_variant: logical, default FALSE; when using VCF format as input and output, by default VEP will skip all non-variant lines of input (i.e., where the ALT is NULL). When this option is enabled, lines will be printed in the VCF output with no consequence data added.
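
Because check_frequency requires all of the freq_* options, a quick consistency check can catch an incomplete setup before VEP is run. The sketch below is a base R illustration on a plain list; the helper `missing_freq_opts` is hypothetical, the value 0.01 is only an example, and the population is deliberately left empty rather than guessing a valid name.

```r
# Hypothetical helper: when check_frequency is TRUE, list the freq_* options
# that are still empty. Returns character(0) when nothing is missing.
missing_freq_opts <- function(opts) {
  if (!isTRUE(opts$check_frequency)) return(character(0))
  freq_names <- c("freq_pop", "freq_freq", "freq_gt_lt", "freq_filter")
  is_empty <- vapply(freq_names, function(nm) length(opts[[nm]]) == 0, logical(1))
  freq_names[is_empty]
}

# A plain list mimicking part of the 'filterqc' category.
filterqc_opts <- list(
  check_frequency = TRUE,
  freq_pop    = character(),   # population left unset on purpose
  freq_freq   = 0.01,          # illustrative MAF threshold
  freq_gt_lt  = "gt",
  freq_filter = "exclude"
)
print(missing_freq_opts(filterqc_opts))
```

Here the check reports `freq_pop`, the one frequency option that is still ‘off’.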

database

list of the following options:

  • database: logical, default TRUE; enable the VEP to use local or remote databases

  • host: character, default character(); database host. When empty, the VEP default 'ensembldb.ensembl.org' is used. Users in the US may find connection and transfer speeds faster using the East coast mirror, 'useastdb.ensembl.org'.

  • user: character, default character(); database user

  • password: character, default character(); database password

  • port: numeric, default character(); database port

  • genomes: logical, default FALSE; override default connection settings with those for the Ensembl Genomes public MySQL server

  • gencode_basic: logical, default FALSE; limit analysis to transcripts in GENCODE basic set

  • refseq: logical, default FALSE; use otherfeatures database to retrieve transcripts

  • merged: logical, default FALSE; use the merged Ensembl and RefSeq cache

  • all_refseq: logical, default FALSE; include e.g. CCDS and Ensembl EST transcripts

  • lrg: logical, default FALSE; map input variants to LRG coordinates

  • db_version: numeric, default character(); force connection to specific version

  • registry: character, default character(); provide file to override default connection settings
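
The fallback behaviour of the ‘host’ option can be sketched as follows. This is a base R illustration on plain lists under the assumption that an empty host means the VEP default is used, as described above; the helper `effective_host` is hypothetical and no database connection is made.

```r
# Host names as given in the 'host' description on this page.
default_host <- "ensembldb.ensembl.org"
useast_host  <- "useastdb.ensembl.org"

# Hypothetical helper: the host that would be contacted for a given setting.
# An empty 'host' means the VEP default applies.
effective_host <- function(database_opts) {
  host <- database_opts$host
  if (length(host) == 0) default_host else host
}

# Empty host falls back to the default; an explicit mirror is kept as given.
print(effective_host(list(database = TRUE, host = character())))
print(effective_host(list(database = TRUE, host = useast_host)))
```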

advanced

list of the following options:

  • no_whole_genome: logical, default FALSE; run in non-whole genome mode, variants analyzed one at a time, no caching

  • buffer_size: numeric, default 5000; internal buffer size corresponding to number of variations read into memory simultaneously

  • write_cache: logical, default FALSE; enable writing to the cache

  • build: character, default character(); build cache for the selected species from the database (see --chr flag)

  • compress: character, default character(); specify utility to decompress cached files (zcat is default)

  • skip_db_check: logical, default FALSE; force the script to use a cache built from a different host than specified with --host

  • cache_region_size: numeric, default numeric(); size in base-pairs of the region covered by one file in the cache, see full description of this flag on the web site for details

Author(s)

Valerie Obenchain

See Also

  • The ensemblVEP function man page.

  • The VEPParam class man page.

Examples

  ## See ?VEPParam for examples of constructing instances of a
  ## VEPParam object with different runtime options.

Bioconductor/ensemblVEP documentation built on May 4, 2024, 4:50 p.m.