oldRuntimeOptions: VEPParam runtime options
In Bioconductor/ensemblVEP: R Interface to Ensembl Variant Effect Predictor


Description

Runtime options for the most current API version of the Ensembl Variant Effect Predictor.

Details

VEPParam objects store the runtime options for querying the Ensembl Variant Effect Predictor (VEP). This page describes only the most current runtime options and is a condensed version of what is listed on the Ensembl web site:

http://uswest.ensembl.org/info/docs/tools/vep/script/vep_options.html

Runtime options for archived versions can be found on the corresponding archive page:

http://useast.ensembl.org/info/website/archives/index.html

Runtime options:

Data in the VEPParam are organized into the following categories: ‘basic’, ‘input’, ‘cache’, ‘output’, ‘identifier’, ‘colocatedVariants’, ‘dataformat’, ‘filterqc’, ‘database’ and ‘advanced’. Each category is a list of runtime options. logical options are turned on/off with TRUE/FALSE. character and numeric options are ‘on’ when a value is provided and ‘off’ when they contain an empty value (i.e., character() or numeric()).

‘identifier’, ‘colocatedVariants’, ‘dataformat’ are supported for VEPParam73 and later.
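
The on/off rules above can be illustrated with a small base R sketch. It does not use the ensemblVEP package: opts_to_flags() is a hypothetical helper written for this illustration, and the plain list stands in for one category of a VEPParam. It only shows which options would count as ‘on’, not how the package itself builds the VEP call.

```r
## Hypothetical helper (not part of ensemblVEP): list which options in a
## category are 'on', written as command-line style flags.
opts_to_flags <- function(opts) {
  flags <- character()
  for (name in names(opts)) {
    value <- opts[[name]]
    if (is.logical(value)) {
      ## logical options: TRUE switches the option on, FALSE leaves it off
      if (isTRUE(value)) flags <- c(flags, paste0("--", name))
    } else if (length(value) > 0) {
      ## character/numeric options: 'on' only when a value is supplied
      flags <- c(flags, paste0("--", name, " ", paste(value, collapse = ",")))
    }
  }
  flags
}

## A stand-in for the 'basic' category; character() marks an 'off' option.
basic <- list(verbose = TRUE, quiet = FALSE, config = character(), fork = 4)
flags <- opts_to_flags(basic)
print(flags)
```

Here only verbose and fork are reported, because quiet is FALSE and config holds an empty value. See ?VEPParam for how options are actually set on a VEPParam object.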

basic

list of the following options:

verbose: logical, default FALSE; output status messages
quiet: logical, default FALSE; suppress status/warnings
no_progress: logical, default FALSE; don't show progress bars
config: character, default character(); name of config file
everything: logical, default FALSE; shortcut to switch on 12 options (sift, polyphen, ccds, hgvs, hgnc, numbers, domains, regulatory, cell_type, canonical, protein and gmaf).
fork: numeric, default numeric(); enable forking

input

list of the following options:

species: character, default 'homo_sapiens'; species for the data
assembly: character, default character(); select assembly version if more than one available
format: character, default character(); one of the following input file formats, 'ensembl', 'vcf', 'pileup', 'hgvs', 'id' or 'vep'. By default the script auto-detects the input file format.
output_file: character, default writes to temp file; path and file name of output file
force_overwrite: logical, default FALSE; overwrite the output file if it currently exists
stats_file: character, default character(); summary stats file name
no_stats: logical, default FALSE; do not generate a stats file
stats_text: logical, default FALSE; generate a plain text stats file instead of html
html: logical, default FALSE; generate html version of the output file

cache

list of the following options:

cache: logical, default FALSE; enable use of cache
dir: character, default '$HOME/.vep/'; cache/plugin directory to be used
dir_cache: character, default '$HOME/.vep/'; cache directory to be used
dir_plugins: character, default '$HOME/.vep/'; plugin directory to be used
offline: logical, default FALSE; enable offline mode, no database connections will be made
fasta: character, default character(); FASTA filename or directory to files to use for reference sequences
cache_version: character, default character(); use a different cache version than the assumed default
show_cache_info: logical, default FALSE; show source version information for selected cache and quit

output

list of the following options:

variant_class: logical, default FALSE; output the sequence ontology variant class
sift: character, default character(); output prediction, score or both, valid strings are 'p', 's' or 'b'
polyphen: character, default character(); output prediction, score or both, valid strings are 'p', 's' or 'b'
humdiv: logical, default FALSE; retrieve the humDiv PolyPhen prediction instead of humVar
gene_phenotype: logical, default FALSE; indicates if overlapped gene is associated with a phenotype, disease or trait
regulatory: logical, default FALSE; identify overlaps with regulatory regions
cell_type: character, default character(); only report regulatory regions found in the given cell type(s)
custom: character, default character(); name of custom annotation file to add to output. Currently only a single annotation is supported.
plugin: character, default character(); name of plugin module. Currently only a single module is supported.
individual: character, default character(); consider only alternate alleles present in the genotypes of 'all' or a character vector of specified individuals
phased: logical, default FALSE; force VCF genotypes to be interpreted as phased
allele_number: logical, default FALSE; identify allele number from VCF input (1=first ALT, 2=second ALT, etc.)
total_length: character, default character(); cDNA, CDS and protein positions as position/length
numbers: logical, default FALSE; output affected exon and intron numbering, format is Number/Total
domains: logical, default FALSE; output names of overlapping protein domains
no_escape: logical, default FALSE; don't URI escape HGVS string
keep_csq: logical, default FALSE; don't overwrite existing CSQ entry in VCF INFO field
vcf_info_field: character, default 'CSQ'; change the name of the INFO key that VEP writes the consequences to in the VCF output.
terms: character, default 'so'; type of consequence terms to output, valid strings are 'ensembl' or 'so'

identifier

list of the following options:

hgvs: logical, default FALSE; add HGVS IDs
shift_hgvs: [0/1], default 1 (shift); enable or disable 3' shifting of HGVS notations
protein: logical, default FALSE; add Ensembl protein IDs
symbol: logical, default FALSE; add gene symbol (e.g. HGNC) (where available) to the output
ccds: logical, default FALSE; add CCDS transcript IDs
uniprot: logical, default FALSE; adds identifiers for translated protein products from three UniProt-related databases
tsl: logical, default FALSE; adds the transcript support level for this transcript
canonical: logical, default FALSE; indicate if the transcript is the canonical transcript for the gene
biotype: logical, default FALSE; add biotype of transcript
xref_seq: logical, default FALSE; output aligned refseq mRNA ID

colocatedVariants

list of the following options:

check_existing: logical, default FALSE; check for co-located variants
check_alleles: logical, default FALSE; when checking for co-located variants only report them if none of the alleles supplied are novel
check_svs: logical, default FALSE; check for structural variants that overlap the input variants
gmaf: logical, default FALSE; add global minor allele frequency (MAF) from 1000 Genomes Phase 1 data
maf_1kg: logical, default FALSE; add MAF from continental populations of 1000 Genomes Phase 1 data; must be used with --cache
maf_esp: logical, default FALSE; add MAF from NHLBI-ESP populations; must be used with --cache
old_maf: logical, default FALSE; for maf_1kg and maf_esp report only the frequency (no allele) and convert so it is always a minor frequency, i.e. < 0.5
pubmed: logical, default FALSE; report Pubmed IDs for publications that cite existing variant; must be used with --cache
failed: logical, default FALSE; when checking for co-located variants include or exclude variants that have been flagged as failed

dataformat

list of the following options:

vcf: logical, default FALSE; write output in vcf format
json: logical, default FALSE; write output in json format
gvf: logical, default FALSE; write output in gvf format
fields: character, default fields are 'Uploaded_variation', 'Location', 'Allele', 'Gene', 'Feature', 'Feature_type', 'Consequence', 'cDNA_position', 'CDS_position', 'Protein_position', 'Amino_acids', 'Codons' and 'Extra'. See http://www.ensembl.org/info/docs/variation/vep/vep_formats.html#sv for details.
convert: character, default character(); converts input file to one of 'ensembl', 'vcf', or 'pileup'
minimal: logical, default FALSE; convert alleles to their most minimal representation before consequence calculation

filterqc

list of the following options:

check_ref: logical, default FALSE; force check of supplied reference allele against the sequence stored in Ensembl Core database
coding_only: logical, default FALSE; return consequences in coding regions only
chr: character, default character(); select a subset of chromosomes to be analyzed
no_intergenic: logical, default FALSE; do not include intergenic consequences
pick: logical, default FALSE; pick one line of consequence data per variant
pick_allele: logical, default FALSE; pick one line of consequence data per variant allele
flag_pick: logical, default FALSE; as per --pick, but adds the PICK flag to the chosen block of consequence data and retains others.
flag_pick_allele: logical, default FALSE; as per --pick_allele, but adds the PICK flag to the chosen block of consequence data and retains others.
per_gene: logical, default FALSE; output only the most severe consequence per gene
pick_order: character, see the Ensembl web page for the default order; customise the order of criteria applied when choosing a block of annotation data with e.g. --pick.
most_severe: logical, default FALSE; output only most severe consequence per variation
summary: logical, default FALSE; output a comma-separated list of all observed consequences per variation, transcript-specific columns will be left blank
filter_common: logical, default FALSE; shortcut flag to turn on filters; see the web page for details.
check_frequency: logical, default FALSE; turn on frequency filtering, must also specify all of the --freq_* flags. See the web page for details.
freq_pop: character, default character(); population to use in frequency filter
freq_freq: numeric, default numeric(); MAF to use in frequency filter
freq_gt_lt: character, default character(); specify whether the frequency of the co-located variant must be greater than or less than the value specified in the freq_freq option. Values are 'gt' or 'lt'.
freq_filter: character, default character(); specify whether to exclude or include variants that pass the frequency filter. Values are 'exclude' or 'include'.
allow_non_variant: logical, default FALSE; when using VCF format as input and output, by default VEP will skip all non-variant lines of input (i.e., where the ALT is NULL). When this option is enabled, lines will be printed in the VCF output with no consequence data added.
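
Because check_frequency requires all of the freq_* options to be set, it can help to check a set of options before running VEP. The following base R sketch assumes the ‘filterqc’ options are held in a plain list; missing_freq_opts() is a hypothetical helper, not an ensemblVEP function, and the population name is a placeholder rather than a verified VEP value.

```r
## Hypothetical check (not part of ensemblVEP): when check_frequency is
## TRUE, return the names of the freq_* options that are still empty.
missing_freq_opts <- function(filterqc) {
  if (!isTRUE(filterqc$check_frequency)) return(character())
  needed <- c("freq_pop", "freq_freq", "freq_gt_lt", "freq_filter")
  ## an option is 'off' when it is absent or holds an empty value
  is_empty <- vapply(needed,
                     function(n) length(filterqc[[n]]) == 0,
                     logical(1))
  needed[is_empty]
}

filterqc <- list(check_frequency = TRUE,
                 freq_pop = "SOME_POPULATION",  # placeholder value
                 freq_freq = 0.01,
                 freq_gt_lt = "gt",
                 freq_filter = character())     # left 'off' on purpose
missing <- missing_freq_opts(filterqc)
print(missing)
```

In this example freq_filter is reported as missing, so the frequency filter would not be fully specified.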

database

list of the following options:

database: logical, default TRUE; enable the VEP to use local or remote databases
host: character, default character(); database host. When empty, the VEP default, 'ensembldb.ensembl.org', is used. Users in the US may find connection and transfer speeds quicker using the East coast mirror, 'useastdb.ensembl.org'.
user: character, default character(); database user
password: character, default character(); database password
port: numeric, default character(); database port
genomes: logical, default FALSE; override default connection settings with those for the Ensembl Genomes public MySQL server
gencode_basic: logical, default FALSE; limit analysis to transcripts in GENCODE basic set
refseq: logical, default FALSE; use otherfeatures database to retrieve transcripts
merged: logical, default FALSE; use the merged Ensembl and RefSeq cache
all_refseq: logical, default FALSE; include e.g. CCDS and Ensembl EST transcripts
lrg: logical, default FALSE; map input variants to LRG coordinates
db_version: numeric, default character(); force connection to specific version
registry: character, default character(); provide file to override default connection settings

advanced

list of the following options:

no_whole_genome: logical, default FALSE; run in non-whole genome mode, variants analyzed one at a time, no caching
buffer_size: numeric, default 5000; internal buffer size corresponding to number of variations read into memory simultaneously
write_cache: logical, default FALSE; enable writing to the cache
build: character, default character(); build cache for the selected species from the database (see the --chr flag)
compress: character, default character(); specify utility to decompress cached files (zcat is default)
skip_db_check: logical, default FALSE; force the script to use a cache built from a different host than specified with --host
cache_region_size: numeric, default numeric(); size in base-pairs of the region covered by one file in the cache, see full description of this flag on the web site for details

Author(s)

Valerie Obenchain

See Also

The ensemblVEP function man page.
The VEPParam class man page.

Examples

  ## See ?VEPParam for examples of constructing instances of a
  ## VEPParam object with different runtime options.

Bioconductor/ensemblVEP documentation built on May 4, 2024, 4:50 p.m.
