Predict amino acid coding changes for variants in coding regions
## S4 method for signature 'CollapsedVCF,TxDb,ANY,missing'
predictCoding(query, subject, seqSource, varAllele, ..., ignore.strand=FALSE)

## S4 method for signature 'ExpandedVCF,TxDb,ANY,missing'
predictCoding(query, subject, seqSource, varAllele, ..., ignore.strand=FALSE)

## S4 method for signature 'IntegerRanges,TxDb,ANY,DNAStringSet'
predictCoding(query, subject, seqSource, varAllele, ..., ignore.strand=FALSE)

## S4 method for signature 'GRanges,TxDb,ANY,DNAStringSet'
predictCoding(query, subject, seqSource, varAllele, ..., ignore.strand=FALSE)

## S4 method for signature 'VRanges,TxDb,ANY,missing'
predictCoding(query, subject, seqSource, varAllele, ..., ignore.strand=FALSE)
query: A VCF, IntegerRanges, GRanges or VRanges object containing the variants to be annotated.
NOTE: Variants are expected to conform to the VCF specs as described on the 1000 Genomes page (see references). Indels must include the reference allele; zero-width ranges are not valid and return no matches.
subject: A TxDb object that serves as the annotation. GFF files can be
converted to TxDb objects with makeTxDbFromGFF().
varAllele: A DNAStringSet containing the variant (alternate) alleles. The
length of varAllele must equal the length of query.
...: Additional arguments passed to methods. Arguments genetic.code and
if.fuzzy.codon are supported for the translate function.
This function returns the amino acid coding for variants that fall
completely ‘within’ a coding region. The reference sequences are
taken from a fasta file or BSgenome. The width of
the reference is determined from the start position and width of the
range in the
query. For guidance on how to represent an insertion,
deletion or substitution see the 1000 Genomes VCF format (references).
Variant alleles are taken from the varAllele when supplied. When the
query is a VCF object the varAllele argument will be missing; the value
is taken internally from the VCF with alt(<VCF>). The variant allele is
substituted into the reference sequence and translated. Translation only
occurs if the substitution, insertion or deletion results in a new
sequence length divisible by 3.
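The substitution and length check described above can be sketched in base R. This is an illustration only, not the package's internal code: `substitute_allele()` and `is_translatable()` are hypothetical helper names, and positions are assumed to be 1-based offsets into the reference codon string.

```r
# Hypothetical helper: replace `width` bases of `ref`, starting at the
# 1-based position `start`, with the alternate allele `alt`.
substitute_allele <- function(ref, start, width, alt) {
  stopifnot(start >= 1, width >= 0, start + width - 1 <= nchar(ref))
  paste0(substr(ref, 1, start - 1), alt,
         substr(ref, start + width, nchar(ref)))
}

# Hypothetical helper: TRUE when the sequence length is a multiple of 3.
is_translatable <- function(seq) nchar(seq) %% 3 == 0

ref_codons <- "ATGGCC"                            # two reference codons
snv <- substitute_allele(ref_codons, 4, 1, "A")   # substitution: G -> A
del <- substitute_allele(ref_codons, 3, 2, "G")   # deletion: "GG" -> "G"

is_translatable(snv)   # length 6, can be translated
is_translatable(del)   # length 5, cannot be translated
```

The deletion follows the VCF convention mentioned in the NOTE: the reference allele ("GG") is included and the alternate allele ("G") is what remains.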
When the query is an unstranded (*) GRanges, predictCoding will attempt
to find overlaps on both the positive and negative strands of the
subject. When the subject hit is on the negative strand the returned
varAllele is reverse complemented. The strand of the returned
GRanges represents the strand of the subject hit.
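To make the negative-strand behaviour concrete, here is a minimal base R sketch of reverse complementing an allele. `reverse_complement()` is a hypothetical helper written for this illustration; in practice `Biostrings::reverseComplement()` does the same job on DNAStringSet objects.

```r
# Hypothetical helper: reverse complement each element of a character
# vector of DNA strings. Characters other than A, C, G, T (for example N)
# are left unchanged by chartr().
reverse_complement <- function(x) {
  vapply(strsplit(x, "", fixed = TRUE), function(b) {
    chartr("ACGTacgt", "TGCAtgca", paste(rev(b), collapse = ""))
  }, character(1))
}

# Alleles given on the positive strand, as in a VCF file
alleles <- c("A", "GTC", "ATTN")

# The same alleles as they would read on the negative strand
reverse_complement(alleles)
```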
A GRanges with a row for each variant-transcript
match. The result includes only variants that fell within coding regions.
The strand of the output GRanges represents the strand of the subject hit.
At a minimum, the metadata columns (accessible with mcols) include:
varAllele: Variant allele. This value is reverse complemented for an
unstranded query that overlaps a subject on the negative strand.
QUERYID: Map back to the row in the original query.
TXID: Internal transcript id from the annotation.
CDSID: Internal coding region id from the annotation.
GENEID: Internal gene id from the annotation.
CDSLOC: Variant location in coding region-based coordinates. This
position is relative to the start of the coding (cds) region defined in
the subject annotation.
PROTEINLOC: Variant codon triplet location in coding region-based
coordinates. This position is relative to the start of the coding (cds)
region defined in the subject annotation.
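The relationship between a coding-region position and its codon triplet can be sketched as simple arithmetic. The sketch assumes 1-based coding-region coordinates in which codon k covers positions 3k-2 to 3k; `cds_to_codon()` and `codon_span()` are hypothetical helpers, not functions from the package.

```r
# Hypothetical helper: codon (triplet) index for a 1-based CDS position.
cds_to_codon <- function(cds_pos) as.integer(ceiling(cds_pos / 3))

# Hypothetical helper: distinct codon indices touched by the first and
# last base of a variant spanning cds_start..cds_end.
codon_span <- function(cds_start, cds_end) {
  unique(cds_to_codon(c(cds_start, cds_end)))
}

cds_to_codon(c(1, 3, 4, 10))   # positions 1-3 fall in codon 1, 4-6 in codon 2, ...
codon_span(4, 6)               # variant confined to a single codon
codon_span(5, 7)               # variant crossing a codon boundary
```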
CONSEQUENCE: Possible values are ‘synonymous’, ‘nonsynonymous’,
‘frameshift’, ‘nonsense’ and ‘not translated’. Variant sequences are
translated only when the length of the codon sequence is a multiple of 3.
The value will be ‘frameshift’ when a sequence is of incompatible length.
‘not translated’ is used when varAllele is missing or there is an ‘N’ in
the sequence. ‘nonsense’ is used for premature stop codons.
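The consequence categories listed above can be illustrated with a small base R classifier. This is a sketch under stated assumptions rather than the package's implementation: the standard genetic code is used, `translate_dna()` and `classify()` are hypothetical helpers, and the order in which the checks are applied (for example, treating a stop-to-stop change as ‘synonymous’) is a choice made for this example.

```r
# Standard genetic code; codons are ordered with the third base varying
# fastest (TTT, TTC, TTA, TTG, TCT, ...).
bases <- c("T", "C", "A", "G")
grid <- expand.grid(third = bases, second = bases, first = bases,
                    stringsAsFactors = FALSE)
amino_acids <- paste0("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRR",
                      "IIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG")
genetic_code <- setNames(strsplit(amino_acids, "")[[1]],
                         paste0(grid$first, grid$second, grid$third))

# Hypothetical helper: translate a DNA string whose length is a multiple of 3.
translate_dna <- function(seq) {
  if (nchar(seq) == 0) return("")
  starts <- seq(1, nchar(seq), by = 3)
  paste(genetic_code[substring(seq, starts, starts + 2)], collapse = "")
}

# Hypothetical helper: assign one of the documented consequence labels.
classify <- function(ref_codon, var_codon, var_allele) {
  if (nchar(var_allele) == 0 || grepl("N", var_codon, fixed = TRUE))
    return("not translated")
  if (nchar(var_codon) %% 3 != 0) return("frameshift")
  ref_aa <- translate_dna(ref_codon)
  var_aa <- translate_dna(var_codon)
  if (identical(ref_aa, var_aa)) return("synonymous")
  if (grepl("*", var_aa, fixed = TRUE)) return("nonsense")
  "nonsynonymous"
}

classify("GCC", "GCT", "T")       # Ala -> Ala
classify("GCC", "GAC", "A")       # Ala -> Asp
classify("TGG", "TGA", "A")       # Trp -> stop
classify("ATGGCC", "ATGCC", "G")  # length 5 after a deletion
classify("GCC", "GNC", "N")       # ambiguous base in the sequence
```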
REFCODON: The reference codon sequence. This range is typically greater
than the width of the range in the GRanges because it includes all
codons involved in the sequence modification. If the reference sequence
is of width 2 but the alternate allele is of width 4 then at least two
codons must be included in the reference codon sequence.
VARCODON: This sequence is the result of inserting, deleting or
replacing the position(s) in the reference sequence with the alternate
allele. If the length of the resulting sequence is not a multiple of 3
it will not be translated.
REFAA: The reference amino acid column contains the translated
reference codon sequence. When translation is not possible this value is
missing.
VARAA: The variant amino acid column contains the translated variant
codon sequence. When translation is not possible this value is missing.
Michael Lawrence and Valerie Obenchain
readVcf, locateVariants, refLocsToLocalLocs, getTranscriptSeqs
library(VariantAnnotation)
library(BSgenome.Hsapiens.UCSC.hg19)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene

## ----------------------------
## VCF object as query
## ----------------------------
## Read variants from a VCF file
fl <- system.file("extdata", "chr22.vcf.gz", package="VariantAnnotation")
vcf <- readVcf(fl, "hg19")

## Rename seqlevels in the VCF object to match those in the TxDb.
vcf <- renameSeqlevels(vcf, "chr22")
## Confirm common seqlevels
intersect(seqlevels(vcf), seqlevels(txdb))

## When 'query' is a VCF object the varAllele argument is missing.
coding1 <- predictCoding(vcf, txdb, Hsapiens)
head(coding1, 3)

## Exon-centric or cDNA locations:
exonsbytx <- exonsBy(txdb, "tx")
cDNA <- mapToTranscripts(coding1, exonsbytx)
mcols(cDNA)$TXID <- names(exonsbytx)[mcols(cDNA)$transcriptsHits]
cDNA <- cDNA[mcols(cDNA)$TXID == mcols(coding1)$TXID[mcols(cDNA)$xHits]]

## Make sure cDNA is parallel to coding1
stopifnot(identical(mcols(cDNA)$xHits, seq_along(coding1)))
coding1$cDNA <- ranges(cDNA)

## ----------------------------
## GRanges object as query
## ----------------------------
## A GRanges can also be used as the 'query'. The seqlevels in the VCF
## were adjusted in the previous example, so the GRanges extracted with
## rowRanges() has the correct seqlevels.
rd <- rowRanges(vcf)

## The GRanges must be expanded to have one row per alternate allele.
## Variants 1, 2 and 10 have two alternate alleles.
altallele <- alt(vcf)
eltROWS <- elementNROWS(altallele)
rd_exp <- rep(rd, eltROWS)

## Call predictCoding() with the expanded GRanges and the unlisted
## alternate allele as the 'varAllele'.
coding2 <- predictCoding(rd_exp, txdb, Hsapiens, unlist(altallele))