predictCoding | R Documentation |
Predict amino acid coding changes for variants a coding regions
## S4 method for signature 'CollapsedVCF,TxDb,ANY,missing'
predictCoding(query, subject, seqSource, varAllele, ..., ignore.strand=FALSE)
## S4 method for signature 'ExpandedVCF,TxDb,ANY,missing'
predictCoding(query, subject, seqSource, varAllele, ..., ignore.strand=FALSE)
## S4 method for signature 'IntegerRanges,TxDb,ANY,DNAStringSet'
predictCoding(query, subject, seqSource, varAllele, ..., ignore.strand=FALSE)
## S4 method for signature 'GRanges,TxDb,ANY,DNAStringSet'
predictCoding(query, subject, seqSource, varAllele, ..., ignore.strand=FALSE)
## S4 method for signature 'VRanges,TxDb,ANY,missing'
predictCoding(query, subject, seqSource, varAllele, ..., ignore.strand=FALSE)
query |
A VCF, IntegerRanges, GRanges or
When NOTE: Variants are expected to conform to the VCF specs as described on the 1000 Genomes page (see references). Indels must include the reference allele; zero-width ranges are not valid and return no matches. |
subject |
A TxDb object that serves
as the annotation. GFF files can be converted to
TxDb objects with
|
seqSource |
A |
varAllele |
A DNAStringSet containing the variant
(alternate) alleles. The length of When the |
... |
Additional arguments passed to methods. Arguments
|
ignore.strand |
A When
|
This function returns the amino acid coding for variants that fall
completely ‘within’ a coding region. The reference sequences are
taken from a fasta file or BSgenome. The width of
the reference is determined from the start position and width of the
range in the query
. For guidance on how to represent an insertion,
deletion or substitution see the 1000 Genomes VCF format (references).
Variant alleles are taken from the varAllele
when supplied.
When the query
is a VCF
object the varAllele
will
be missing. This value is taken internally from the VCF
with
alt(<VCF>)
. The variant allele is substituted
into the reference sequences and transcribed. Transcription only
occurs if the substitution, insertion or deletion results in a new sequence
length divisible by 3.
When the query
is an unstranded (*) GRanges
predictCoding
will attempt to find overlaps on both the positive and negative strands of the
subject
. When the subject hit is on the negative strand the return
varAllele
is reverse complemented. The strand of the returned
GRanges
represents the strand of the subject hit.
A GRanges with a row for each variant-transcript
match. The result includes only variants that fell within coding regions.
The strand of the output GRanges
represents the strand of the
subject
hit.
At a minimum, the metadata columns (accessible with mcols
) include,
varAllele
Variant allele. This value is reverse complemented for an unstranded
query
that overlaps a subject
on the negative strand.
QUERYID
Map back to the row in the original query
TXID
Internal transcript id from the annotation
CDSID
Internal coding region id from the annotation
GENEID
Internal gene id from the annotation
CDSLOC
Variant location in coding region-based coordinates. This position is
relative to the start of the coding (cds) region defined in the
subject
annotation.
PROTEINLOC
Variant codon triplet location in coding region-based coordinates.
This position is relative to the start of the coding (cds) region
defined in the subject
annotation.
CONSEQUENCE
Possible values are ‘synonymous’, ‘nonsynonymous’, ‘frameshift’,
‘nonsense’ and ‘not translated’. Variant sequences are translated only
when the codon sequence is a multiple of 3. The value will be ‘frameshift’
when a sequence is of incompatible length. ‘not translated’ is used
when varAllele
is missing or there is an ‘N’ in the
sequence. ‘nonsense’ is used for premature stop codons.
REFCODON
The reference codon sequence. This range is typically greater
than the width of the range in the GRanges
because it includes
all codons involved in the sequence modification. If the reference
sequence is of width 2 but the alternate allele is of width 4 then at
least two codons must be included in the REFSEQ
.
VARCODON
This sequence is the result of inserting, deleting or replacing the position(s) in the reference sequence alternate allele. If the result of this substitution is not a multiple of 3 it will not be translated.
REFAA
The reference amino acid column contains the translated REFSEQ
.
When translation is not possible this value is missing.
VARAA
The variant amino acid column contains the translated VARSEQ
. When
translation is not possible this value is missing.
Michael Lawrence and Valerie Obenchain
http://www.1000genomes.org/wiki/analysis/variant-call-format/ http://vcftools.sourceforge.net/specs.html
readVcf, locateVariants, refLocsToLocalLocs getTranscriptSeqs
library(BSgenome.Hsapiens.UCSC.hg19)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
## ----------------------------
## VCF object as query
## ----------------------------
## Read variants from a VCF file
fl <- system.file("extdata", "chr22.vcf.gz", package="VariantAnnotation")
vcf <- readVcf(fl, "hg19")
## Rename seqlevels in the VCF object to match those in the TxDb.
vcf <- renameSeqlevels(vcf, "chr22")
## Confirm common seqlevels
intersect(seqlevels(vcf), seqlevels(txdb))
## When 'query' is a VCF object the varAllele argument is missing.
coding1 <- predictCoding(vcf, txdb, Hsapiens)
head(coding1, 3)
## Exon-centric or cDNA locations:
exonsbytx <- exonsBy(txdb, "tx")
cDNA <- mapToTranscripts(coding1, exonsbytx)
mcols(cDNA)$TXID <- names(exonsbytx)[mcols(cDNA)$transcriptsHits]
cDNA <- cDNA[mcols(cDNA)$TXID == mcols(coding1)$TXID[mcols(cDNA)$xHits]]
## Make sure cDNA is parallel to coding1
stopifnot(identical(mcols(cDNA)$xHits, seq_along(coding1)))
coding1$cDNA <- ranges(cDNA)
## ----------------------------
## GRanges object as query
## ----------------------------
## A GRanges can also be used as the 'query'. The seqlevels in the VCF
## were adjusted in previous example so the GRanges extracted with
## has the correct seqlevels.
rd <- rowRanges(vcf)
## The GRanges must be expanded to have one row per alternate allele.
## Variants 1, 2 and 10 have two alternate alleles.
altallele <- alt(vcf)
eltROWS <- elementNROWS(altallele)
rd_exp <- rep(rd, eltROWS)
## Call predictCoding() with the expanded GRanges and the unlisted
## alternate allele as the 'varAllele'.
coding2 <- predictCoding(rd_exp, txdb, Hsapiens, unlist(altallele))
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.