inferGenotypeBayesian (R Documentation)
Infers a subject's genotype by applying a Bayesian framework with a Dirichlet prior for the multinomial distribution. Up to four distinct alleles are allowed in an individual's genotype. Four likelihood distributions were generated by empirically fitting three high-coverage genotypes from three individuals (Laserson, Vigneault et al., 2014). A posterior probability is calculated for the four most common alleles. The certainty of the highest-probability model is calculated using a Bayes factor: the likelihood of the most likely model divided by that of the second-most likely model. The larger the Bayes factor (K), the greater the certainty in the model.
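Because the likelihoods are reported on the log10 scale, the Bayes factor described above reduces to a difference of logs. A minimal base R sketch for a single gene; the likelihood values and variable names are hypothetical, not output from the package:

```r
# Hypothetical log10 likelihoods for one gene under the four zygosity models
log10_lik <- c(kh = -12.4, kd = -3.1, kt = -5.8, kq = -9.0)

# Rank the models from most to least likely (names are preserved by sort)
ranked <- sort(log10_lik, decreasing = TRUE)

best_model <- names(ranked)[1]        # name of the most likely model
log10_K <- ranked[[1]] - ranked[[2]]  # log10 Bayes factor: best vs. runner-up
K <- 10^log10_K                       # Bayes factor on the linear scale

cat(best_model, round(log10_K, 2), round(K), "\n")
```

Here the heterozygous model (kd) is roughly 500 times more likely than the trizygous runner-up, which is what a log10 Bayes factor of 2.7 means.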
inferGenotypeBayesian(
data,
germline_db = NA,
novel = NA,
v_call = "v_call",
seq = "sequence_alignment",
find_unmutated = TRUE,
priors = c(0.6, 0.4, 0.4, 0.35, 0.25, 0.25, 0.25, 0.25, 0.25)
)
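data
: a data.frame containing V allele calls from a single subject.
germline_db
: named vector of sequences containing the germline sequences named in the V allele calls. Only required if find_unmutated is TRUE.
novel
: an optional data.frame of novel alleles (for example, as returned by findNovelAlleles) containing germline sequences that will be used if find_unmutated is TRUE.
v_call
: column in data with the V allele calls. Default is "v_call".
seq
: name of the column in data with the aligned, IMGT-numbered V(D)J nucleotide sequence. Default is "sequence_alignment".
find_unmutated
: if TRUE, use germline_db to identify which sequences are unmutated. Not needed if the allele calls already represent only unmutated sequences.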
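priors
: a numeric vector of priors for the multinomial distribution. The vector must contain nine values: the first two define the prior for the heterozygous (two-allele) case, the next three the trizygous (three-allele) case, and the final four the quadrozygous (four-allele) case. Each set of priors should sum to one.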
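The default priors split cleanly into groups of two, three and four values that each sum to one. The sketch below assumes that 2/3/4 layout for the heterozygous, trizygous and quadrozygous cases and shows how a custom priors vector could be sanity-checked before being passed in; the grouping code is illustrative and not taken from the package.

```r
# Default priors from the usage above
priors <- c(0.6, 0.4, 0.4, 0.35, 0.25, 0.25, 0.25, 0.25, 0.25)

# Assumed layout: 2 heterozygous, 3 trizygous, 4 quadrozygous values
model <- rep(c("heterozygous", "trizygous", "quadrozygous"), times = c(2, 3, 4))
prior_sets <- split(priors, factor(model, levels = unique(model)))

# Check that there are nine values and that each set sums to one
stopifnot(length(priors) == 9)
stopifnot(all(abs(vapply(prior_sets, sum, numeric(1)) - 1) < 1e-9))

prior_sets$trizygous
```

A tolerance is used in the sum check because values such as 0.4 + 0.35 + 0.25 need not add to exactly 1 in floating point.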
Allele calls representing cases where multiple alleles have been
assigned to a single sample sequence are rare among unmutated
sequences but may result if nucleotides for certain positions are
not available. Calls containing multiple alleles are treated as
belonging to all groups. If novel is provided, all sequences that are assigned to the same starting allele as any novel germline allele will have the novel germline allele appended to their assignment prior to searching for unmutated sequences.
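The rule that a multi-allele call belongs to every listed allele can be sketched in base R as follows. The calls are made-up examples, the comma separator is an assumption based on AIRR-formatted v_call columns, and this is not the package's own counting code.

```r
# Hypothetical V allele calls; the second call is ambiguous between two alleles
calls <- c("IGHV1-2*02", "IGHV1-2*02,IGHV1-2*04", "IGHV1-2*04", "IGHV1-2*02")

# Split ambiguous calls so the sequence counts toward every listed allele
alleles <- unlist(strsplit(calls, ",", fixed = TRUE))

# Tally sequences per allele as a named integer vector
tab <- table(alleles)
counts <- setNames(as.vector(tab), names(tab))
counts
```

Four sequences yield counts of 3 and 2, because the ambiguous sequence is counted once for each of its alleles.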
A data.frame of alleles denoting the genotype of the subject, with the log10 likelihood of each model and the log10 of the Bayes factor. The output contains the following columns:
gene
: The gene name without allele.
alleles
: Comma-separated list of alleles for the given gene.
counts
: Comma-separated list of observed sequence counts for each corresponding allele in the alleles list.
total
: The total count of observed sequences for the given gene.
note
: Any comments on the inference.
kh
: log10 likelihood that the gene is homozygous.
kd
: log10 likelihood that the gene is heterozygous.
kt
: log10 likelihood that the gene is trizygous.
kq
: log10 likelihood that the gene is quadrozygous.
k_diff
: log10 ratio of the highest to second-highest zygosity likelihoods.
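Since kh, kd, kt and kq are log10 likelihoods for one to four alleles, the most likely zygosity of each gene is the column with the largest value, and k_diff can be used to flag uncertain calls. A sketch on a small hand-made data.frame with the documented column names; the rows and the cutoff of 1 (a ten-fold likelihood ratio) are hypothetical choices, not package defaults.

```r
# Hypothetical rows shaped like the documented output columns
geno <- data.frame(
  gene   = c("IGHV1-2", "IGHV1-8", "IGHV3-23"),
  kh     = c(-1.2, -20.5, -4.0),
  kd     = c(-8.9, -0.4, -4.3),
  kt     = c(-15.1, -6.7, -6.0),
  kq     = c(-22.0, -13.2, -9.5),
  k_diff = c(7.7, 6.3, 0.3),
  stringsAsFactors = FALSE
)

# Map each likelihood column to its zygosity label
zyg <- c(kh = "homozygous", kd = "heterozygous",
         kt = "trizygous", kq = "quadrozygous")

# Pick the column with the highest log10 likelihood in each row
lik <- as.matrix(geno[, c("kh", "kd", "kt", "kq")])
best <- colnames(lik)[max.col(lik, ties.method = "first")]
geno$zygosity <- unname(zyg[best])

# Keep genes whose best model is at least ten-fold more likely (hypothetical cutoff)
confident <- subset(geno, k_diff >= 1)
confident[, c("gene", "zygosity", "k_diff")]
```

IGHV3-23 is dropped here: its homozygous and heterozygous models differ by only 0.3 log10 units, so the call is uncertain.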
This method works best with data derived from blood, where a large portion of sequences are expected to be unmutated. Ideally, there should be hundreds of allele calls per gene in the input.
Laserson U, Vigneault F, et al. High-resolution antibody dynamics of vaccine-induced immune responses. PNAS. 2014;111(13):4928-33.
See plotGenotype for a colorful visualization and genotypeFasta to convert the genotype to nucleotide sequences. See inferGenotype to infer a subject-specific genotype using a frequency method.
# Infer IGHV genotype, using only unmutated sequences, including novel alleles
inferGenotypeBayesian(AIRRDb, germline_db=SampleGermlineIGHV, novel=SampleNovel,
find_unmutated=TRUE, v_call="v_call", seq="sequence_alignment")