generateEvidence | R Documentation |
generateEvidence builds a table of evidence metrics for the final novel V
allele detection and genotyping inferences.
generateEvidence(
data,
novel,
genotype,
genotype_db,
germline_db,
j_call = "j_call",
junction = "junction",
fields = NULL
)
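data |
a data.frame containing sequence data that has been genotyped and corrected by reassignAlleles. |
novel |
the data.frame returned by findNovelAlleles. |
genotype |
the data.frame of alleles generated with inferGenotype denoting the genotype of the subject. |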
genotype_db |
a vector of named nucleotide germline sequences in the genotype. Returned by genotypeFasta. |
germline_db |
the original uncorrected germline database used by findNovelAlleles to identify novel alleles. |
j_call |
name of the column in data with J allele calls. Default is "j_call". |
junction |
Junction region nucleotide sequence, which includes
the CDR3 and the two flanking conserved codons. Default
is "junction". |
fields |
character vector of column names used to split the data to
identify novel alleles, if any. If NULL (the default), the data is not split. |
Returns the genotype input data.frame with the following additional columns
providing supporting evidence for each inferred allele:
field_id
: Data subset identifier, defined with the input parameter fields.
A variable number of columns, specified with the input parameter fields.
polymorphism_call
: The novel allele call.
novel_imgt
: The novel allele sequence.
closest_reference
: The closest reference gene and allele in the germline_db database.
closest_reference_imgt
: Sequence of the closest reference gene and allele in the germline_db database.
germline_call
: The input (uncorrected) V call.
germline_imgt
: Germline sequence for germline_call.
nt_diff
: Number of nucleotides that differ between the new allele and
the closest reference (closest_reference) in the germline_db database.
nt_substitutions
: A comma-separated list of specific nucleotide
differences (e.g. 112G>A) in the novel allele.
aa_diff
: Number of amino acids that differ between the new allele and the closest
reference (closest_reference) in the germline_db database.
aa_substitutions
: A comma-separated list of specific amino acid
differences (e.g. 96A>N) in the novel allele.
sequences
: Number of sequences unambiguously assigned to this allele.
unmutated_sequences
: Number of records with the unmutated novel allele sequence.
unmutated_frequency
: Proportion of records with the unmutated novel allele
sequence (unmutated_sequences / sequences).
allelic_percentage
: Percentage at which the (unmutated) allele is observed
in the sequence dataset compared to other (unmutated) alleles.
unique_js
: Number of unique J sequences found associated with the
novel allele. Only sequences that have been unambiguously assigned
to the novel allele (polymorphism_call) are counted.
unique_cdr3s
: Number of unique CDR3s associated with the inferred allele.
Only sequences that have been unambiguously assigned to the
novel allele (polymorphism_call) are counted.
mut_min
: Minimum mutation considered by the algorithm.
mut_max
: Maximum mutation considered by the algorithm.
pos_min
: First position of the sequence considered by the algorithm (IMGT numbering).
pos_max
: Last position of the sequence considered by the algorithm (IMGT numbering).
y_intercept
: The y-intercept above which positions were considered
potentially polymorphic.
alpha
: Significance threshold to be used when constructing the
confidence interval for the y-intercept.
min_seqs
: Input min_seqs. The minimum number of total sequences
(within the desired mutational range and nucleotide range) required
for the samples to be considered.
j_max
: Input j_max. The maximum fraction of sequences perfectly
aligning to a potential novel allele that are allowed to utilize a particular
combination of junction length and J gene.
min_frac
: Input min_frac. The minimum fraction of sequences that must
have usable nucleotides in a given position for that position to be considered.
note
: Comments regarding the novel allele inference.
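To make a few of the columns above concrete, here is a minimal base R sketch of how nt_diff, nt_substitutions and unmutated_frequency could be derived from their definitions. The helper nt_differences() is hypothetical and not part of the package, the sequences and counts are invented, and the sketch assumes that a substitution such as 112G>A reads as position, reference base, novel base. How the package itself treats gaps or ambiguous bases is not stated here, so the sketch simply compares characters.

```r
# Hypothetical helper (not part of the package): compare two aligned,
# equal-length nucleotide sequences position by position.
nt_differences <- function(reference, novel) {
  ref_chars <- strsplit(reference, "")[[1]]
  nov_chars <- strsplit(novel, "")[[1]]
  stopifnot(length(ref_chars) == length(nov_chars))
  pos <- which(ref_chars != nov_chars)
  # Build "positionREF>NOVEL" strings; empty string when nothing differs
  subs <- if (length(pos) == 0) {
    ""
  } else {
    paste0(pos, ref_chars[pos], ">", nov_chars[pos], collapse = ",")
  }
  list(nt_diff = length(pos), nt_substitutions = subs)
}

# Toy sequences, invented for illustration only
closest_reference_imgt <- "CAGGTGCAGCTG"
novel_imgt             <- "CAGGTACAGCTA"

diffs <- nt_differences(closest_reference_imgt, novel_imgt)
diffs$nt_diff           # 2
diffs$nt_substitutions  # "6G>A,12G>A"

# unmutated_frequency as defined above: unmutated_sequences / sequences
sequences           <- 120  # invented count
unmutated_sequences <- 90   # invented count
unmutated_frequency <- unmutated_sequences / sequences
unmutated_frequency     # 0.75
```

The positions in this toy example are plain string indices; the positions reported in the real table follow the numbering of the germline sequences supplied to the function.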
See findNovelAlleles, inferGenotype and genotypeFasta for generating the required input.
# Generate input data
novel <- findNovelAlleles(AIRRDb, SampleGermlineIGHV,
v_call="v_call", j_call="j_call", junction="junction",
junction_length="junction_length", seq="sequence_alignment")
genotype <- inferGenotype(AIRRDb, find_unmutated=TRUE,
germline_db=SampleGermlineIGHV,
novel=novel,
v_call="v_call", seq="sequence_alignment")
genotype_db <- genotypeFasta(genotype, SampleGermlineIGHV, novel)
data_db <- reassignAlleles(AIRRDb, genotype_db,
v_call="v_call", seq="sequence_alignment")
# Assemble evidence table
evidence <- generateEvidence(data_db, novel, genotype,
genotype_db, SampleGermlineIGHV,
j_call = "j_call",
junction = "junction")
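Once the evidence table exists, a common next step is to look at the support columns together. The sketch below uses a small mock data.frame in place of the real evidence object so that it runs without the package or its example data. The allele names, counts and cutoffs are all invented for illustration and are not recommendations from the package.

```r
# Mock stand-in for the evidence table; all values are invented
evidence_mock <- data.frame(
  polymorphism_call   = c("allele_A", "allele_B"),
  nt_diff             = c(1L, 1L),
  sequences           = c(200L, 40L),
  unmutated_sequences = c(150L, 10L),
  unique_js           = c(6L, 2L),
  stringsAsFactors    = FALSE
)

# Recompute the proportion of unmutated records from its definition
evidence_mock$unmutated_frequency <-
  evidence_mock$unmutated_sequences / evidence_mock$sequences

# Hypothetical cutoffs: keep alleles seen mostly unmutated and with
# several distinct J genes
well_supported <- subset(
  evidence_mock,
  unmutated_frequency >= 0.5 & unique_js >= 3
)
well_supported$polymorphism_call
```

With the real table, the same subset() call would apply to the evidence object directly, since unmutated_frequency and unique_js are among the columns listed above.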