View source: R/analyze_sample.R
analyze_sample | R Documentation |
Converts a full STR sequence data frame into a per-locus version and adds a Category factor column to designate which sequences look like alleles, artifacts, etc. At this stage the summary is prepared for a single specific locus, in contrast to analyze_seqs. See the Details section below for a description of the factor levels in the new Category column, and see the Functions section below for how specific variants of this function behave.
analyze_sample(
seq_data,
sample_attrs,
min_allele_abundance = cfg("min_allele_abundance")
)
analyze_sample_guided(
seq_data,
sample_attrs,
min_allele_abundance = cfg("min_allele_abundance")
)
analyze_sample_naive(
seq_data,
sample_attrs,
min_allele_abundance = cfg("min_allele_abundance")
)
seq_data |
data frame of processed data for sample as produced by analyze_seqs. |
sample_attrs |
list of sample attributes, such as the rows produced by prepare_dataset. Used to select the locus name to filter on. |
min_allele_abundance |
numeric threshold for the minimum proportion of counts a given entry must have, compared to the total matching all criteria for that locus, to be considered as a potential allele. |
Factor levels in the added Category column, in order:
Allele: An identified allele sequence. There will be between zero and two of these.
Prominent: Any additional sequences beyond two called alleles that match all requirements (sequences that match all locus attributes, do not appear artifactual, and are above a given fraction of filtered reads).
Insignificant: Sequences with counts below the min_allele_abundance threshold.
Ambiguous: Sequences passing the min_allele_abundance threshold but with non-ACTG characters such as N, as defined by the Ambiguous column of seq_data.
Stutter: Sequences passing the min_allele_abundance threshold but matching stutter sequence criteria as defined by the Stutter column of seq_data.
Artifact: Sequences passing the min_allele_abundance threshold but matching non-stutter artifact sequence criteria as defined by the Artifact column of seq_data.
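The factor levels above can be illustrated with a small, self-contained sketch. This is not the package's implementation: the helper categorize_toy, the Seq and Count columns, the use of plain logical flags for Ambiguous, Stutter and Artifact, the precedence among those three flags, and the use of the total toy count as the denominator are all assumptions made only for this example.

```r
# Toy per-locus table. Column names other than Ambiguous, Stutter and
# Artifact are assumptions, and the flag columns are simplified to logicals.
toy <- data.frame(
  Seq       = c("seqA", "seqB", "seqC", "seqD", "seqE", "seqF", "seqG"),
  Count     = c(500, 400, 150, 100, 90, 80, 5),
  Ambiguous = c(FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE),
  Stutter   = c(FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE),
  Artifact  = c(FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Hypothetical helper: assigns one Category level per row.
categorize_toy <- function(df, min_allele_abundance = 0.05) {
  levels_cat <- c("Allele", "Prominent", "Insignificant",
                  "Ambiguous", "Stutter", "Artifact")
  # Work from the most to the least abundant sequence.
  df <- df[order(df$Count, decreasing = TRUE), ]
  # Assumption: abundance is measured against the total of all toy rows.
  frac <- df$Count / sum(df$Count)
  category <- rep(NA_character_, nrow(df))
  # Rows below the threshold are Insignificant regardless of other flags.
  category[frac < min_allele_abundance] <- "Insignificant"
  # Remaining rows are flagged in an assumed order of precedence.
  category[is.na(category) & df$Ambiguous] <- "Ambiguous"
  category[is.na(category) & df$Stutter] <- "Stutter"
  category[is.na(category) & df$Artifact] <- "Artifact"
  # Of the unflagged rows, at most two become alleles; the rest are Prominent.
  open <- which(is.na(category))
  category[head(open, 2)] <- "Allele"
  category[tail(open, -2)] <- "Prominent"
  df$Category <- factor(category, levels = levels_cat)
  df
}

result <- categorize_toy(toy)
print(result[, c("Seq", "Count", "Category")])
```

Storing Category as a factor with a fixed level order means that tabulating or sorting by it follows the order listed above, even when some levels do not occur in a given sample.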
filtered version of seq_data with added Category column.
analyze_sample(): default version of sample analysis. From here use summarize_sample.
analyze_sample_guided(): version of sample analysis guided by expected sequence length values. Additional items ExpectedLength1 and optionally ExpectedLength2 can be supplied in the sample_attrs list. If NA or missing the behavior will match analyze_sample. If two expected lengths are given, the min_allele_abundance argument is ignored. If at least one expected length is given, the stutter/artifact filtering is disabled. From here use summarize_sample_guided.
analyze_sample_naive(): version of sample analysis without stutter/artifact filtering. From here use summarize_sample as for analyze_sample.
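The rules for analyze_sample_guided() can be summarized as a small decision sketch. The helper guided_settings and its return fields are hypothetical and are not part of the package; only the item names ExpectedLength1 and ExpectedLength2 come from the description above, and counting the non-NA lengths is a simplification.

```r
# Hypothetical helper: reports which checks would remain active for a given
# sample_attrs list, following the rules described for the guided variant.
guided_settings <- function(sample_attrs) {
  # Missing list items come back as NULL and simply drop out of c().
  lens <- c(sample_attrs[["ExpectedLength1"]],
            sample_attrs[["ExpectedLength2"]])
  n_expected <- sum(!is.na(lens))
  list(
    n_expected = n_expected,
    # Two expected lengths: min_allele_abundance is ignored.
    use_abundance_threshold = n_expected < 2,
    # At least one expected length: stutter/artifact filtering is disabled.
    use_stutter_artifact_filter = n_expected == 0
  )
}

# The Locus item and the length values are made up for illustration.
str(guided_settings(list(Locus = "A")))
str(guided_settings(list(Locus = "A", ExpectedLength1 = 162)))
str(guided_settings(list(Locus = "A", ExpectedLength1 = 162,
                         ExpectedLength2 = 170)))
```

With no expected lengths both checks stay on, which corresponds to the stated fallback to analyze_sample behavior.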