View source: R/summarize_sample.R
summarize_sample | R Documentation |
Converts an STR sample data frame, as produced by analyze_sample, into a concise list of consistent attributes suitable for binding together across the samples of a dataset. At this stage the summary still covers a single specific locus, as in analyze_sample, but takes the form of a list with a fixed length. The Allele1 entries correspond to the sequence with the highest count and the Allele2 entries to the sequence with the second-highest count. See the Functions section below for how specific variants of this function behave.
summarize_sample(
sample_data,
sample_attrs,
min_locus_reads = cfg("min_locus_reads")
)
summarize_sample_guided(
sample_data,
sample_attrs,
min_locus_reads = cfg("min_locus_reads")
)
sample_data |
data frame of processed data for one sample as produced by analyze_sample. |
sample_attrs |
list of sample attributes, such as the rows produced by prepare_dataset. |
min_locus_reads |
numeric threshold for the minimum number of counts that must be present, in total across entries passing all filters, for potential alleles to be considered. |
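The role of min_locus_reads and the count-based ordering of Allele1 and Allele2 can be illustrated with a minimal sketch. This is not the package's implementation: the helper pick_top_two and the mock data frame are hypothetical, and the data are assumed to have already passed all filters.

```r
# Hypothetical stand-in for the rows of one processed sample that have
# already passed all filters (columns Seq, Count and Length are assumed).
sample_data <- data.frame(
  Seq = c("ACGTACGTACGT", "ACGTACGT", "ACGTACGTACGTACGT"),
  Count = c(120L, 95L, 4L),
  stringsAsFactors = FALSE
)
sample_data$Length <- nchar(sample_data$Seq)

# Illustrative helper, not part of the package: return the two entries with
# the highest counts, or no entries at all if the total count across the
# filtered entries falls below min_locus_reads.
pick_top_two <- function(sample_data, min_locus_reads = 50) {
  if (sum(sample_data$Count) < min_locus_reads) {
    return(sample_data[0, ])
  }
  ranked <- sample_data[order(sample_data$Count, decreasing = TRUE), ]
  head(ranked, 2)
}

top <- pick_top_two(sample_data)                          # two candidate alleles
low <- pick_top_two(sample_data, min_locus_reads = 1000)  # below threshold
```

Here the first row of top plays the role of Allele1 and the second the role of Allele2. The sketch leaves out the homozygous, stutter and artifact handling described under the returned entries.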
Entries in the returned list:
For Allele1 and Allele2:
Seq
: sequence text for each allele.
Count
: integer count of occurrences of this exact sequence.
Length
: integer sequence length.
Homozygous
: Whether the sample appears homozygous (if so, the Allele2 entries
will be NA).
Ambiguous
: Whether a potential allele was ignored due to ambiguous bases in
its sequence content (such as "N").
Stutter
: Whether a potential allele was ignored due to apparent PCR stutter.
Artifact
: Whether a potential allele was ignored due to an apparent PCR artifact
(other than stutter).
CountTotal
: The total number of sequences in the original sample data.
CountLocus
: The number of sequences matching all criteria for the
specified locus in the original sample data.
ProminentSeqs
: The number of entries above the specified threshold after
all filtering. This should be either one (for a homozygous sample) or two
(for a heterozygous sample) but conditions such as cross-sample
contamination or excessive PCR stutter can lead to more than two.
list of attributes describing the sample.
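Because every summary is a list with the same fixed set of entries, summaries from several samples can be stacked into one data frame. The following is a minimal sketch using mock summaries rather than real output. The combined names such as Allele1Seq are an assumption about how the per-allele entries are named, and the sample names are hypothetical.

```r
# Mock summaries for two samples. The entry names (Allele1Seq, ...) are
# assumed for illustration; the values are made up.
summary_a <- list(
  Allele1Seq = "ACGTACGTACGT", Allele1Count = 120L, Allele1Length = 12L,
  Allele2Seq = "ACGTACGT", Allele2Count = 95L, Allele2Length = 8L,
  Homozygous = FALSE, Ambiguous = FALSE, Stutter = FALSE, Artifact = FALSE,
  CountTotal = 500L, CountLocus = 219L, ProminentSeqs = 2L
)
summary_b <- list(
  Allele1Seq = "ACGTACGT", Allele1Count = 200L, Allele1Length = 8L,
  Allele2Seq = NA_character_, Allele2Count = NA_integer_,
  Allele2Length = NA_integer_,
  Homozygous = TRUE, Ambiguous = FALSE, Stutter = FALSE, Artifact = FALSE,
  CountTotal = 450L, CountLocus = 210L, ProminentSeqs = 1L
)
summaries <- list(sampleA = summary_a, sampleB = summary_b)

# Turn each fixed-length list into a one-row data frame and stack the rows.
results <- do.call(rbind, lapply(summaries, function(s) {
  as.data.frame(s, stringsAsFactors = FALSE)
}))

# Samples with more than two prominent sequences may deserve a closer look
# (possible contamination or excessive stutter).
suspect <- names(summaries)[results$ProminentSeqs > 2]
```

In this mock data neither sample has more than two prominent sequences, so suspect is empty.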
summarize_sample()
: Default version of sample summary.
summarize_sample_guided()
: Summarize a processed STR sample using known lengths. If ExpectedLength1 and optionally ExpectedLength2 are given in sample_attrs, the min_locus_reads threshold is ignored. See also analyze_sample_guided.
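One plausible reading of the guided behavior is sketched below: when expected lengths are supplied, candidates are chosen by length rather than by a read-count threshold. The helper pick_by_expected_length and the mock data are hypothetical and do not reproduce the package's actual logic.

```r
# Hypothetical low-coverage sample; columns Seq, Count and Length are assumed.
sample_data <- data.frame(
  Seq = c("ACGTACGTACGT", "ACGTACGT", "ACGTACGTACGTACGT"),
  Count = c(6L, 5L, 1L),
  stringsAsFactors = FALSE
)
sample_data$Length <- nchar(sample_data$Seq)

# Sample attributes carrying the known allele lengths.
sample_attrs <- list(ExpectedLength1 = 12L, ExpectedLength2 = 8L)

# Illustrative helper, not part of the package: for each expected length,
# keep the most abundant entry of that length. No count threshold is applied.
pick_by_expected_length <- function(sample_data, sample_attrs) {
  expected <- c(sample_attrs$ExpectedLength1, sample_attrs$ExpectedLength2)
  expected <- unique(expected[!is.na(expected)])
  picks <- lapply(expected, function(len) {
    candidates <- sample_data[sample_data$Length == len, ]
    candidates[which.max(candidates$Count), ]
  })
  do.call(rbind, picks)
}

guided <- pick_by_expected_length(sample_data, sample_attrs)
```

With only twelve reads in total, a count threshold could reject this sample, whereas the length-guided selection still returns one entry per expected length.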