summarize_sample: Summarize a processed STR sample

View source: R/summarize_sample.R

summarize_sample    R Documentation

Summarize a processed STR sample

Description

Converts an STR sample data frame, as produced by analyze_sample, into a concise list of consistent attributes suitable for binding together across samples for a dataset. As in analyze_sample, the summary covers a single specific locus, but it is returned as a list with a fixed length. The Allele1 entries correspond to the sequence with the highest count, and the Allele2 entries to the sequence with the second highest. See the Functions section below for how specific variants of this function behave.
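The reason a fixed-length list helps with binding can be sketched in base R. This is only an illustration: the two lists below are hand-written toy values, not output from summarize_sample, and the entry names (Allele1Seq and so on) are an assumption about how the Allele1/Allele2 attributes are named.

```r
# Two hypothetical per-sample summaries with the same fixed set of names.
# The values and the Allele1Seq-style names are illustrative assumptions.
summary_a <- list(Allele1Seq = "TAGATAGATAGA", Allele1Count = 500L,
                  Allele2Seq = "TAGATAGA", Allele2Count = 450L,
                  Homozygous = FALSE)
summary_b <- list(Allele1Seq = "TAGATAGATAGA", Allele1Count = 900L,
                  Allele2Seq = NA_character_, Allele2Count = NA_integer_,
                  Homozygous = TRUE)

# Because every list has the same names in the same order, each one can
# become a one-row data frame and the rows can be stacked.
rows <- lapply(list(a = summary_a, b = summary_b),
               as.data.frame, stringsAsFactors = FALSE)
dataset_summary <- do.call(rbind, rows)
print(dataset_summary)
```

If the lists differed in length or names from sample to sample, the rbind step would fail or misalign columns, which is why missing values are kept as NA rather than dropped.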

Usage

summarize_sample(
  sample_data,
  sample_attrs,
  min_locus_reads = cfg("min_locus_reads")
)

summarize_sample_guided(
  sample_data,
  sample_attrs,
  min_locus_reads = cfg("min_locus_reads")
)

Arguments

sample_data

data frame of processed data for one sample as produced by analyze_sample.

sample_attrs

list of sample attributes, such as the rows produced by prepare_dataset.

min_locus_reads

numeric threshold for the minimum number of counts that must be present, in total across entries passing all filters, for potential alleles to be considered.
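As a rough sketch of how such a threshold could act, the snippet below sums counts over a toy set of already-filtered entries and compares the total with the threshold. The helper enough_reads is hypothetical and not part of the package, and the inclusive comparison is an assumption.

```r
# Hypothetical filtered entries for one locus (toy data).
filtered <- data.frame(Seq = c("TAGATAGATAGA", "TAGATAGA"),
                       Count = c(60L, 25L),
                       stringsAsFactors = FALSE)

# Assumed helper, not part of the package: TRUE when the total count
# across the filtered entries reaches the threshold (inclusive).
enough_reads <- function(entries, min_locus_reads) {
  sum(entries$Count) >= min_locus_reads
}

passes_50 <- enough_reads(filtered, 50)    # total is 85, so TRUE
passes_100 <- enough_reads(filtered, 100)  # total is 85, so FALSE
```

Note that the comparison is on the total across entries, so a sample can pass even when no single sequence reaches the threshold by itself.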

Details

Entries in the returned list:

  • For Allele1 and Allele2:

    • Seq: sequence text for each allele.

    • Count: integer count of occurrences of this exact sequence.

    • Length: integer sequence length.

  • Homozygous: Whether the sample appears homozygous (if so, the Allele2 entries will be NA).

  • Ambiguous: Whether a potential allele was ignored due to ambiguous bases in sequence content (such as "N").

  • Stutter: Whether a potential allele was ignored due to apparent PCR stutter.

  • Artifact: Whether a potential allele was ignored due to an apparent PCR artifact (other than stutter).

  • CountTotal: The total number of sequences in the original sample data.

  • CountLocus: The number of sequences matching all criteria for the specified locus in the original sample data.

  • ProminentSeqs: The number of entries above the specified threshold after all filtering. This should be either one (for a homozygous sample) or two (for a heterozygous sample), but conditions such as cross-sample contamination or excessive PCR stutter can lead to more than two.
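The shape of the list described above can be illustrated with a simplified base R sketch. It is not the package's implementation: sketch_summary is a hypothetical function, the entry names (Allele1Seq and so on) are assumed, the input is assumed to be already filtered down to candidate alleles, and the Ambiguous, Stutter, Artifact, CountTotal and CountLocus entries are left out for brevity.

```r
# Hypothetical, simplified stand-in for the summary step.
sketch_summary <- function(entries) {
  # Order by descending count; the top row plays the role of Allele1.
  entries <- entries[order(entries$Count, decreasing = TRUE), ]
  # Return the i-th value of a column, or NA when there is no such row,
  # so the list always has the same length.
  pick <- function(i, col) {
    if (nrow(entries) >= i) entries[[col]][i] else NA
  }
  list(
    Allele1Seq    = pick(1, "Seq"),
    Allele1Count  = pick(1, "Count"),
    Allele1Length = pick(1, "Length"),
    Allele2Seq    = pick(2, "Seq"),
    Allele2Count  = pick(2, "Count"),
    Allele2Length = pick(2, "Length"),
    Homozygous    = nrow(entries) == 1,
    ProminentSeqs = nrow(entries)
  )
}

# Toy candidate alleles for a heterozygous and a homozygous sample.
het <- data.frame(Seq = c("TAGATAGA", "TAGATAGATAGA"),
                  Count = c(250L, 300L),
                  Length = c(8L, 12L),
                  stringsAsFactors = FALSE)
hom <- het[2, ]

summary_het <- sketch_summary(het)
summary_hom <- sketch_summary(hom)
```

Both results have the same eight entries; in the homozygous case the Allele2 entries are NA instead of being omitted.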

Value

list of attributes describing the sample.

Functions

  • summarize_sample(): Default version of sample summary.

  • summarize_sample_guided(): Summarize a processed STR sample using known lengths. If ExpectedLength1 and optionally ExpectedLength2 are given in sample_attrs, the min_locus_reads threshold is ignored. See also analyze_sample_guided.
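One plausible reading of the guided behavior is sketched below: when expected lengths are supplied, candidates are chosen by length and not by a read-count threshold. This is an assumption for illustration only. The helper pick_by_length is hypothetical, and the package's actual selection logic may differ.

```r
# Hypothetical helper: for each expected length, keep the most abundant
# sequence of that length, regardless of how few reads it has.
pick_by_length <- function(entries, expected_lengths) {
  expected_lengths <- expected_lengths[!is.na(expected_lengths)]
  if (length(expected_lengths) == 0) {
    return(entries[0, ])
  }
  rows <- lapply(expected_lengths, function(len) {
    candidates <- entries[entries$Length == len, ]
    candidates[which.max(candidates$Count), ]
  })
  do.call(rbind, rows)
}

# Toy entries; the 16-long sequence has very few reads.
entries <- data.frame(Seq = c("TAGATAGA", "TAGATAGATAGA", "TAGATAGATAGATAGA"),
                      Count = c(5L, 40L, 3L),
                      Length = c(8L, 12L, 16L),
                      stringsAsFactors = FALSE)

# sample_attrs-style list; ExpectedLength2 is optional.
attrs <- list(ExpectedLength1 = 12, ExpectedLength2 = 16)
guided <- pick_by_length(entries,
                         c(attrs$ExpectedLength1, attrs$ExpectedLength2))
```

In this toy case the low-count sequence of length 16 is retained because its length was expected, whereas a count threshold alone would likely have discarded it.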


ShawHahnLab/chiimp documentation built on Aug. 20, 2023, 1:41 a.m.