find_artifact: Check for non-stutter PCR artifacts between sequences

View source: R/analyze_seqs.R

find_artifact    R Documentation

Check for non-stutter PCR artifacts between sequences

Description

Searches a processed STR sample for entries that may be non-stutter PCR artifacts of another entry in the same sample. A potential artifact is a sequence whose count is lower than another sequence's count by at least a given ratio, and whose length is within 1 nucleotide of that other sequence. Only STR-labeled rows are considered, and an entry is flagged as an artifact only if its count is at most count.ratio_max times the count of the candidate "source" entry. Sequence content is not currently considered; only relative sequence lengths and counts are compared.
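The selection rule described above can be sketched in base R as follows. This is a minimal, hypothetical illustration, not chiimp's implementation. The function name find_artifact_sketch and its vector arguments (lengths, counts, is_str, ratio_max) are assumptions standing in for columns of sample.data. "Within 1 nucleotide" is read as an absolute length difference of at most 1, rows with no candidate source get NA, and when several candidates qualify, the one with the highest count is chosen. All of these are assumptions.

```r
# Hypothetical sketch of the artifact rule; NOT chiimp's find_artifact().
# Assumed inputs: per-row sequence lengths, counts, a logical flag marking
# STR-labeled rows, and the maximum count ratio for an artifact.
find_artifact_sketch <- function(lengths, counts, is_str, ratio_max) {
  n <- length(lengths)
  # Assumption: NA means "no candidate source found".
  result <- rep(NA_integer_, n)
  for (i in seq_len(n)) {
    # Only STR-labeled rows are checked.
    if (!is_str[i]) next
    # Candidate sources: other STR rows within 1 nt in length whose count is
    # large enough that counts[i] / counts[j] <= ratio_max.
    cand <- which(is_str &
                  seq_len(n) != i &
                  abs(lengths - lengths[i]) <= 1 &
                  counts[i] / counts <= ratio_max)
    if (length(cand) > 0) {
      # Assumption: report the highest-count candidate as the source.
      result[i] <- cand[which.max(counts[cand])]
    }
  }
  result
}
```

For example, with lengths c(100, 101, 100, 96), counts c(1000, 50, 30, 800), all rows STR-labeled, and ratio_max = 0.1, the sketch returns c(NA, 1, 1, NA). Rows 2 and 3 are flagged as possible artifacts of row 1, while row 4 is more than one nucleotide away from every other row.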

Usage

find_artifact(
  sample.data,
  locus_attrs,
  count.ratio_max = cfg("max_artifact_ratio")
)

Arguments

sample.data

data frame of processed sample data.

locus_attrs

data frame of attributes for loci to look for.

count.ratio_max

when comparing the entry being checked to another entry, the highest ratio of counts (checked entry to other entry) at which the checked entry is still considered an artifact. Defaults to cfg("max_artifact_ratio").
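The direction of the ratio can be shown concretely. The checked entry's count is divided by the candidate source's count, so smaller values mean a weaker entry relative to its source. The helper below is hypothetical, and the 0.05 threshold is an arbitrary illustrative value, not chiimp's configured max_artifact_ratio.

```r
# Hypothetical helper: is the checked entry weak enough, relative to a
# candidate source, to count as an artifact under a given ratio_max?
within_artifact_ratio <- function(count, source_count, ratio_max) {
  count / source_count <= ratio_max
}

within_artifact_ratio(30, 1000, 0.05)  # 0.03 <= 0.05, TRUE
within_artifact_ratio(80, 1000, 0.05)  # 0.08 >  0.05, FALSE
```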

Value

integer vector with one element per row of sample.data, giving for each entry the row index of another entry that may have produced it as an artifact.
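One way to inspect the returned vector is to pair each flagged entry with its suggested source. The sketch below uses a small hypothetical data frame (the Seq and Count column names are assumed for illustration) and a hand-written result vector of the documented shape. It also assumes that entries without a candidate source are NA.

```r
# Hypothetical processed sample data; column names are illustrative only.
sample_data <- data.frame(
  Seq = c("ATCTATCTATCT", "ATCTATCTATC", "ATCTATCTATCTA"),
  Count = c(500, 20, 10)
)
# Hypothetical result of the documented shape: one source row index per
# entry, assumed NA where no source was found.
artifact_src <- c(NA, 1L, 1L)

# Keep only flagged entries and line each one up with its candidate source.
flagged <- which(!is.na(artifact_src))
pairs <- data.frame(
  artifact_row = flagged,
  source_row = artifact_src[flagged],
  artifact_count = sample_data$Count[flagged],
  source_count = sample_data$Count[artifact_src[flagged]]
)
print(pairs)
```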


ShawHahnLab/chiimp documentation built on Aug. 20, 2023, 1:41 a.m.