analyze_seqs: Analyze a set of STR sequences
In ShawHahnLab/chiimp: Computational, High-throughput Individual Identification through Microsatellite Profiling

analyze_seqs

R Documentation

Analyze a set of STR sequences

Description

Dereplicates the given sequences and annotates any STR sequences found, returning the processed data as a data frame with one row per unique sequence, sorted by count. At this stage no information is filtered out, and all loci are treated equally.
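The dereplication step can be pictured with a short base R sketch. The helper dereplicate_sketch() below is hypothetical and is not part of chiimp. It only illustrates collapsing a character vector into one row per unique sequence with Seq, Count, and Length columns, most frequent first. It assumes "sorted by count" means decreasing count.

```r
# Hypothetical helper, not part of chiimp: collapse a character vector of
# sequences into one row per unique sequence, most frequent first.
dereplicate_sketch <- function(seqs) {
  counts <- table(seqs)                      # occurrences of each exact sequence
  counts <- sort(counts, decreasing = TRUE)  # most frequent sequence first
  data.frame(
    Seq = names(counts),
    Count = as.integer(counts),
    Length = nchar(names(counts)),
    stringsAsFactors = FALSE
  )
}

seqs <- c("ACTGACTG", "ACTG", "ACTGACTG", "ACTGACTG", "ACTG", "AAAA")
derep <- dereplicate_sketch(seqs)
```

Here derep has three rows: "ACTGACTG" (count 3), "ACTG" (count 2), and "AAAA" (count 1). The locus annotations that analyze_seqs adds are not shown.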

Usage

analyze_seqs(
  seqs,
  locus_attrs,
  nrepeats = cfg("min_motif_repeats"),
  max_stutter_ratio = cfg("max_stutter_ratio"),
  artifact.count.ratio_max = cfg("max_artifact_ratio"),
  ...
)

Arguments

`seqs`	character vector containing sequences.
`locus_attrs`	data frame of attributes for loci to look for.
`nrepeats`	number of repeats of each locus' motif to require for a match.
`max_stutter_ratio`	highest ratio of the read count of the second most frequent sequence to that of the most frequent sequence at which the second will still be considered stutter.
`artifact.count.ratio_max`	as for `max_stutter_ratio` but for non-stutter artifact sequences.
`...`	additional arguments passed to make_read_primer_table.
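The count-ratio part of the `max_stutter_ratio` threshold can be sketched as below. passes_stutter_ratio() is a hypothetical helper, not chiimp's implementation. It assumes a ratio equal to the threshold still counts as stutter. chiimp's actual stutter test may also compare the sequences themselves, which this sketch does not attempt. The threshold of 1/3 is arbitrary and is not the package default. `artifact.count.ratio_max` would apply the same kind of comparison to non-stutter artifacts.

```r
# Hypothetical helper, not chiimp's implementation: only the read-count
# ratio criterion described for max_stutter_ratio is checked here.
passes_stutter_ratio <- function(count_candidate, count_source, max_stutter_ratio) {
  ratio <- count_candidate / count_source  # less frequent / more frequent
  ratio <= max_stutter_ratio               # assumed inclusive threshold
}

passes_stutter_ratio(30, 100, 1/3)  # ratio 0.3 is within the threshold
passes_stutter_ratio(60, 100, 1/3)  # ratio 0.6 is too high to be stutter
```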

Details

Columns in the returned data frame:

Seq: sequence text for each unique sequence
Count: integer count of occurrences of this exact sequence
Length: integer sequence length
MatchingLocus: factor: name of the locus matching each sequence, determined by checking the primer
MotifMatch: logical: are there at least nrepeats perfect adjacent repeats of the STR motif for the matching locus?
LengthMatch: logical: is the sequence length within the expected range for the matching locus?
Ambiguous: logical: are there unexpected characters in the sequence content?
Stutter: integer: for any sequence that looks like potential PCR stutter, the index of the row that may be the source of the stutter band.
Artifact: integer: for any sequence that looks like a potential PCR artifact (other than stutter), the index of the row that may be the source of the artifact.
FractionOfTotal: numeric: fraction of all sequences accounted for by this unique sequence.
FractionOfLocus: numeric: fraction of the sequences matching this row's locus that are accounted for by this unique sequence.
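Two of the columns above can be illustrated in base R. The helpers has_motif_repeats() and add_fractions() are hypothetical and are not chiimp's code. The example sequences, locus names (L1, L2), and motifs are invented. The motif test works because containing the motif repeated nrepeats times as a fixed string is equivalent to having at least nrepeats perfect adjacent repeats. The fractions divide each Count by the overall total and by the total for its MatchingLocus.

```r
# Hypothetical helpers, not chiimp's implementation.

# MotifMatch-style test: TRUE if a sequence contains at least `nrepeats`
# perfect, adjacent copies of its motif (one motif per sequence).
has_motif_repeats <- function(seqs, motifs, nrepeats) {
  mapply(function(s, m) grepl(strrep(m, nrepeats), s, fixed = TRUE),
         seqs, motifs, USE.NAMES = FALSE)
}

# FractionOfTotal / FractionOfLocus-style columns from per-sequence counts.
add_fractions <- function(tbl) {
  tbl$FractionOfTotal <- tbl$Count / sum(tbl$Count)
  # ave() repeats each locus's summed count on every row of that locus
  tbl$FractionOfLocus <- tbl$Count / ave(tbl$Count, tbl$MatchingLocus, FUN = sum)
  tbl
}

# Invented example data: three unique sequences across two loci.
seq_table <- data.frame(
  Seq = c("TAGATAGATAGA", "TAGATAGA", "CCTTCCTTCCTT"),
  Count = c(60, 20, 20),
  MatchingLocus = factor(c("L1", "L1", "L2")),
  stringsAsFactors = FALSE
)
motif_by_locus <- c(L1 = "TAGA", L2 = "CCTT")  # invented motifs
seq_table$MotifMatch <- has_motif_repeats(
  seq_table$Seq, motif_by_locus[as.character(seq_table$MatchingLocus)], 3
)
seq_table <- add_fractions(seq_table)
```

In this example, MotifMatch is TRUE, FALSE, TRUE because the second sequence has only two TAGA repeats. FractionOfLocus is 0.75, 0.25, 1 because locus L1 totals 80 reads and L2 totals 20.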

Value

data frame of dereplicated sequences with added annotations.

Examples

library(chiimp)
# Start from non-locus-specific sequences and
# a locus attributes table, and require three
# side-by-side motif repeats to register as a
# motif match for a locus.
raw_seq_vector <- c(test_data$seqs1$A, test_data$seqs1$B)
locus_attrs <- test_data$locus_attrs
num_adjacent_repeats <- 3
# Convert the character vector of sequences
# into a data frame with one row per
# unique sequence.
seq_data <- analyze_seqs(raw_seq_vector,
                         locus_attrs,
                         nrepeats = num_adjacent_repeats)

ShawHahnLab/chiimp documentation built on Aug. 20, 2023, 1:41 a.m.
