analyze_seqs | R Documentation |
Dereplicates the given sequences and annotates any STR sequences found, returning the processed data as a data frame with one row per unique sequence, sorted by count. At this stage no information is filtered out, and all loci are treated equally.
analyze_seqs(
  seqs,
  locus_attrs,
  nrepeats = cfg("min_motif_repeats"),
  max_stutter_ratio = cfg("max_stutter_ratio"),
  artifact.count.ratio_max = cfg("max_artifact_ratio"),
  ...
)
seqs |
character vector containing sequences. |
locus_attrs |
data frame of attributes for loci to look for. |
nrepeats |
number of repeats of each locus' motif to require for a match. |
max_stutter_ratio |
highest ratio of read counts of the second most frequent sequence to the most frequent sequence at which the second will still be considered stutter. |
artifact.count.ratio_max |
as for |
... |
additional arguments for make_read_primer_table |
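As a rough illustration of the ratio that max_stutter_ratio describes, the sketch below compares the read count of a second sequence to that of the most frequent one. The counts and the threshold are made-up values, and the inclusive comparison is an assumption. The real function may apply further criteria that this page does not describe.

```r
# Hypothetical read counts for the two most frequent sequences at one locus.
counts <- c(top = 1000, second = 250)

# Assumed threshold for illustration only; the package default comes from
# cfg("max_stutter_ratio") and is not stated on this page.
max_stutter_ratio <- 1 / 3

# Ratio of the second most frequent sequence to the most frequent one.
ratio <- counts[["second"]] / counts[["top"]]

# Flag the second sequence as candidate stutter if the ratio is at or
# below the threshold (inclusive comparison is an assumption).
is_candidate_stutter <- ratio <= max_stutter_ratio
```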
Columns in the returned data frame:
Seq
: sequence text for each unique sequence
Count
: integer count of occurrences of this exact sequence
Length
: integer sequence length
MatchingLocus
: factor for the name of the locus matching each
sequence, by checking the primer
MotifMatch
: logical: are there at least nrepeats perfect adjacent repeats of the STR motif for the matching locus?
LengthMatch
: logical: is the sequence length within the expected
range for the matching locus?
Ambiguous
: logical: are there unexpected characters in the sequence
content?
Stutter
: integer: for any sequence that looks like potential PCR
stutter, the index of the row that may be the source of the stutter band.
Artifact
: integer: for any sequence that looks like a potential PCR
artifact (other than stutter), the index of the row that may be the source
of the artifact.
FractionOfTotal
: numeric fraction of the number of sequences
represented by each unique sequence compared to the total.
FractionOfLocus
: numeric fraction of the number of sequences
represented by each unique sequence compared to the total for that
particular matching locus.
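Several of the columns above can be approximated in base R, which may help in reading the output. The following is a minimal sketch on toy reads, not the package's implementation. The sequences, the "TA" motif, and the definition of an unexpected character as anything outside A, C, G, T are all assumptions made for illustration.

```r
# Toy input: hypothetical reads, not taken from the package's test data.
seqs <- c("ACGTTATATATAGGC", "ACGTTATATATAGGC", "ACGTTATATAGGC",
          "ACGTTATATATAGGC", "ACGTNATATAGGC")

# Dereplicate: count each unique sequence and sort by descending count.
tbl <- sort(table(seqs), decreasing = TRUE)
derep <- data.frame(
  Seq = names(tbl),
  Count = as.integer(tbl),
  stringsAsFactors = FALSE
)
derep$Length <- nchar(derep$Seq)

# MotifMatch-style check: at least nrepeats perfect adjacent motif copies.
motif <- "TA"   # assumed motif for this toy locus
nrepeats <- 3
pattern <- paste0("(", motif, "){", nrepeats, ",}")
derep$MotifMatch <- grepl(pattern, derep$Seq)

# Ambiguous-style check: any character outside A, C, G, T (an assumption).
derep$Ambiguous <- grepl("[^ACGT]", derep$Seq)

# FractionOfTotal: share of all reads represented by each unique sequence.
derep$FractionOfTotal <- derep$Count / sum(derep$Count)
```

The regular expression `(TA){3,}` only matches uninterrupted repeats, which is one way to read "perfect adjacent repeats" in the MotifMatch description.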
data frame of dereplicated sequences with added annotations.
# Starting from non-locus-specific sequences,
# a locus attributes table, and requiring
# three side-by-side motif repeats to register
# as a motif match for a locus,
raw_seq_vector <- c(test_data$seqs1$A, test_data$seqs1$B)
locus_attrs <- test_data$locus_attrs
num_adjacent_repeats <- 3
# Convert the character vector of sequences
# into a data frame with one row per
# unique sequence.
seq_data <- analyze_seqs(raw_seq_vector,
                         locus_attrs,
                         num_adjacent_repeats)
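Because analyze_seqs filters nothing out, a later step would typically select rows using the annotation columns. The sketch below shows one way to do that in base R. It uses a hand-built stand-in data frame with the documented column names, not real output. The values, and the use of NA in MatchingLocus for sequences that match no locus, are assumptions.

```r
# Hypothetical stand-in for analyze_seqs() output (documented column names,
# made-up values).
seq_data <- data.frame(
  Seq = c("ACGTTATATATAGGC", "ACGTTATATAGGC", "ACGTNATATAGGC"),
  Count = c(3L, 1L, 1L),
  MatchingLocus = factor(c("A", "A", NA)),  # NA for "no match" is assumed
  MotifMatch = c(TRUE, TRUE, FALSE),
  LengthMatch = c(TRUE, TRUE, FALSE),
  Ambiguous = c(FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Keep rows that match a locus, pass both checks, and are unambiguous.
keep <- with(seq_data,
             !is.na(MatchingLocus) & MotifMatch & LengthMatch & !Ambiguous)
candidates <- seq_data[keep, ]
```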