analyze_sample: Analyze sequence table and categorize sequences
In ShawHahnLab/microsat: Computational, High-throughput Individual Identification through Microsatellite Profiling

analyze_sample

R Documentation

Analyze sequence table and categorize sequences

Description

Converts a full STR sequence data frame into a per-locus version and adds a Category factor column that designates which sequences look like alleles, artifacts, and so on. In contrast to analyze_seqs, the summary at this stage is prepared for a single specific locus. See the Details section below for a description of the factor levels in the new Category column, and the Functions section below for how the specific variants of this function behave.

Usage

analyze_sample(
  seq_data,
  sample_attrs,
  min_allele_abundance = cfg("min_allele_abundance")
)

analyze_sample_guided(
  seq_data,
  sample_attrs,
  min_allele_abundance = cfg("min_allele_abundance")
)

analyze_sample_naive(
  seq_data,
  sample_attrs,
  min_allele_abundance = cfg("min_allele_abundance")
)

Arguments

`seq_data`	data frame of processed data for the sample, as produced by analyze_seqs.
`sample_attrs`	list of sample attributes, such as the rows produced by prepare_dataset. Used to select the locus name to filter on.
`min_allele_abundance`	numeric threshold for the minimum proportion of counts a given entry must have, compared to the total matching all criteria for that locus, to be considered as a potential allele.

Details

Factor levels in the added Category column, in order:

Allele: An identified allele sequence. There will be between zero and two of these.
Prominent: Any additional sequences beyond two called alleles that match all requirements (sequences that match all locus attributes, do not appear artifactual, and are above a given fraction of filtered reads).
Insignificant: Sequences with counts below the min_allele_abundance threshold.
Ambiguous: Sequences passing the min_allele_abundance threshold but with non-ACTG characters such as N, as defined by the Ambiguous column of seq_data.
Stutter: Sequences passing the min_allele_abundance threshold but matching stutter sequence criteria as defined by the Stutter column of seq_data.
Artifact: Sequences passing the min_allele_abundance threshold but matching non-stutter artifact sequence criteria as defined by the Artifact column of seq_data.
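
The category rules above can be illustrated with a small, self-contained sketch. This is not the package's implementation: `categorize_toy`, the toy data frame, the `Seq` and `Count` columns, the use of plain logical flags, and the precedence among Ambiguous, Stutter, and Artifact are all assumptions made for illustration. The abundance fraction is also simplified to each row's share of the toy table's total count.

```r
# Toy stand-in for the rows of a single locus. Only the Ambiguous, Stutter,
# and Artifact column names come from the documentation; the rest is assumed.
toy <- data.frame(
  Seq       = c("s1", "s2", "s3", "s4", "s5", "s6", "s7"),
  Count     = c(500, 400, 70, 60, 60, 8, 2),
  Ambiguous = c(FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE),
  Stutter   = c(FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE),
  Artifact  = c(FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Hypothetical helper: assign one of the six documented levels to each row.
categorize_toy <- function(df, min_allele_abundance) {
  levels_in_order <- c("Allele", "Prominent", "Insignificant",
                       "Ambiguous", "Stutter", "Artifact")
  # Simplification: fraction of all counts in the toy table
  frac <- df$Count / sum(df$Count)
  category <- rep(NA_character_, nrow(df))
  # Rows below the threshold are Insignificant regardless of other flags
  category[frac < min_allele_abundance] <- "Insignificant"
  # Rows passing the threshold but flagged (precedence here is an assumption)
  category[is.na(category) & df$Ambiguous] <- "Ambiguous"
  category[is.na(category) & df$Stutter] <- "Stutter"
  category[is.na(category) & df$Artifact] <- "Artifact"
  # Whatever remains is a candidate: top two by count are alleles,
  # any further candidates are Prominent
  remaining <- which(is.na(category))
  remaining <- remaining[order(df$Count[remaining], decreasing = TRUE)]
  category[head(remaining, 2)] <- "Allele"
  category[tail(remaining, -2)] <- "Prominent"
  df$Category <- factor(category, levels = levels_in_order)
  df
}

result <- categorize_toy(toy, min_allele_abundance = 0.05)
print(result[, c("Seq", "Count", "Category")])
```

With a threshold of 0.05, the two most abundant clean sequences are called as alleles, the third clean sequence becomes Prominent, and the two low-count rows fall into Insignificant.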

Value

Filtered version of seq_data, restricted to the selected locus, with an added Category column.

Functions

analyze_sample(): default version of sample analysis. From here use summarize_sample.
analyze_sample_guided(): version of sample analysis guided by expected sequence length values. Additional items ExpectedLength1 and, optionally, ExpectedLength2 can be supplied in the sample_attrs list. If these are NA or missing, the behavior matches analyze_sample. If at least one expected length is given, the stutter/artifact filtering is disabled. If two expected lengths are given, the min_allele_abundance argument is also ignored. From here use summarize_sample_guided.
analyze_sample_naive(): version of sample analysis without stutter/artifact filtering. From here use summarize_sample as for analyze_sample.
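
The rules for the guided variant can be summarized as a small decision table. The sketch below only restates those documented rules. `guided_settings` is a hypothetical helper, not a function in the package, and the `Locus` item in the example lists is an assumed name.

```r
# Hypothetical helper: report which filters the documentation says apply,
# given the expected-length items present in sample_attrs.
guided_settings <- function(sample_attrs) {
  expected <- c(sample_attrs[["ExpectedLength1"]],
                sample_attrs[["ExpectedLength2"]])
  # c() drops missing (NULL) items; also drop NA values
  expected <- expected[!is.na(expected)]
  n <- length(expected)
  list(
    n_expected                   = n,
    matches_default              = n == 0,  # behaves like analyze_sample
    abundance_threshold_used     = n < 2,   # ignored with two expected lengths
    stutter_artifact_filter_used = n == 0   # disabled with one or more
  )
}

# No expected lengths: same behavior as the default version
str(guided_settings(list(Locus = "A")))
# One expected length: stutter/artifact filtering is off, threshold still used
str(guided_settings(list(Locus = "A", ExpectedLength1 = 162, ExpectedLength2 = NA)))
# Two expected lengths: threshold ignored as well
str(guided_settings(list(Locus = "A", ExpectedLength1 = 162, ExpectedLength2 = 170)))
```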

ShawHahnLab/microsat documentation built on Aug. 25, 2023, 11:16 p.m.
