deduplicate_seqs: Deduplicates sequence data


View source: R/deduplicate.R

Description

Deduplicates sequence data

Usage

deduplicate_seqs(dat)

Arguments

dat

The sequence data (SeqFastadna)

Details

The dataset is converted to a vector of character strings, and the unique sequences are selected with the unique function. Looping over the unique sequences, a list is constructed in which each element corresponds to one unique sequence. Each element is itself a list with two elements: the_seq, containing the actual sequence, and dup_names, a character vector listing the names of all sequences that match the unique sequence stored in the_seq.
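The procedure described above can be sketched roughly as follows. This is an illustrative re-implementation, not the package source: the name dedup_sketch is hypothetical, and the input is assumed to already be a named character vector (the conversion from SeqFastadna objects is skipped).

```r
# Hypothetical sketch of the deduplication loop; not the hypermutR source.
# Assumes `seqs` is a named character vector, one string per sequence.
dedup_sketch <- function(seqs) {
  uniq <- unique(seqs)                      # unique() drops the names
  result <- vector("list", length(uniq))
  for (i in seq_along(uniq)) {
    result[[i]] <- list(
      the_seq   = uniq[[i]],
      # names of every input sequence identical to this unique sequence
      dup_names = names(seqs)[seqs == uniq[[i]]]
    )
  }
  result
}

seqs <- c(s1 = "aaa", s2 = "aaa", s3 = "aab")
res <- dedup_sketch(seqs)
str(res)
```

With this input, the sketch yields two entries: one for "aaa" carrying the names s1 and s2, and one for "aab" carrying s3.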

Value

A list in which each unique sequence has an entry consisting of:

  1. the_seq: The sequence as a character string.

  2. dup_names: A character vector of the headers of all the sequences that had that exact sequence.
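As a small illustration of working with a result of this shape, the snippet below counts how many input sequences collapsed into each unique sequence. The list res is a hand-built stand-in that follows the structure described above; it is an assumption for demonstration, not actual output from deduplicate_seqs.

```r
# Hand-built stand-in mirroring the documented return structure (assumption).
res <- list(
  list(the_seq = "aaa", dup_names = c("s1", "s2")),
  list(the_seq = "aab", dup_names = "s3")
)

# Number of original sequences behind each unique sequence
counts <- vapply(res, function(x) length(x$dup_names), integer(1))
names(counts) <- vapply(res, function(x) x$the_seq, character(1))
print(counts)
```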

Examples

deduplicate_seqs(ld_seqs)
deduplicate_seqs(c('aaa', 'aaa', 'aab'))

philliplab/hypermutR documentation built on Sept. 2, 2020, 2:51 p.m.