remove_ambiguous: Remove sequences that contain ambiguous elements

View source: R/remove_ambiguous.R

remove_ambiguous R Documentation

Description

This function either replaces sequences that contain ambiguous elements with empty (NULL) sequences, or removes only the ambiguous elements from sequences in an sq object.

Usage

remove_ambiguous(x, by_letter = FALSE, ...)

## S3 method for class 'sq'
remove_ambiguous(
  x,
  by_letter = FALSE,
  ...,
  NA_letter = getOption("tidysq_NA_letter")
)

Arguments

x

[sq_dna_bsc || sq_rna_bsc || sq_dna_ext || sq_rna_ext || sq_ami_bsc || sq_ami_ext]
An object this function is applied to.

by_letter

[logical(1)]
If FALSE, the filter condition is applied to each sequence as a whole. If TRUE, the filter is applied to each letter separately.

...

Further arguments to be passed from or to other methods.

NA_letter

[character(1)]
A string that is used to interpret and display the NA value in the context of the sq class. The default value is "!".

Details

Biological sequences, whether of DNA, RNA or amino acid elements, are not always exactly determined. Sometimes the only information the user has about an element is that it is one of a given set of possible elements. In this case the element is described with one of the special letters, here called ambiguous.

The inclusion of these letters is the difference between extended and basic alphabets (and, consequently, types). For the amino acid alphabet these letters are: B, J, O, U, X, Z; for DNA and RNA they are: W, S, M, K, R, Y, B, D, H, V, N.
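To make "one of a given set of possible elements" concrete, the sketch below lists the standard IUPAC meanings of the ambiguous DNA letters as a plain base R list. The name iupac_dna is hypothetical and is not an object provided by tidysq; for RNA, U takes the place of T.

```r
# Standard IUPAC nucleotide ambiguity codes for DNA.
# `iupac_dna` is an illustrative name, not part of tidysq.
iupac_dna <- list(
  W = c("A", "T"),
  S = c("C", "G"),
  M = c("A", "C"),
  K = c("G", "T"),
  R = c("A", "G"),
  Y = c("C", "T"),
  B = c("C", "G", "T"),
  D = c("A", "G", "T"),
  H = c("A", "C", "T"),
  V = c("A", "C", "G"),
  N = c("A", "C", "G", "T")
)

# Ambiguous letters that may stand for a given base, here "A":
may_be_A <- names(Filter(function(bases) "A" %in% bases, iupac_dna))
may_be_A
```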

remove_ambiguous() is used to create sequences without any of the elements above. Depending on the value of the by_letter argument, the function either replaces sequences that contain ambiguous elements with empty sequences (if by_letter is FALSE, the default) or shortens each original sequence by retaining only its unambiguous letters (if by_letter is TRUE).
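The two modes can be illustrated with a minimal base R sketch on plain character vectors. This only mimics the behaviour described for the by_letter argument; it is not how tidysq implements the function, and the names dna_ambiguous, seqs, whole and per_letter are hypothetical.

```r
# Base R illustration only: tidysq operates on sq objects, not on plain
# character vectors, and its internal implementation is not shown here.
dna_ambiguous <- "[WSMKRYBDHVN]"  # regex class of the ambiguous DNA letters
seqs <- c("ATGCAGGA", "GACCGAACGAN", "TGACGAGCTTA", "ACTNNAGCN")

# Analogue of by_letter = FALSE: any sequence containing an ambiguous
# letter is replaced by an empty sequence.
whole <- ifelse(grepl(dna_ambiguous, seqs), "", seqs)

# Analogue of by_letter = TRUE: only the ambiguous letters are dropped,
# so affected sequences become shorter.
per_letter <- gsub(dna_ambiguous, "", seqs)

whole
per_letter
```

In both modes the result keeps one entry per input sequence; the modes differ only in how much of an affected sequence survives.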

Value

An sq object of the _bsc version of the input type.

See Also

Functions that clean sequences: is_empty_sq(), remove_na()

Examples

# Creating objects to work on:
sq_ami <- sq(c("MIAANYTWIL", "TIAALGNIIYRAIE", "NYERTGHLI", "MAYXXXIALN"),
             alphabet = "ami_ext")
sq_dna <- sq(c("ATGCAGGA", "GACCGAACGAN", "TGACGAGCTTA", "ACTNNAGCN"),
             alphabet = "dna_ext")

# Removing whole sequences with ambiguous elements:
remove_ambiguous(sq_ami)
remove_ambiguous(sq_dna)

# Removing ambiguous elements from sequences:
remove_ambiguous(sq_ami, by_letter = TRUE)
remove_ambiguous(sq_dna, by_letter = TRUE)

# Analyzing the result:
sq_clean <- remove_ambiguous(sq_ami)
is_empty_sq(sq_clean)
sq_type(sq_clean)


michbur/tidysq documentation built on April 1, 2022, 5:18 p.m.