substitute_letters: Substitute letters in a sequence

View source: R/substitute_letters.R

Description

Replaces all occurrences of a letter with another.

Usage

substitute_letters(x, encoding, ...)

## S3 method for class 'sq'
substitute_letters(x, encoding, ..., NA_letter = getOption("tidysq_NA_letter"))

Arguments

x

[sq]
An object this function is applied to.

encoding

[character || numeric]
A dictionary (named vector), where names are letters to be replaced and elements are their respective replacements.

...

Further arguments passed to or from other methods.

NA_letter

[character(1)]
A string used to interpret and display NA values in the context of the sq class. Defaults to "!".

Details

substitute_letters replaces unwanted letters in any sequence with user-defined or IUPAC symbols. Letters can also be replaced with NA values so that they can later be removed from the sequence with the remove_na function.
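
As a hedged sketch of that two-step workflow, the snippet below substitutes "X" with NA and then drops the NA letters. It assumes the tidysq package is installed and that remove_na accepts a by_letter argument that removes individual NA letters rather than whole sequences; check the remove_na help page before relying on this.

```r
library(tidysq)

# Sequences taken from the Examples section of this page
sq_ami <- sq(c("MIOONYTWIL", "MOYXXXIOLN"), alphabet = "ami_ext")

# Step 1: mark every "X" as NA
sq_na <- substitute_letters(sq_ami, c(X = NA_character_))

# Step 2: drop the NA letters (assumption: by_letter = TRUE removes
# single letters instead of discarding whole sequences)
sq_clean <- remove_na(sq_na, by_letter = TRUE)
sq_clean
```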

Both the replaced and the replacing letter may consist of one or more characters. However, each encoding entry maps exactly one letter to exactly one letter: several letters cannot be replaced with one, nor can one letter be replaced with several.

Several different letters can be encoded to the same symbol, so c(A = "rep1", H = "rep1", G = "rep1") is allowed, but c(AHG = "rep1") is not (unless the alphabet contains a letter "AHG"). Such an encoding discards the distinction between the original letters, so the original sequence cannot be recovered afterwards.
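
The loss of information can be illustrated with a plain base-R sketch. This is not how tidysq implements the substitution; the vector letters_in and the lookup below are hypothetical and only mimic the documented behaviour on an ordinary character vector.

```r
# Hypothetical stand-in for a sequence: one element per letter
letters_in <- c("A", "H", "G", "C", "A")
encoding <- c(A = "rep1", H = "rep1", G = "rep1")

# Replace letters that have an encoding; leave the others unchanged
has_encoding <- letters_in %in% names(encoding)
replaced <- unname(ifelse(has_encoding, encoding[letters_in], letters_in))

# Trying to invert the mapping: "rep1" has three possible origins,
# so the original letter cannot be determined
candidates <- names(encoding)[encoding == "rep1"]
```

Here replaced no longer distinguishes A, H and G, and candidates lists all three as equally possible sources of "rep1".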

All encoding names must be letters contained in the alphabet; otherwise an error is thrown.
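
A minimal base-R sketch of this rule follows. The helper check_encoding_names is hypothetical and not part of tidysq; the actual check and its error message in the package may differ.

```r
# Hypothetical helper (not a tidysq function): fails when any encoding
# name is missing from the alphabet
check_encoding_names <- function(encoding, alphabet) {
  unknown <- setdiff(names(encoding), alphabet)
  if (length(unknown) > 0) {
    stop("encoding names not in alphabet: ", paste(unknown, collapse = ", "))
  }
  invisible(TRUE)
}

# Alphabet matching the sq_atp object from the Examples section
alphabet <- c("mA", LETTERS)

# "mA" and "A" are both letters of the alphabet, so this passes
ok <- check_encoding_names(c(mA = "-", A = "a"), alphabet)

# "AHG" is not a letter of the alphabet, so this raises an error
result <- tryCatch(
  check_encoding_names(c(AHG = "rep1"), alphabet),
  error = function(e) conditionMessage(e)
)
```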

Value

An sq object of atp type with an updated alphabet.

See Also

Functions that manipulate the type of sequences: find_invalid_letters(), is.sq(), sq_type(), typify()

Examples

# Load the package that provides sq() and substitute_letters():
library(tidysq)

# Creating objects to work on:
sq_dna <- sq(c("ATGCAGGA", "GACCGAACGAN", "TGACGAGCTTA", "ACTNNAGCN"),
             alphabet = "dna_ext")
sq_ami <- sq(c("MIOONYTWIL", "TIOOLGNIIYROIE", "NYERTGHLI", "MOYXXXIOLN"),
             alphabet = "ami_ext")
sq_atp <- sq(c("mALPVQAmAmA", "mAmAPQ"), alphabet = c("mA", LETTERS))

# Not all letters must have their encoding specified:
substitute_letters(sq_dna, c(T = "t", A = "a", C = "c", G = "g"))
substitute_letters(sq_ami, c(M = "X"))

# Multiple character letters are supported in encodings:
substitute_letters(sq_atp, c(mA = "-"))
substitute_letters(sq_ami, c(I = "ough", O = "eau"))

# Numeric substitutions are allowed too; they are coerced to characters:
substitute_letters(sq_dna, c(N = 9, G = 7))

# It's possible to replace a letter with an NA value:
substitute_letters(sq_ami, c(X = NA_character_))


michbur/tidysq documentation built on April 1, 2022, 5:18 p.m.