removeGaps: Remove or replace gaps from protein sequences.

View source: R/misc-07-removeGaps.R

removeGaps    R Documentation

Remove or replace gaps from protein sequences.

Description

Remove or replace gaps or other irregular characters in protein sequences, making them suitable for feature extraction or sequence alignment-based similarity computation.

Usage

removeGaps(x, pattern = "-", replacement = "", ...)

Arguments

x

character vector, containing the input protein sequence(s).

pattern

character string containing the gap (or other irregular) character to be removed or replaced. Default is "-". For advanced usage, such as regular expressions, see gsub.

replacement

a replacement for matched characters. Default is "" (remove the matched character).

...

additional parameters passed to gsub.

Value

a vector of protein sequence(s) with gaps or irregular characters removed/replaced.

Author(s)

Nan Xiao <https://nanx.me>

Examples

# amino acid sequences that contain gaps ("-")
aaseq <- list(
  "MHGDTPTLHEYMLDLQPETTDLYCYEQLSDSSE-EEDEIDGPAGQAEPDRAHYNIVTFCCKCDSTLRLCVQS",
  "MHGDTPTLHEYMLDLQPETTDLYCYEQLNDSSE-EEDEIDGPAGQAEPDRAHYNIVTFCCKCDSTLRLCVQS"
)
## Not run: 
# gaps create issues for alignment
parSeqSim(aaseq)

# remove the gaps
nogapseq <- removeGaps(aaseq)
parSeqSim(nogapseq)

## End(Not run)
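
The pattern and replacement arguments are described above as following gsub semantics. The sketch below illustrates those semantics with base R only. It calls gsub directly as a stand-in for removeGaps, on the assumption that removeGaps passes its arguments through to gsub unchanged. The sequences and variable names are made up for illustration.

```r
# Base-R sketch: gsub() stands in for removeGaps() here (an assumption,
# based on the argument descriptions, not on the package source).
seqs <- c("MHGDT-PTLHE", "MK--LV.X*")  # hypothetical toy sequences

# Default behaviour: drop every "-" character
no_gaps <- gsub(pattern = "-", replacement = "", x = seqs)

# Regular-expression character class: drop "-", "." and "*" in one pass
cleaned <- gsub(pattern = "[-.*]", replacement = "", x = seqs)

# Replace gaps with a placeholder residue instead of removing them
masked <- gsub(pattern = "-", replacement = "X", x = seqs)

# fixed = TRUE (a gsub argument) treats the pattern as a literal string,
# so "." matches only a period rather than any character
literal <- gsub(pattern = ".", replacement = "", x = seqs, fixed = TRUE)

print(no_gaps)
print(cleaned)
print(masked)
print(literal)
```

If the pass-through assumption holds, the same pattern, replacement, and fixed values could be supplied to removeGaps, with fixed travelling through the ... argument.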

protr documentation built on Sept. 12, 2024, 6:44 a.m.