replaceLetterAt R Documentation

Replacing letters in a sequence (or set of sequences) at some specified locations

Description

replaceLetterAt first makes a copy of a sequence (or set of sequences) and then replaces some of the original letters by new letters at the specified locations.
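
The copy semantics can be checked with a minimal sketch like the one below. It assumes the Biostrings package is installed; the variable names x and y are only illustrative.

```r
library(Biostrings)

x <- DNAString("AAMAA")
y <- replaceLetterAt(x, 3, "N")   # replace the letter at position 3 (M) by N

as.character(x)   # "AAMAA": the original object is left untouched
as.character(y)   # "AANAA": only the returned copy carries the replacement
```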

.inplaceReplaceLetterAt is the IN PLACE version of replaceLetterAt: it modifies the original sequence in place, i.e. without copying it first. Note that in-place modification of a sequence is fundamentally dangerous because it alters all objects in your session that refer to the modified sequence. NEVER use .inplaceReplaceLetterAt unless you know what you are doing!

Usage

replaceLetterAt(x, at, letter, if.not.extending="replace", verbose=FALSE)

## NEVER USE THIS FUNCTION!
.inplaceReplaceLetterAt(x, at, letter)

Arguments

x

A DNAString or rectangular DNAStringSet object.

at

The locations where the replacements must occur.

If x is a DNAString object, then at is typically an integer vector with no NAs, but a logical vector or Rle object is valid too. Locations can be repeated, in which case the last replacement to occur at a given location prevails.
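
As a rough illustration of the accepted forms of at for a DNAString, the sketch below specifies the same two locations as an integer vector, a logical vector, and an Rle. It assumes that the logical vector has the same length as x, which is how the locations are lined up with the sequence here. The object names are illustrative.

```r
library(Biostrings)

x <- DNAString("ACGTAC")

## Integer locations.
r_int <- replaceLetterAt(x, c(2L, 5L), "NN")

## Logical vector (same length as x) flagging positions 2 and 5.
at_lgl <- c(FALSE, TRUE, FALSE, FALSE, TRUE, FALSE)
r_lgl <- replaceLetterAt(x, at_lgl, "NN")

## The same logical vector, run-length encoded.
r_rle <- replaceLetterAt(x, Rle(at_lgl), "NN")

as.character(r_int)   # "ANGTNC", and likewise for r_lgl and r_rle

## Repeated location: the last replacement (G) prevails at position 1.
r_rep <- replaceLetterAt(x, c(1L, 1L), "TG")
as.character(r_rep)   # "GCGTAC"
```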

If x is a rectangular DNAStringSet object, then at must be a matrix of logicals with the same dimensions as x.

letter

The new letters.

If x is a DNAString object, then letter must be a DNAString object or a character vector (with no NAs) with a total number of letters (sum(nchar(letter))) equal to the number of locations specified in at.
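
For instance, the three calls in the following sketch should be interchangeable, since each supplies four letters for the four locations in at. The way the character vector is split into elements is arbitrary and only meant to show that the total number of letters is what counts.

```r
library(Biostrings)

x <- DNAString("AAMAA")
at <- c(5, 1, 3, 1)

r_single <- replaceLetterAt(x, at, "TYNC")               # one string of 4 letters
r_split  <- replaceLetterAt(x, at, c("T", "YN", "C"))    # 1 + 2 + 1 letters
r_dna    <- replaceLetterAt(x, at, DNAString("TYNC"))    # a DNAString of length 4

sum(nchar(c("T", "YN", "C"))) == length(at)   # TRUE
as.character(r_single)                        # "CANAT"
```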

If x is a rectangular DNAStringSet object, then letter must be a DNAStringSet object or a character vector of the same length as x. In addition, the number of letters in each element of letter must match the number of locations specified in the corresponding row of at (all(width(letter) == rowSums(at))).
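
A small, purely artificial sketch of the DNAStringSet case follows. The set has 3 sequences of width 4, so at is a 3 x 4 logical matrix, and each element of letter has as many letters as there are TRUEs in the corresponding row.

```r
library(Biostrings)

x <- DNAStringSet(c("AAAA", "CCCC", "GGGG"))

## One row per sequence, one column per position.
at <- matrix(c(TRUE,  FALSE, FALSE, TRUE,
               FALSE, TRUE,  FALSE, FALSE,
               FALSE, TRUE,  TRUE,  TRUE),
             nrow=length(x), ncol=width(x)[1], byrow=TRUE)

letter <- c("TT", "N", "NNN")
all(nchar(letter) == rowSums(at))   # TRUE: the constraint described above

res <- replaceLetterAt(x, at, letter)
as.character(res)                   # "TAAT" "CNCC" "GNNN"
```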

if.not.extending

What to do if the new letter is not "extending" the old letter? The new letter "extends" the old letter if both are IUPAC letters and the new letter is as specific or less specific than the old one (e.g. M extends A, Y extends Y, but Y doesn't extend S). Possible values are "replace" (the default) for replacing in all cases, "skip" for not replacing when the new letter does not extend the old letter, "merge" for merging the new IUPAC letter with the old one, and "error" for raising an error.
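
One way to read the "extends" relation is in terms of IUPAC_CODE_MAP: the new letter extends the old one when every base the old letter stands for is also covered by the new letter. The helper extends_letter below is hypothetical (it is not part of Biostrings) and only meant to reproduce the three examples given above for IUPAC letters.

```r
library(Biostrings)

## Hypothetical helper, not a Biostrings function: TRUE if the IUPAC
## letter 'new' extends the IUPAC letter 'old'.
extends_letter <- function(old, new) {
    old_bases <- strsplit(IUPAC_CODE_MAP[[old]], "")[[1]]
    new_bases <- strsplit(IUPAC_CODE_MAP[[new]], "")[[1]]
    all(old_bases %in% new_bases)
}

extends_letter("A", "M")   # TRUE:  M (A or C) covers A
extends_letter("Y", "Y")   # TRUE:  a letter extends itself
extends_letter("S", "Y")   # FALSE: S (C or G) is not covered by Y (C or T)
```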

Note that the gap ("-") and hard masking ("+") letters neither extend nor are extended by any other letter.

Also note that "merge" is the only value of the if.not.extending argument that guarantees that the final result is independent of the order in which the replacements are performed. This is only relevant when at contains duplicated locations; otherwise the result is always independent of the order, whatever the value of if.not.extending.
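
The sketch below compares the four values of if.not.extending on the same input, then reverses the order of the replacements to illustrate the note on order independence. The expected strings in the comments were worked out by hand from the rules above; the object names are illustrative.

```r
library(Biostrings)

x <- DNAString("AAMAA")
at <- c(5, 1, 3, 1)

r_replace <- replaceLetterAt(x, at, "TYNC")
r_skip    <- replaceLetterAt(x, at, "TYNC", if.not.extending="skip")
r_merge   <- replaceLetterAt(x, at, "TYNC", if.not.extending="merge")
as.character(r_replace)   # "CANAT": every replacement is applied
as.character(r_skip)      # "AANAA": only N, which extends M, is applied
as.character(r_merge)     # "HANAW": A+Y+C gives H, A+T gives W

## "error" stops at the first new letter that does not extend the old one.
r_error <- tryCatch(
    replaceLetterAt(x, at, "TYNC", if.not.extending="error"),
    error=function(e) conditionMessage(e)
)

## Same replacements in reverse order: "merge" gives the same result,
## "replace" does not because position 1 is specified twice.
r_merge_rev   <- replaceLetterAt(x, rev(at), "CNYT", if.not.extending="merge")
r_replace_rev <- replaceLetterAt(x, rev(at), "CNYT")
as.character(r_merge_rev)     # "HANAW"
as.character(r_replace_rev)   # "YANAT"

## Gaps never extend and are never extended, so "skip" leaves them alone.
r_gap <- replaceLetterAt(DNAString("A-A"), c(1, 2), "-N", if.not.extending="skip")
as.character(r_gap)           # "A-A"
```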

verbose

When TRUE, a warning will report the number of skipped or merged letters.
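
A possible way to capture that report programmatically is sketched below. It relies on the statement above that the report is issued as a warning; the exact wording of the message is not assumed, and the variable msgs is only illustrative.

```r
library(Biostrings)

msgs <- character(0)
res <- withCallingHandlers(
    replaceLetterAt(DNAString("AAMAA"), c(5, 1, 3, 1), "TYNC",
                    if.not.extending="skip", verbose=TRUE),
    warning=function(w) {
        ## Record the message, then let the call continue.
        msgs <<- c(msgs, conditionMessage(w))
        invokeRestart("muffleWarning")
    }
)

as.character(res)   # "AANAA"
msgs                # any captured report on skipped letters
```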

Details

The semantics of .inplaceReplaceLetterAt are equivalent to calling replaceLetterAt with if.not.extending="merge" and verbose=FALSE.

Never use .inplaceReplaceLetterAt! It is used by the injectSNPs function in the BSgenome package, as part of the "lazy sequence loading" mechanism, to alter the original sequences of a BSgenome object at "sequence-load time". This alteration consists of injecting the IUPAC ambiguity letters that represent the SNPs into the just-loaded sequence, which is the only time when in-place modification of the external data of an XString object is safe.

Value

For replaceLetterAt, a DNAString or DNAStringSet object of the same shape (i.e. length and width) as the original object x.

Author(s)

H. Pagès

See Also

  • The replaceAt function for extracting or replacing arbitrary subsequences from/in a sequence or set of sequences.

  • IUPAC_CODE_MAP for the mapping between IUPAC nucleotide ambiguity codes and their meaning.

  • The chartr and injectHardMask functions.

  • The DNAString and DNAStringSet classes.

  • The injectSNPs function and the BSgenome class in the BSgenome package.

Examples

  ## Replace letters of a DNAString object:
  replaceLetterAt(DNAString("AAMAA"), c(5, 1, 3, 1), "TYNC")
  replaceLetterAt(DNAString("AAMAA"), c(5, 1, 3, 1), "TYNC", if.not.extending="merge")

  ## Replace letters of a DNAStringSet object (sorry for the totally
  ## artificial example with absolutely no biological meaning):
  library(drosophila2probe)
  probes <- DNAStringSet(drosophila2probe)
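  ## 'at' is a logical matrix with one row per probe and one column per letter:
  at <- matrix(c(TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE),
               nrow=length(probes), ncol=width(probes)[1],
               byrow=TRUE)
  ## For each probe, build a run of gaps as long as the number of TRUEs
  ## in the corresponding row of 'at':
  letter_subject <- DNAString(paste(rep.int("-", width(probes)[1]), collapse=""))
  letter <- as(Views(letter_subject, start=1, end=rowSums(at)), "XStringSet")
  replaceLetterAt(probes, at, letter)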

Bioconductor/Biostrings documentation built on Dec. 16, 2024, 8:46 a.m.