padAndClip: Pad and clip strings

Description

padAndClip first conceptually pads the supplied strings with an infinite number of padding letters on both sides, then clips them to the requested regions.

stackStrings is a convenience wrapper around padAndClip that turns a variable-width set of strings into a rectangular (i.e. constant-width) set by conceptually shifting the strings horizontally, then padding and clipping them.
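
The "pad, then clip" model can be sketched for a single string in base R. This is only an illustration of the semantics described above, not how Biostrings implements them; the helper name pad_and_clip_one and its arguments are hypothetical.

```r
# Conceptual model only: pad_and_clip_one is a hypothetical helper,
# not part of Biostrings.
pad_and_clip_one <- function(s, start, end, lpad = " ", rpad = " ") {
  if (end < start) return("")          # empty view -> empty result
  chars <- strsplit(s, "")[[1]]        # letters of s, at positions 1..n
  n <- length(chars)
  pos <- seq(start, end)               # positions covered by the view
  out <- character(length(pos))
  out[pos < 1L] <- lpad                # left of the string: left padding
  out[pos > n]  <- rpad                # right of the string: right padding
  inside <- pos >= 1L & pos <= n
  out[inside] <- chars[pos[inside]]    # positions that hit real letters
  paste(out, collapse = "")
}

pad_and_clip_one("ABCD", 3, 8, ">", "<")     # "CD<<<<"
pad_and_clip_one("ABCD", -2, 2, ">", "<")    # ">>>AB"

# Shifting a string 2 letters to the right before keeping positions
# 1 to 6 amounts to moving the view 2 letters to the left.
pad_and_clip_one("XYZ", 1 - 2, 6 - 2, "-", "-")   # "--XYZ-"
```

In this model a region may start before position 1 or end past the last letter, which is why regions can extend beyond the string limits.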

Usage

padAndClip(x, views, Lpadding.letter=" ", Rpadding.letter=" ",
           remove.out.of.view.strings=FALSE)

stackStrings(x, from, to, shift=0L,
             Lpadding.letter=" ", Rpadding.letter=" ",
             remove.out.of.view.strings=FALSE)

Arguments

x

An XStringSet object containing the strings to pad and clip.

views

An IntegerRanges object (recycled to the length of x if necessary) defining the region to keep for each string. Because the strings are first conceptually padded with an infinite number of padding letters on both sides, regions can go beyond string limits.

Lpadding.letter, Rpadding.letter

A single letter to use for padding on the left, and another one to use for padding on the right. Note that the default letter (" ") does not work if, for example, x is a DNAStringSet object, because the space is not a valid DNA letter (see ?DNA_ALPHABET). So the Lpadding.letter and Rpadding.letter arguments must be supplied if x is not a BStringSet object. For example, if x is a DNAStringSet object, a typical choice is to use "+".

remove.out.of.view.strings

TRUE or FALSE. Whether or not to remove the strings that are out of view in the returned object.

from, to

Another way to specify the region to keep for each string, with the restriction that from and to must be single integers. Only one region can therefore be specified, and the same region is used for all the strings.

shift

An integer vector (recycled to the length of x if necessary) specifying the amount of shifting (in number of letters) to apply to each string before doing pad and clip. Positive values shift to the right and negative values to the left.
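
A small sketch of how the arguments above interact, assuming the Biostrings package is installed and attached. The toy sequences and the padding letters are arbitrary choices made for illustration.

```r
library(Biostrings)

# DNA input: the default " " padding is not a DNA letter, so pass
# explicit padding letters ("+" here).
dna <- DNAStringSet(c(a="ACGT", b="GGCCA"))
padded <- padAndClip(dna, IRanges(-1, 6),
                     Lpadding.letter="+", Rpadding.letter="+")
as.character(padded)    # a: "++ACGT++", b: "++GGCCA+"

# shift: move the second string 2 letters to the right before
# keeping positions 1 to 6.
x <- BStringSet(c(s1="ABCD", s2="XYZ"))
stacked <- stackStrings(x, 1, 6, shift=c(0, 2),
                        Lpadding.letter="-", Rpadding.letter="-")
as.character(stacked)   # s1: "ABCD--", s2: "--XYZ-"

# remove.out.of.view.strings: the view on s2 (positions 10 to 13)
# lies entirely beyond its 3 letters, so s2 is expected to be dropped.
kept <- padAndClip(x, IRanges(c(1, 10), c(4, 13)),
                   remove.out.of.view.strings=TRUE)
names(kept)
```

The single range IRanges(-1, 6) in the first call is recycled to both strings, whereas the last call supplies one range per string.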

Value

For padAndClip: An XStringSet object. If remove.out.of.view.strings is FALSE, it has the same length and names as x, and its "shape", which is described by the integer vector returned by width(), is the same as the shape of the views argument after recycling.

The class of the returned object is the direct concrete subclass of XStringSet that x belongs to or derives from. There are 4 direct concrete subclasses of the XStringSet virtual class: BStringSet, DNAStringSet, RNAStringSet, and AAStringSet. If x is an instance of one of those classes, then the returned object has the same class as x (i.e. in that case, padAndClip acts as an endomorphism). But if x derives from one of those 4 classes, then the returned object is downgraded to the class x derives from. In that case, padAndClip does not act as an endomorphism.

For stackStrings: Same as padAndClip. In addition, it is guaranteed to have a rectangular shape, i.e. to be a constant-width XStringSet object.
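
The properties stated above can be checked directly. A minimal sketch, assuming Biostrings is installed and attached; x is the same toy BStringSet used in the Examples section.

```r
library(Biostrings)

x <- BStringSet(c(seq1="ABCD", seq2="abcdefghijk", seq3="", seq4="XYZ"))
views <- IRanges(3, 8:5)

res <- padAndClip(x, views, Lpadding.letter=">", Rpadding.letter="<")
all(width(res) == width(views))    # does the shape match the views?
identical(names(res), names(x))    # are the names preserved?
is(res, "BStringSet")              # is the class the same as that of x?

stacked <- stackStrings(x, 2, 8)
unique(width(stacked))             # distinct widths in the stacked set
```

With from=2 and to=8 every stacked string covers 7 positions, so the last call reports a single width.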

Author(s)

H. Pagès

See Also

  • The stackStringsFromBam function in the GenomicAlignments package for stacking the read sequences (or their quality strings) stored in a BAM file on a region of interest.

  • The XStringViews class to formally represent a set of views on a single string.

  • The extractAt and replaceAt functions for extracting/replacing arbitrary substrings from/in a string or set of strings.

  • The XStringSet class.

  • The IntegerRanges class in the IRanges package.

Examples

x <- BStringSet(c(seq1="ABCD", seq2="abcdefghijk", seq3="", seq4="XYZ"))

padAndClip(x, IRanges(3, 8:5), Lpadding.letter=">", Rpadding.letter="<")
padAndClip(x, IRanges(1:-2, 7), Lpadding.letter=">", Rpadding.letter="<")

stackStrings(x, 2, 8)

stackStrings(x, -2, 8, shift=c(0, -11, 6, 7),
             Lpadding.letter="#", Rpadding.letter=".")

stackStrings(x, -2, 8, shift=c(0, -14, 6, 7),
             Lpadding.letter="#", Rpadding.letter=".")

stackStrings(x, -2, 8, shift=c(0, -14, 6, 7),
             Lpadding.letter="#", Rpadding.letter=".",
             remove.out.of.view.strings=TRUE)

library(hgu95av2probe)
probes <- DNAStringSet(hgu95av2probe)
probes

stackStrings(probes, 0, 26,
             Lpadding.letter="+", Rpadding.letter="-")

options(showHeadLines=15)
stackStrings(probes, 3, 23, shift=6*c(1:5, -(1:5)),
             Lpadding.letter="+", Rpadding.letter="N",
             remove.out.of.view.strings=TRUE)

Bioconductor/Biostrings documentation built on Dec. 16, 2024, 8:46 a.m.