maskMotif: Masking by content (or by position)

maskMotif    R Documentation

Masking by content (or by position)

Description

Functions for masking a sequence by content (or by position).

Usage

maskMotif(x, motif, min.block.width=1, ...)
mask(x, start=NA, end=NA, pattern)

Arguments

x

The sequence to mask.

motif

The motif to mask in the sequence.

min.block.width

The minimum width of the blocks to mask.

...

Additional arguments passed to matchPattern.

start

An integer vector containing the starting positions of the regions to mask.

end

An integer vector containing the ending positions of the regions to mask.

pattern

The motif to mask in the sequence.

Value

A MaskedXString object for maskMotif and an XStringViews object for mask.

Author(s)

H. Pagès

See Also

read.Mask, matchPattern, XString-class, MaskedXString-class, XStringViews-class, MaskCollection-class

Examples

  ## ---------------------------------------------------------------------
  ## EXAMPLE 1
  ## ---------------------------------------------------------------------

  maskMotif(BString("AbcbbcbEEE"), "bcb")
  maskMotif(BString("AbcbcbEEE"), "bcb")

  ## maskMotif() can be used in an incremental way to mask more than 1
  ## motif. Note that maskMotif() does not try to mask again what's
  ## already masked (i.e. the new mask will never overlap with the
  ## previous masks) so the order in which the motifs are masked actually
  ## matters as it will affect the total set of masked positions.
  x0 <- BString("AbcbEEEEEbcbbEEEcbbcbc")
  x1 <- maskMotif(x0, "E")
  x1
  x2 <- maskMotif(x1, "bcb")
  x2
  x3 <- maskMotif(x2, "b")
  x3
  ## Note that inverting the order in which "b" and "bcb" are masked would
  ## lead to a different final set of masked positions.
  ## Also note that the order doesn't matter if the motifs to mask don't
  ## overlap (we assume that the motifs are unique) i.e. if the prefix of
  ## each motif is not the suffix of any other motif. This is of course
  ## the case when all the motifs have only 1 letter.
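  ##
  ## A minimal sketch of the inverted order (an illustration added here,
  ## with hypothetical variable names y0, bcb_first and b_first): mask
  ## "b" before "bcb" on the same sequence and compare how many positions
  ## end up masked. maskedwidth() is assumed to report the number of
  ## masked positions of a MaskedXString object.
  library(Biostrings)
  y0 <- BString("AbcbEEEEEbcbbEEEcbbcbc")  # same sequence as x0 above
  bcb_first <- maskMotif(maskMotif(maskMotif(y0, "E"), "bcb"), "b")
  b_first <- maskMotif(maskMotif(maskMotif(y0, "E"), "b"), "bcb")
  maskedwidth(bcb_first)  # masked positions when "bcb" is masked before "b"
  maskedwidth(b_first)    # masked positions when "b" is masked before "bcb"
  as(b_first, "Views")    # unmasked regions left when "b" is masked first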

  ## ---------------------------------------------------------------------
  ## EXAMPLE 2
  ## ---------------------------------------------------------------------

  x <- DNAString("ACACAACTAGATAGNACTNNGAGAGACGC")

  ## Mask the N-blocks
  x1 <- maskMotif(x, "N")
  x1
  as(x1, "Views")
  gaps(x1)
  as(gaps(x1), "Views")

  ## Mask the AC-blocks
  x2 <- maskMotif(x1, "AC")
  x2
  gaps(x2)

  ## Mask the GA-blocks
  x3 <- maskMotif(x2, "GA", min.block.width=5)
  x3  # masks 2 and 3 overlap
  gaps(x3)

  ## ---------------------------------------------------------------------
  ## EXAMPLE 3
  ## ---------------------------------------------------------------------

  library(BSgenome.Dmelanogaster.UCSC.dm3)
  chrU <- Dmelanogaster$chrU
  chrU
  alphabetFrequency(chrU)
  chrU <- maskMotif(chrU, "N")
  chrU
  alphabetFrequency(chrU)
  as(chrU, "Views")
  as(gaps(chrU), "Views")

  mask2 <- Mask(mask.width=length(chrU),
                start=c(50000, 350000, 543900), width=25000)
  names(mask2) <- "some ugly regions"
  masks(chrU) <- append(masks(chrU), mask2)
  chrU
  as(chrU, "Views")
  as(gaps(chrU), "Views")

  ## ---------------------------------------------------------------------
  ## EXAMPLE 4
  ## ---------------------------------------------------------------------
  ## Note that unlike maskMotif(), mask() returns an XStringViews object!

  ## masking "by position"
  mask("AxyxyxBC", 2, 6)

  ## masking "by content"
  mask("AxyxyxBC", "xyx")
  noN_chrU <- mask(chrU, "N")
  noN_chrU
  alphabetFrequency(noN_chrU, collapse=TRUE)

Bioconductor/Biostrings documentation built on Dec. 16, 2024, 8:46 a.m.