findLoci: Find methylation loci in a genome

Description Usage Arguments Details Value Author(s) See Also Examples

View source: R/findLoci.R

Description

This is a convenience function to find methylation loci, such as CpGs, in a reference genome. The result is useful as the value of the loci argument for read.bismark().

Usage

1
2
3
4
5
6
findLoci(pattern,
         subject,
         include = seqlevels(subject),
         strand = c("*", "+", "-"),
         fixed = "subject",
         resize = TRUE)

Arguments

pattern

A string specifying the pattern to search for, e.g. "CG". Can contain IUPAC ambiguity codes, e.g., "CH".

subject

A string containing a file path to the genome sequence, in FASTA or 2bit format, to be searched. Alternatively, a BSgenome or DNAStringSet object storing the genome sequence to be searched.

include

A character vector indicating the seqlevels of subject to be used.

strand

A character scaler specifying the strand of subject to be used. If strand = "*", then both the positive (strand = "+") and negative (strand = "-" strands will be searched.) It is assumed that subject contains the sequence with respect to the + strand.

fixed

If "subject" (the default), IUPAC ambiguity codes in the pattern only are interpreted as wildcards, e.g., a pattern containing CH matches a subject containing CA but not vice versa. See ?Biostrings::`lowlevel-matching` for more information

resize

A logical scalar specifying whether the ranges should be shifted to have width 1 and anchored by the start of the locus, e.g., resize a CpG dinucleotide to give the co-ordinates of the cytosine.

Details

This function provides a convenience wrapper for finding methylation loci in a genome, based on running vmatchPattern(). Users requiring finer-grained control should directly use the vmatchPattern() function and coerce the result to a GRanges object.

Value

A GRanges object storing the found loci.

Author(s)

Peter F. Hickey

See Also

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
  library(Biostrings)
  # Find CpG dinucleotides on the both strands of an artificial sequence
  my_seq <- DNAStringSet("ATCAGTCGC")
  names(my_seq) <- "test"
  findLoci(pattern = "CG", subject = my_seq)
  # Find CHG trinucleotides on the both strands of an artificial sequence
  findLoci(pattern = "CHG", subject = my_seq)

  # Find CpG dinucleotides on the + strand of chr17 from the human genome (hg38)
  if (requireNamespace("BSgenome.Hsapiens.UCSC.hg38")) {
    findLoci(
        pattern = "CG",
        subject = BSgenome.Hsapiens.UCSC.hg38::BSgenome.Hsapiens.UCSC.hg38,
        include = "chr17",
        strand = "+")
  }

bsseq documentation built on Nov. 8, 2020, 7:53 p.m.