getRegions: Generate Regions from CpGs

View source: R/Get_and_Filter_Regions.R

getRegionsR Documentation

Generate Regions from CpGs

Description

getRegions() generates a set of regions and some statistics based on the CpGs in a BSseq object and then saves it as a tab-delimited text file. Regions can be defined based on CpG locations (for CpG clusters), built-in genomic annotations from annotatr, or a custom genomic annotation.

Usage

getRegions(
  bs,
  annotation = NULL,
  genome = c("hg38", "hg19", "mm10", "mm9", "rn6", "rn5", "rn4", "dm6", "dm3",
    "galGal5"),
  upstream = 5000,
  downstream = 1000,
  custom = NULL,
  maxGap = 150,
  n = 3,
  save = TRUE,
  file = "Unfiltered_Regions.txt",
  verbose = TRUE
)

Arguments

bs

A BSseq object.

annotation

A character(1) giving the built-in genomic annotation to use for defining regions. Shortcuts are available for genes, promoters, and transcripts. Get the entire list of possible annotations with annotatr::builtin_annotations(), which also includes CpG islands, enhancers, and chromatin states.

genome

A character(1) with the genome build to use for built-in annotations. Available builds include hg38, hg19, mm10, mm9, rn6, rn5, rn4, dm6, dm3, and galGal5.

upstream

A numeric(1) giving the number of bases upstream of a transcription start site to specify a promoter. Used for the promoters built-in annotation.

downstream

A numeric(1) giving the number of bases downstream of a transcription start site to specify a promoter. Used for the promoters built-in annotation.

custom

A GRanges object with a custom genomic annotation for defining regions. Construct this using GenomicRanges::GRanges().

maxGap

A numeric(1) specifying the maximum number of bases between CpGs to be included in the same CpG cluster.

n

A numeric(1) giving the minimum number of CpGs for a region to be returned. This applies to CpG clusters, built-in, and custom, annotations.

save

A logical(1) indicating whether to save the data.frame.

file

A character(1) giving the file name (.txt) for the saved data.frame.

verbose

A logical(1) indicating whether messages should be printed.

Details

These regions still need to be filtered for minimum coverage and methylation standard deviation.

Value

A data.frame with the region genomic locations along with some statistics, including number of CpGs, coverage minimum, mean, and standard deviation, and methylation mean and standard deviation.

See Also

  • plotRegionStats(), plotSDstats(), getRegionTotals(), and plotRegionTotals() for help visualizing region characteristics and setting cutoffs for filtering.

  • filterRegions() for filtering regions by minimum coverage and methylation standard deviation.

Examples

## Not run: 

# Call Regions
regions <- getRegions(bs, file = "Unfiltered_Regions.txt")
plotRegionStats(regions, maxQuantile = 0.99,
                file = "Unfiltered_Region_Plots.pdf")
plotSDstats(regions, maxQuantile = 0.99,
            file = "Unfiltered_SD_Plots.pdf")

# Examine Region Totals at Different Cutoffs
regionTotals <- getRegionTotals(regions, file = "Region_Totals.txt")
plotRegionTotals(regionTotals, file = "Region_Totals.pdf")

# Filter Regions
regions <- filterRegions(regions, covMin = 10, methSD = 0.05,
                         file = "Filtered_Regions.txt")
plotRegionStats(regions, maxQuantile = 0.99,
                file = "Filtered_Region_Plots.pdf")
plotSDstats(regions, maxQuantile = 0.99,
            file = "Filtered_SD_Plots.pdf")

## End(Not run)


cemordaunt/comethyl documentation built on Oct. 20, 2023, 5:47 p.m.