create_triplet_distance_based: Map DNAm to target genes using distance approaches, and TF to...

View source: R/create_triplet_distance_based.R

create_triplet_distance_based    R Documentation

Map DNAm to target genes using distance approaches, and TF to the DNAm region using JASPAR2020 TFBS.

Description

This function wraps two other functions from the package, get_region_target_gene and get_tf_in_region. It maps each region to target genes using the method selected with target.method: regions overlapping a gene promoter, the closest gene, any gene within a given window around the region, or a fixed number of nearby genes upstream and downstream. To find TFs binding to the region, JASPAR2020 is used.

Usage

create_triplet_distance_based(
  region,
  genome = c("hg38", "hg19"),
  target.method = c("genes.promoter.overlap", "window", "nearby.genes", "closest.gene"),
  target.window.size = 500 * 10^3,
  target.num.flanking.genes = 5,
  target.promoter.upstream.dist.tss = 2000,
  target.promoter.downstream.dist.tss = 2000,
  target.rm.promoter.regions.from.distal.linking = TRUE,
  motif.search.window.size = 0,
  motif.search.p.cutoff = 1e-08,
  TF.peaks.gr = NULL,
  max.distance.region.target = 10^6,
  cores = 1
)

Arguments

region

A GRanges object or a named vector of regions (e.g. "chr21:100002-1004000")

genome

Human genome reference "hg38" or "hg19"

target.method

How genes are mapped to regions: regions overlapping a gene promoter ("genes.promoter.overlap"); genes within a window around the region ("window"); a fixed number of nearby genes upstream and downstream from the region ("nearby.genes"); or the gene closest to the region ("closest.gene")

target.window.size

When target.method = "window", number of base pairs by which to extend the region (+/- target.window.size/2). Default is 500 kbp (i.e. +/- 250 kbp: 250 kbp before the start and 250 kbp after the end of the region)

target.num.flanking.genes

Number of flanking genes upstream and downstream to search. For example, if target.num.flanking.genes = 5, it will return the 5 genes upstream and 5 genes downstream

target.promoter.upstream.dist.tss

Number of base pairs (bp) upstream of TSS to consider as promoter regions. Defaults to 2000 bp.

target.promoter.downstream.dist.tss

Number of base pairs (bp) downstream of TSS to consider as promoter regions. Defaults to 2000 bp.

target.rm.promoter.regions.from.distal.linking

When performing distal linking with target.method = "window", "nearby.genes" or "closest.gene", if set to TRUE (default), probes in promoter regions will be removed from the input.

motif.search.window.size

Integer number of base pairs by which to extend the regions for the motif search. For example, a value of 50 will extend the region 25 bp upstream and 25 bp downstream. Default is 0 (no extension).

motif.search.p.cutoff

P-value cutoff used by motifmatchr. Default 1e-8.

TF.peaks.gr

A GRanges object with TF peaks to be overlapped with the input regions. A metadata column "id" with the TF name is expected. Default NULL. Note that the ReMap catalog can be used as a source of TF peaks.

max.distance.region.target

Maximum distance between a region and its target gene. Default 1 Mbp.

cores

Number of CPU cores to be used. Default 1.
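
The window arithmetic described for target.window.size and motif.search.window.size can be sketched in base R as follows. This is only an illustration under stated assumptions: extend_region is a hypothetical helper that is not part of MethReg, and clamping the start at position 1 is a choice made for this sketch, not documented package behaviour.

```r
# Hypothetical helper (NOT part of MethReg): extend a "chr:start-end"
# region by half of `window.size` on each side.
extend_region <- function(region, window.size) {
  # Split "chr3:100-200" into chromosome, start and end
  parts <- regmatches(
    region,
    regexec("^(chr[^:]+):([0-9]+)-([0-9]+)$", region)
  )[[1]]
  stopifnot(length(parts) == 4)

  chr <- parts[2]
  start <- as.numeric(parts[3])
  end <- as.numeric(parts[4])

  # Half of the window goes on each side of the region
  half <- floor(window.size / 2)

  # Assumption of this sketch: never go below coordinate 1
  new.start <- max(1, start - half)
  new.end <- end + half

  sprintf("%s:%.0f-%.0f", chr, new.start, new.end)
}

# Default target.window.size (500 kbp): +/- 250 kbp around the region
target.window <- extend_region("chr3:189631389-189632889", 500 * 10^3)

# motif.search.window.size = 50: +/- 25 bp around the region
motif.window <- extend_region("chr3:189631389-189632889", 50)
```

With the default of motif.search.window.size = 0 the region would be returned unchanged by this helper, which matches the "no extension" default described above.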

Value

A data frame with TF, target and RegionID information.
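
One way to picture how such a data frame could arise from the two wrapped steps is a join of a region-to-target table with a region-to-TF table on the region identifier. The sketch below uses base R only; the column names (regionID, target, TF) and all gene and TF labels are placeholders assumed for illustration, and the actual columns returned by the function may differ.

```r
# Toy region-to-target mapping (placeholder gene labels, not real output)
region.target <- data.frame(
  regionID = c("chr3:189631389-189632889", "chr4:43162098-43163498"),
  target = c("geneA", "geneB"),
  stringsAsFactors = FALSE
)

# Toy region-to-TF mapping (placeholder TF labels, not real output)
region.tf <- data.frame(
  regionID = c(
    "chr3:189631389-189632889",
    "chr3:189631389-189632889",
    "chr4:43162098-43163498"
  ),
  TF = c("TF1", "TF2", "TF1"),
  stringsAsFactors = FALSE
)

# Each row of the joined table is one (region, target, TF) triplet
triplet.sketch <- merge(region.target, region.tf, by = "regionID")
```

A region with one target gene and two candidate TFs contributes two rows here, one per TF.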

Examples

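library(MethReg)

# Two regions given as "chr:start-end" strings
regions.names <- c("chr3:189631389-189632889", "chr4:43162098-43163498")

# Map each region to its closest gene and search for TF motifs
# in the region extended by 500 bp (250 bp on each side)
triplet <- create_triplet_distance_based(
  region = regions.names,
  motif.search.window.size = 500,
  target.method = "closest.gene"
)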

TransBioInfoLab/MethReg documentation built on July 28, 2023, 9:17 p.m.