find_ep_coenrichment: Find co-enriched motif pairs in enhancer-promoter...

View source: R/find_ep_coenrichment.R

find_ep_coenrichment    R Documentation

Find co-enriched motif pairs in enhancer-promoter interactions

Description

Identifies co-enriched pairs of motifs in enhancer-promoter interactions selected from a data frame of general genomic interactions.

If identify_ep: Promoters and enhancers are identified using genomic annotations, where anchors close to promoter annotations (within 2500 base pairs) are considered promoters and all other anchors are considered gene-distal enhancers. Only interactions in int_raw_data between promoters and enhancers are used for motif co-enrichment analysis.
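
The distance rule above can be sketched in base R. This is only an illustration under stated assumptions: the `tss` table, the `anchors` table, and the helper `classify_anchor` are hypothetical, the 2500 bp cutoff is treated as inclusive, and the package itself relies on genomic annotation tooling rather than this simplified lookup.

```r
# Hypothetical promoter annotation: one TSS position per gene (assumption)
tss <- data.frame(chr = c("chr1", "chr1", "chr2"),
                  pos = c(10000L, 500000L, 42000L))

# Hypothetical interaction anchors
anchors <- data.frame(chr   = c("chr1", "chr1", "chr2", "chr2"),
                      start = c(8000L, 120000L, 40000L, 900000L),
                      end   = c(9000L, 121000L, 41000L, 901000L))

# Label an anchor "promoter" if any TSS on the same chromosome lies within
# max_dist base pairs of the anchor interval, and "enhancer" otherwise.
classify_anchor <- function(chr, start, end, tss, max_dist = 2500L) {
  pos <- tss$pos[tss$chr == chr]
  if (length(pos) == 0) return("enhancer")
  # Distance from the interval [start, end] to each TSS (0 if inside)
  d <- pmax(start - pos, pos - end, 0L)
  if (any(d <= max_dist)) "promoter" else "enhancer"
}

anchors$type <- mapply(classify_anchor, anchors$chr, anchors$start,
                       anchors$end, MoreArgs = list(tss = tss),
                       USE.NAMES = FALSE)
```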

If !identify_ep: Instead of automatically identifying promoters and enhancers based on genomic annotations, all interactions in int_raw_data must be preprocessed so that anchor 1 contains promoters and anchor 2 contains enhancers. Motif co-enrichment analysis is performed under this assumption.
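
One way to prepare such input is sketched below in base R. The data frame `ints`, its column names, and the `type1`/`type2` labels are hypothetical; how anchors get labeled in the first place is outside the scope of this sketch.

```r
# Hypothetical interactions with a precomputed label per anchor (assumption)
ints <- data.frame(
  chr1   = c("chr1", "chr1", "chr2"),
  start1 = c(8000L, 120000L, 40000L),
  end1   = c(9000L, 121000L, 41000L),
  chr2   = c("chr1", "chr1", "chr2"),
  start2 = c(120000L, 8000L, 43000L),
  end2   = c(121000L, 9000L, 44000L),
  type1  = c("promoter", "enhancer", "promoter"),
  type2  = c("enhancer", "promoter", "promoter")
)

# Keep only interactions that connect one promoter and one enhancer
ep <- ints[ints$type1 != ints$type2, ]

# Swap anchors where the promoter currently sits in anchor 2
swap <- ep$type1 == "enhancer"
a1 <- c("chr1", "start1", "end1")
a2 <- c("chr2", "start2", "end2")
tmp <- ep[swap, a1]
ep[swap, a1] <- ep[swap, a2]
ep[swap, a2] <- tmp

# Six-column data frame: anchor 1 = promoter, anchor 2 = enhancer
ep_ready <- ep[, c(a1, a2)]
```

A data frame shaped like `ep_ready` is the kind of input that would then be passed with identify_ep = FALSE.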

Calls functions scan_motifs, filter_motifs, and anchor_pair_enrich internally.

Usage

find_ep_coenrichment(
  int_raw_data,
  motifs_file,
  motifs_file_matrix_format = c("pfm", "ppm", "pwm"),
  genome_id = c("hg38", "hg19", "mm9", "mm10"),
  identify_ep = TRUE,
  cooccurrence_method = c("count", "score", "match"),
  filter_threshold = 0.4
)

Arguments

int_raw_data

a GenomicInteractions object or a data frame with at least six columns:

column 1: character; genomic location of interaction anchor 1 - chromosome (e.g., "chr3")
column 2: integer; genomic location of interaction anchor 1 - start coordinate
column 3: integer; genomic location of interaction anchor 1 - end coordinate
column 4: character; genomic location of interaction anchor 2 - chromosome (e.g., "chr3")
column 5: integer; genomic location of interaction anchor 2 - start coordinate
column 6: integer; genomic location of interaction anchor 2 - end coordinate
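
As a minimal sketch of the six-column layout described above, the following builds a small data frame and checks it by column position. The coordinates, the column names, and the helper `check_int_columns` are hypothetical; the start <= end check is an added assumption, not a documented requirement.

```r
# Hypothetical coordinates, for illustration only
int_df <- data.frame(
  chr1   = c("chr3", "chr5"),
  start1 = c(1000L, 20000L),
  end1   = c(1500L, 20500L),
  chr2   = c("chr3", "chr5"),
  start2 = c(90000L, 75000L),
  end2   = c(90500L, 75500L),
  stringsAsFactors = FALSE
)

# Check the layout by position, since the description refers to column order
check_int_columns <- function(df) {
  ncol(df) >= 6 &&
    is.character(df[[1]]) && is.character(df[[4]]) &&
    all(vapply(df[c(2, 3, 5, 6)], is.integer, logical(1))) &&
    # Assumption: start coordinates do not exceed end coordinates
    all(df[[2]] <= df[[3]]) && all(df[[5]] <= df[[6]])
}

check_int_columns(int_df)
```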

motifs_file

JASPAR format matrix file containing multiple motifs to scan for; gzip-compressed files are allowed

motifs_file_matrix_format

type of position-specific scoring matrices in motifs_file, valid options include:

pfm: position frequency matrix, elements are absolute frequencies, i.e., counts (default)
ppm: position probability matrix, elements are probabilities, i.e., Laplace smoothing corrected relative frequencies
pwm: position weight matrix, elements are log likelihoods
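
The relationship between the three matrix types can be sketched in base R as follows. The example matrix, the pseudocount of 1, and the uniform background of 0.25 are assumptions; the log2 ratio against a background is one common convention for weight matrices and may differ from what the package computes internally.

```r
# Hypothetical 4 x 3 count matrix (rows: A, C, G, T; columns: motif positions)
pfm <- matrix(c(8, 0, 1,
                1, 9, 0,
                0, 1, 9,
                1, 0, 0),
              nrow = 4, byrow = TRUE,
              dimnames = list(c("A", "C", "G", "T"), NULL))

# pfm -> ppm: add a pseudocount to every cell (Laplace smoothing),
# then normalize each column so that it sums to 1
pseudocount <- 1  # assumption
ppm <- sweep(pfm + pseudocount, 2, colSums(pfm + pseudocount), "/")

# ppm -> pwm: log2 ratio against a uniform background (assumption)
background <- 0.25
pwm <- log2(ppm / background)
```

The pseudocount keeps every probability above zero, so the logarithm stays finite even for bases that were never observed at a position.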

genome_id

ID of the genome assembly that the interactions in int_raw_data were aligned to, valid options include hg19, hg38, mm9, and mm10, defaults to hg38

identify_ep

logical; if TRUE (default), enhancers and promoters of interactions in int_raw_data are identified based on genomic annotations, and all interactions that are not between promoters and enhancers are filtered out; set to FALSE to skip this identification and instead assume that anchor 1 contains promoters and anchor 2 contains enhancers for all interactions in int_raw_data

cooccurrence_method

method for co-occurrence, valid options include:

count: correlation between counts (for each anchor, tally positions where motif score > 5 * 10^{-5})
score: correlation between motif scores (for each anchor, use the maximum score over all positions)
match: association between motif matches (for each anchor, a match is defined if there is at least one position with a motif score > 5 * 10^{-5})

See anchor_pair_enrich for details.
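
To make the three options concrete, here is a rough base R sketch for a single motif pair. The per-position scores in `scores_a` and `scores_b` are made up, and Fisher's exact test is used only as a stand-in for the association test; anchor_pair_enrich may use a different statistic and correlation measure.

```r
threshold <- 5e-5

# Hypothetical per-position motif scores, one numeric vector per interaction.
# Motif A is scored on anchor 1, motif B on anchor 2.
scores_a <- list(c(6e-5, 7e-5, 1e-5), c(1e-5, 2e-5, 3e-5),
                 c(9e-5, 1e-5, 1e-5), c(2e-5, 1e-5, 4e-5),
                 c(6e-5, 6e-5, 8e-5), c(1e-5, 1e-5, 1e-5))
scores_b <- list(c(8e-5, 6e-5, 2e-5), c(2e-5, 1e-5, 1e-5),
                 c(7e-5, 2e-5, 3e-5), c(6e-5, 1e-5, 1e-5),
                 c(9e-5, 7e-5, 6e-5), c(3e-5, 2e-5, 1e-5))

# count: tally positions above the threshold, then correlate the tallies
count_a <- vapply(scores_a, function(s) sum(s > threshold), integer(1))
count_b <- vapply(scores_b, function(s) sum(s > threshold), integer(1))
cor_count <- cor(count_a, count_b)

# score: take the maximum score per anchor, then correlate the maxima
max_a <- vapply(scores_a, max, numeric(1))
max_b <- vapply(scores_b, max, numeric(1))
cor_score <- cor(max_a, max_b)

# match: at least one position above the threshold counts as a match;
# Fisher's exact test is an assumed stand-in for the association test
match_a <- factor(count_a > 0, levels = c(FALSE, TRUE))
match_b <- factor(count_b > 0, levels = c(FALSE, TRUE))
match_table <- table(match_a, match_b)
match_p <- fisher.test(match_table)$p.value
```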

filter_threshold

fraction of interactions that should contain a motif for a motif to be considered, see filter_motifs, defaults to 0.4
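
The effect of the threshold can be illustrated with a small base R sketch. The presence matrix `has_motif` is hypothetical, and both the way a motif is counted as present in an interaction and whether the cutoff is inclusive are assumptions; see filter_motifs for the actual rule.

```r
# Hypothetical logical matrix: rows = interactions, columns = motifs;
# TRUE means the motif was found in that interaction (assumption)
has_motif <- matrix(
  c(TRUE,  FALSE, TRUE,
    TRUE,  FALSE, FALSE,
    FALSE, FALSE, TRUE,
    TRUE,  TRUE,  TRUE,
    FALSE, FALSE, TRUE),
  ncol = 3, byrow = TRUE,
  dimnames = list(NULL, c("motif_a", "motif_b", "motif_c"))
)

filter_threshold <- 0.4

# Fraction of interactions that contain each motif
frac <- colMeans(has_motif)

# Keep motifs whose fraction reaches the threshold (inclusive by assumption)
kept <- names(frac)[frac >= filter_threshold]
```

In this made-up example, motif_b appears in only one of five interactions and is dropped, while the other two motifs are kept.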

Value

a list with the following items:

int_data: GenomicInteractions object; promoter-enhancer interactions
int_data_motifs: interactionData object; return value of scan_motifs
filtered_int_data_motifs: interactionData object; return value of filter_motifs
annotation_pie_chart: ggplot2 plot; return value of plotInteractionAnnotations
motif_cooccurrence: interactionData object; return value of anchor_pair_enrich

Author(s)

Jennifer Hammelman

Konstantin Krismer

Examples

## Not run: 
interactions_file <- system.file("extdata/yy1_interactions.bedpe.gz",
                                 package = "spatzie")
motifs_file <- system.file("extdata/motifs_subset.txt.gz",
                           package = "spatzie")

df <- read.table(gzfile(interactions_file), header = TRUE, sep = "\t")
res <- find_ep_coenrichment(df, motifs_file,
                            motifs_file_matrix_format = "pfm",
                            genome_id = "mm10")

## End(Not run)


jhammelman/spatzie documentation built on Feb. 8, 2024, 8:50 a.m.