calculate_motif_enrichment: Binding Site Enrichment Value Calculation

View source: R/matrix-based.R

Description

Calculates binding site enrichment or depletion scores between predefined foreground and background sequence sets. Significance levels of the enrichment values are obtained by Monte Carlo tests.

Usage

calculate_motif_enrichment(
  foreground_scores_df,
  background_scores_df,
  background_total_sites,
  background_absolute_hits,
  n_transcripts_foreground,
  max_fg_permutations = 1e+06,
  min_fg_permutations = 1000,
  e = 5,
  p_adjust_method = "BH"
)

Arguments

foreground_scores_df

result of score_transcripts on the foreground sequence set (the foreground sequence set must be a subset of the background sequence set)

background_scores_df

result of score_transcripts on the background sequence set

background_total_sites

number of potential binding sites per sequence (returned by score_transcripts)

background_absolute_hits

number of putative binding sites per sequence (returned by score_transcripts)

n_transcripts_foreground

number of sequences in the foreground set

max_fg_permutations

maximum number of foreground permutations performed in the Monte Carlo test for the enrichment score

min_fg_permutations

minimum number of foreground permutations performed in the Monte Carlo test for the enrichment score

e

integer-valued stop criterion for the enrichment score Monte Carlo test: the permutation process is aborted after observing e random enrichment values that are more extreme than the actual enrichment value

p_adjust_method

method used to adjust the p-values from the Monte Carlo tests for multiple testing (to avoid alpha error accumulation), see p.adjust
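
The interplay of max_fg_permutations, min_fg_permutations and e can be illustrated with a minimal sketch of a permutation test with early stopping. This is not the package's internal implementation: the function name mc_enrichment_test, the density-ratio statistic, the two-sided comparison on the log scale, the rule that early stopping only applies once the minimum number of permutations is reached, and the (k + 1) / (n + 1) p-value estimate are all assumptions made for illustration.

```r
# Hypothetical helper, not part of transite.
# hits, sites: per-sequence hit and site counts of the background set
# n_fg:        number of sequences in the foreground set
# observed:    enrichment value observed for the real foreground set
mc_enrichment_test <- function(hits, sites, n_fg, observed,
                               max_perm = 1e6, min_perm = 1000, e = 5) {
  # Assumed statistic: site density of a sampled set relative to the
  # density of the whole background.
  bg_density <- sum(hits) / sum(sites)
  n_extreme <- 0
  n_perm <- 0
  while (n_perm < max_perm) {
    # Draw a random "foreground" of the same size from the background.
    idx <- sample.int(length(hits), n_fg)
    random_enrichment <- (sum(hits[idx]) / sum(sites[idx])) / bg_density
    n_perm <- n_perm + 1
    # Count random values at least as far from 1 (log scale) as the observed one.
    if (isTRUE(abs(log(random_enrichment)) >= abs(log(observed)))) {
      n_extreme <- n_extreme + 1
    }
    # Stop early once e extreme values were seen, but not before min_perm.
    if (n_perm >= min_perm && n_extreme >= e) break
  }
  list(p_value = (n_extreme + 1) / (n_perm + 1), p_value_n = n_perm)
}

# Toy counts, invented for illustration only.
set.seed(1)
hits  <- c(3, 0, 1, 5, 2, 0, 4, 1, 0, 2)
sites <- c(20, 15, 10, 25, 18, 12, 22, 9, 14, 16)
res <- mc_enrichment_test(hits, sites, n_fg = 3, observed = 2,
                          max_perm = 5000)
```

Under these assumptions, a clearly non-significant motif stops soon after min_perm permutations, while a strongly enriched motif runs up to max_perm, so most of the computation is spent where the p-value needs the highest resolution.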

Value

A data frame with the following columns:

motif_id      the motif identifier that is used in the original motif library
motif_rbps    the gene symbol of the RNA-binding protein(s)
enrichment    binding site enrichment between foreground and background sequences
p_value       unadjusted p-value from Monte Carlo test
p_value_n     number of Monte Carlo test permutations
adj_p_value   adjusted p-value from Monte Carlo test (usually FDR)
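
As a rough illustration of how the returned columns might be used, the sketch below adjusts p-values with p.adjust and keeps motifs below a significance cutoff. The data frame toy, its motif identifiers and all of its values are invented for illustration and are not output of calculate_motif_enrichment; the 0.05 cutoff is likewise an arbitrary assumption.

```r
# Toy stand-in for a result data frame; all values are invented.
toy <- data.frame(
  motif_id = c("toy_1", "toy_2", "toy_3", "toy_4"),
  enrichment = c(2.1, 0.4, 1.3, 1.0),
  p_value = c(0.001, 0.01, 0.04, 0.5),
  stringsAsFactors = FALSE
)

# Benjamini-Hochberg adjustment, the default p_adjust_method ("BH").
toy$adj_p_value <- p.adjust(toy$p_value, method = "BH")

# Keep motifs with an adjusted p-value at or below the assumed cutoff,
# ordered from most to least significant.
significant <- toy[toy$adj_p_value <= 0.05, ]
significant <- significant[order(significant$adj_p_value), ]
```

Enrichment values above 1 would indicate enrichment in the foreground and values below 1 depletion, assuming the score is a ratio of foreground to background binding site density.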

See Also

Other matrix functions: run_matrix_spma(), run_matrix_tsma(), score_transcripts_single_motif(), score_transcripts()

Examples

# Foreground sequences (must be a subset of the background set)
foreground_seqs <- c(
  "CAGUCAAGACUCC", "AAUUGGUGUCUGGAUACUUCCCUGUACAU",
  "AGAU", "CCAGUAA"
)
# Background set: the foreground sequences plus additional sequences
background_seqs <- c(
  foreground_seqs, "CAACAGCCUUAAUU", "CUUUGGGGAAU",
  "UCAUUUUAUUAAA", "AUCAAAUUA", "GACACUUAAAGAUCCU",
  "UAGCAUUAACUUAAUG", "AUGGA", "GAAGAGUGCUCA",
  "AUAGAC", "AGUUC"
)
# Score both sets
foreground_scores <- score_transcripts(foreground_seqs, cache = FALSE)
background_scores <- score_transcripts(background_seqs, cache = FALSE)
# Calculate enrichment; few permutations keep the example fast
enrichments_df <- calculate_motif_enrichment(
  foreground_scores$df,
  background_scores$df,
  background_scores$total_sites,
  background_scores$absolute_hits,
  length(foreground_seqs),
  max_fg_permutations = 1000
)

transite documentation built on Nov. 8, 2020, 5:27 p.m.