compute_kmer_enrichment: k-mer Enrichment between Foreground and Background Sets


View source: R/k-mer-based.R

Description

Compares a foreground sequence set to a background sequence set and computes an enrichment value for each possible k-mer.

Usage

compute_kmer_enrichment(
  foreground_kmers,
  background_kmers,
  permutation = FALSE,
  chisq_p_value_threshold = 0.05,
  p_adjust_method = "BH"
)

Arguments

foreground_kmers

k-mer counts of the foreground set (generated by generate_kmers)

background_kmers

k-mer counts of the background set (generated by generate_kmers)

permutation

if TRUE, only the enrichment value is returned (efficiency mode used for permutation testing)

chisq_p_value_threshold

p-value threshold for Pearson's chi-squared test; if the chi-squared p-value of a k-mer is at or below this threshold, its p-value is recalculated with Fisher's exact test

p_adjust_method

see p.adjust

Details

By default, p-values are computed with Pearson's chi-squared test. If the chi-squared p-value of a k-mer is <= chisq_p_value_threshold, its p-value is recalculated with Fisher's exact test. This two-step procedure is used for computational efficiency: Fisher's exact tests are computationally demanding, so they are performed only where exact p-values are preferred, e.g., when expected counts are < 5 or when p-values are significant.
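
The two-step procedure described above can be sketched for a single k-mer in base R. This is a minimal illustration, not the transite implementation: the helper name two_step_p_value, its arguments, and the layout of the 2x2 contingency table (k-mer count versus all remaining counts, foreground versus background) are assumptions made for this example.

```r
# Hypothetical helper (not part of transite): two-step p-value for one k-mer.
# Assumed inputs: the count of the k-mer and the total k-mer count in each set.
two_step_p_value <- function(fg_count, bg_count, fg_total, bg_total,
                             chisq_p_value_threshold = 0.05) {
  # Assumed 2x2 layout: rows = this k-mer / all other k-mers,
  # columns = foreground / background
  contingency <- matrix(
    c(fg_count, bg_count,
      fg_total - fg_count, bg_total - bg_count),
    nrow = 2, byrow = TRUE
  )

  # Step 1: fast Pearson's chi-squared test
  # (warnings about small expected counts are suppressed in this sketch)
  p <- suppressWarnings(stats::chisq.test(contingency)$p.value)

  # Step 2: recalculate with Fisher's exact test only if the
  # chi-squared p-value is at or below the threshold
  if (!is.na(p) && p <= chisq_p_value_threshold) {
    p <- stats::fisher.test(contingency)$p.value
  }
  p
}

# Clearly different proportions: the exact test is used
p_exact <- two_step_p_value(30, 10, 100, 100)

# Identical proportions: the chi-squared p-value is kept
p_approx <- two_step_p_value(10, 10, 100, 100)
```

With this design, the expensive exact test runs only for the subset of k-mers that pass the chi-squared filter, which is where the efficiency gain comes from.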

Value

Enrichment of k-mers in the specified foreground sequences. A data frame with the following columns is returned:

foreground_count    foreground counts for each k-mer
background_count    background counts for each k-mer
enrichment          k-mer enrichment
p_value             p-value of k-mer enrichment (either from Fisher's exact test or Pearson's chi-squared test)
adj_p_value         multiple testing corrected p-value
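
As a rough illustration of how the p_value and adj_p_value columns relate, the following sketch applies p.adjust with the default method "BH" to a toy data frame. The p-values are made up for this example and are not output of compute_kmer_enrichment; the 0.05 cutoff is likewise only an assumed choice.

```r
# Toy p-values for illustration only (not produced by compute_kmer_enrichment)
toy_result <- data.frame(p_value = c(0.001, 0.01, 0.04, 0.5))

# Multiple testing correction, as controlled by p_adjust_method
toy_result$adj_p_value <- stats::p.adjust(toy_result$p_value, method = "BH")

# Keep rows that remain significant after correction (assumed cutoff: 0.05)
significant <- toy_result[toy_result$adj_p_value <= 0.05, ]
```

Note that a raw p-value below 0.05 (here 0.04) can end up above the cutoff after correction, which is why downstream filtering is usually done on adj_p_value.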

See Also

Other k-mer functions: calculate_kmer_enrichment(), check_kmers(), count_homopolymer_corrected_kmers(), draw_volcano_plot(), estimate_significance_core(), estimate_significance(), generate_kmers(), generate_permuted_enrichments(), run_kmer_spma(), run_kmer_tsma()

Examples

library(transite)

# define simple sequence sets for foreground and background
foreground_set <- c(
  "CAACAGCCUUAAUU", "CAGUCAAGACUCC", "CUUUGGGGAAU",
  "UCAUUUUAUUAAA", "AAUUGGUGUCUGGAUACUUCCCUGUACAU",
  "AUCAAAUUA", "AGAU", "GACACUUAAAGAUCCU",
  "UAGCAUUAACUUAAUG", "AUGGA", "GAAGAGUGCUCA",
  "AUAGAC", "AGUUC", "CCAGUAA"
)
background_set <- c(
  "CAACAGCCUUAAUU", "CAGUCAAGACUCC", "CUUUGGGGAAU",
  "UCAUUUUAUUAAA", "AAUUGGUGUCUGGAUACUUCCCUGUACAU",
  "AUCAAAUUA", "AGAU", "GACACUUAAAGAUCCU",
  "UAGCAUUAACUUAAUG", "AUGGA", "GAAGAGUGCUCA",
  "AUAGAC", "AGUUC", "CCAGUAA",
  "UUAUUUA", "AUCCUUUACA", "UUUUUUU", "UUUCAUCAUU",
  "CCACACAC", "CUCAUUGGAG", "ACUUUGGGACA", "CAGGUCAGCA"
)
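# count all 6-mers in each sequence set
foreground_kmers <- generate_kmers(foreground_set, 6)
background_kmers <- generate_kmers(background_set, 6)

# compute the enrichment of each k-mer in the foreground relative to the background
kmer_enrichment_values <- compute_kmer_enrichment(
  foreground_kmers,
  background_kmers
)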

transite documentation built on Nov. 8, 2020, 5:27 p.m.