score_transcripts_single_motif: Scores transcripts with position weight matrices

View source: R/matrix-based.R

Description

Counts the putative binding sites (i.e., motif hits) in a set of sequences for the specified RNA-binding protein sequence motif and returns the result as a list (see Value). The results are aggregated by score_transcripts and subsequently used by calculate_motif_enrichment to obtain binding site enrichment scores.

Usage

score_transcripts_single_motif(
  motif,
  sequences,
  max_hits = 5,
  threshold_method = c("p_value", "relative"),
  threshold_value = 0.25^6,
  cache_path = paste0(tempdir(), "/sc/")
)

Arguments

motif

a Transite motif that is used to score the specified sequences

sequences

character vector of named sequences (only containing upper case characters A, C, G, T), where the names are RefSeq identifiers and sequence type qualifiers ("3UTR", "5UTR", "mRNA"), e.g. "NM_010356|3UTR"
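
A minimal sketch of how such a named character vector could be built and checked in base R. The sequences and all identifiers other than NM_010356 are made up for illustration, and the two validation patterns are assumptions derived from the description above, not checks taken from the package.

```r
# Toy sequences; names follow "<RefSeq identifier>|<sequence type qualifier>".
# Identifiers other than NM_010356 are placeholders invented for this example.
sequences <- c(
  "NM_010356|3UTR" = "CAACAGCCTTAATT",
  "NM_000001|5UTR" = "CAGTCAAGACTCC",
  "NM_000002|mRNA" = "CTTTGGGGAAT"
)

# Only upper-case A, C, G and T are allowed in the sequences.
valid_alphabet <- grepl("^[ACGT]+$", sequences)

# Each name must end in one of the three documented qualifiers.
valid_names <- grepl("^[^|]+\\|(3UTR|5UTR|mRNA)$", names(sequences))

stopifnot(all(valid_alphabet), all(valid_names))
```

Checking the input up front like this makes lower-case or RNA-alphabet sequences (e.g., containing U) fail early rather than during scoring.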

max_hits

maximum number of putative binding sites per mRNA that are counted

threshold_method

either "p_value" (default) or "relative". If threshold_method equals "p_value", the default threshold_value is 0.25^6, which is the lowest p-value that can be achieved by hexamer motifs, the shortest supported motifs. If threshold_method equals "relative", the default threshold_value is 0.9, which is 90% of the maximum PWM score.
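
The two default thresholds can be illustrated with a short base R sketch. The toy matrix toy_pwm is hypothetical, and computing the maximum PWM score as the sum of the per-position maxima is an assumption (the usual definition), not something this page states about the package internals.

```r
# p-value threshold: a hexamer is one of 4^6 possible sequences, so
# 0.25^6 is the same value as 1 / 4096.
p_threshold <- 0.25^6

# Hypothetical toy PWM: rows are motif positions, columns are nucleotides.
toy_pwm <- matrix(
  c( 1.2, -0.5, -0.8, -1.0,
    -0.7,  1.0, -0.9, -0.4,
    -1.1, -0.6,  1.5, -0.3),
  nrow = 3, byrow = TRUE,
  dimnames = list(NULL, c("A", "C", "G", "T"))
)

# Assumed definition: the best achievable score is the sum of row maxima.
max_score <- sum(apply(toy_pwm, 1, max))

# Relative threshold: 90% of the maximum PWM score.
relative_cutoff <- 0.9 * max_score
```

With this toy matrix the maximum score is 3.7, so a site would need a score of at least 3.33 to count as a hit under the "relative" method.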

threshold_value

semantics of the threshold_value depend on threshold_method (default is 0.25^6)

cache_path

the path to a directory where scores are cached. The scores of each motif are stored in a separate file that contains a hash table with RefSeq identifiers and sequence type qualifiers as keys and the number of binding sites as values. If is.null(cache_path), scores will not be cached.
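
The caching scheme described above can be sketched in base R with an environment acting as the hash table. The file name example_motif.rds and the use of saveRDS are assumptions made for illustration; the actual file naming and serialization format used by the package are not stated here.

```r
# Default cache directory, as in the Usage section.
cache_path <- paste0(tempdir(), "/sc/")
dir.create(cache_path, showWarnings = FALSE, recursive = TRUE)

# An environment serves as a hash table: key = sequence name,
# value = number of binding sites (the count 2 is made up).
scores <- new.env(hash = TRUE)
assign("NM_010356|3UTR", 2L, envir = scores)

# One file per motif; this file name is hypothetical.
cache_file <- paste0(cache_path, "example_motif.rds")
saveRDS(scores, cache_file)

# A later run could restore the table and look up a transcript by key.
restored <- readRDS(cache_file)
hits <- get("NM_010356|3UTR", envir = restored)
```

Because the keys are the sequence names, cached lookups only work when sequences carry stable, unique names.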

Value

A list with the following items:

motif_id: the motif identifier of the specified motif
motif_rbps: the gene symbol of the RNA-binding protein(s)
absolute_hits: the absolute frequency of binding sites per motif in all transcripts
relative_hits: the relative, i.e., absolute divided by total, frequency of binding sites per motif in all transcripts
total_sites: the total number of potential binding sites
one_hit, two_hits, ...: number of transcripts with one, two, three, ... binding sites
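
How the returned items relate to each other can be sketched with made-up numbers in base R. The per-transcript counts below are hypothetical, and capping each transcript at max_hits before summing is an assumption about how max_hits interacts with absolute_hits.

```r
max_hits <- 5

# Hypothetical per-transcript results for five transcripts.
hits_per_transcript  <- c(0, 1, 1, 3, 7)    # putative binding sites found
sites_per_transcript <- c(9, 8, 6, 20, 30)  # potential binding sites

# Assumed behavior: at most max_hits sites are counted per transcript.
capped <- pmin(hits_per_transcript, max_hits)

absolute_hits <- sum(capped)
total_sites   <- sum(sites_per_transcript)
relative_hits <- absolute_hits / total_sites  # absolute divided by total

# Number of transcripts with exactly one, two, ..., max_hits sites.
# tabulate() ignores the zero, i.e., transcripts without any hit.
hit_counts <- tabulate(capped, nbins = max_hits)
names(hit_counts) <- c("one_hit", "two_hits", "three_hits",
                       "four_hits", "five_hits")
```

In this toy example absolute_hits is 10, total_sites is 73, and hit_counts records two transcripts with one hit, one with three hits, and one with five hits (the transcript with seven sites, capped at max_hits).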

See Also

Other matrix functions: calculate_motif_enrichment(), run_matrix_spma(), run_matrix_tsma(), score_transcripts()


transite documentation built on Nov. 8, 2020, 5:27 p.m.