Counts the putative binding sites in a set of sequences for all or a subset of RNA-binding protein sequence motifs and returns the result in a data frame, which calculate_motif_enrichment subsequently uses to obtain binding site enrichment scores.
sequences | 
 character vector of named sequences (only containing upper case characters A, C, G, T), where the names are RefSeq identifiers and sequence type qualifiers (  | 
motifs | 
 a list of motifs that is used to score the specified sequences.
If   | 
max_hits | 
 maximum number of putative binding sites per mRNA that are counted  | 
threshold_method | 
 either   | 
threshold_value | 
 semantics of the   | 
n_cores | 
 the number of cores that are used  | 
cache | 
 either logical or path to a directory where scores are cached. The scores of each motif are stored in a separate file that contains a hash table with RefSeq identifiers and sequence type qualifiers as keys and the number of putative binding sites as values.
If   | 
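The cache layout described for the cache argument (one hash table per motif, keyed by RefSeq identifier and sequence type qualifier, with the number of putative binding sites as values) can be illustrated in base R with an environment. This is only a sketch under assumptions: `motif_cache`, `cache_put`, and `cache_get` are hypothetical names, and the package's actual file format and internal helpers are not shown here.

```r
# Hypothetical in-memory stand-in for one motif's cache file.
# An R environment created with hash = TRUE behaves like a hash table.
motif_cache <- new.env(hash = TRUE)

# Store the number of putative binding sites under a "RefSeq|type" key.
cache_put <- function(cache, key, n_sites) {
  assign(key, n_sites, envir = cache)
  invisible(cache)
}

# Return the cached count, or NA if the sequence has not been scored yet.
cache_get <- function(cache, key) {
  if (exists(key, envir = cache, inherits = FALSE)) {
    get(key, envir = cache, inherits = FALSE)
  } else {
    NA_integer_
  }
}

cache_put(motif_cache, "NM_1_DUMMY|3UTR", 2L)
cache_put(motif_cache, "NM_2_DUMMY|3UTR", 0L)

hit_known <- cache_get(motif_cache, "NM_1_DUMMY|3UTR")   # previously stored
hit_unknown <- cache_get(motif_cache, "NM_3_DUMMY|3UTR") # never stored
```

Keying on the sequence name is why the names of the sequences vector matter when caching is enabled: a sequence whose key is already present would not need to be scored again.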
A list with three entries:
(1) df: a data frame with the following columns:
motif_id  | the motif identifier that is used in the original motif library | 
motif_rbps  | the gene symbol of the RNA-binding protein(s) | 
absolute_hits  | the absolute frequency of putative binding sites per motif in all transcripts | 
relative_hits  | the relative, i.e., absolute divided by total, frequency of binding sites per motif in all transcripts | 
total_sites  | the total number of potential binding sites | 
one_hit, two_hits, ...  | number of transcripts with one, two, three, ... putative binding sites | 
(2) total_sites: a numeric vector with the total number of potential binding sites per transcript
(3) absolute_hits: a numeric vector with the absolute (not relative) number of putative binding sites per transcript
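To make the relationship between the per-transcript vectors and the columns of df concrete, the following base R sketch derives motif-level summaries from made-up per-transcript counts. The numbers are invented for illustration, `n_bins` is a hypothetical name, and it is an assumption (not confirmed by this page) that the motif-level values are plain sums over transcripts.

```r
# Hypothetical per-transcript values for a single motif (not package output).
absolute_hits <- c(0L, 1L, 2L, 1L, 0L, 3L) # putative binding sites per transcript
total_sites <- c(8L, 7L, 5L, 7L, 3L, 10L)  # potential binding sites per transcript

# Motif-level summaries, assuming they are sums over all transcripts.
motif_absolute_hits <- sum(absolute_hits)
motif_total_sites <- sum(total_sites)

# relative_hits: absolute divided by total.
motif_relative_hits <- motif_absolute_hits / motif_total_sites

# Number of transcripts with exactly one, two, three, ... hits.
# tabulate() ignores zeros, so transcripts without hits are not counted.
n_bins <- 5L
hit_distribution <- tabulate(absolute_hits, nbins = n_bins)
names(hit_distribution) <- paste0(
  c("one", "two", "three", "four", "five"),
  c("_hit", rep("_hits", 4))
)
```

Here `hit_distribution` plays the role of the one_hit, two_hits, ... columns for a single motif.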
Other matrix functions: 
calculate_motif_enrichment(),
run_matrix_spma(),
run_matrix_tsma(),
score_transcripts_single_motif()
foreground_set <- c(
  "CAACAGCCUUAAUU", "CAGUCAAGACUCC", "CUUUGGGGAAU",
  "UCAUUUUAUUAAA", "AAUUGGUGUCUGGAUACUUCCCUGUACAU",
  "AUCAAAUUA", "AGAU", "GACACUUAAAGAUCCU",
  "UAGCAUUAACUUAAUG", "AUGGA", "GAAGAGUGCUCA",
  "AUAGAC", "AGUUC", "CCAGUAA"
)
# names are used as keys in the hash table (cached version only)
# ideally sequence identifiers (e.g., RefSeq ids) and region labels
# (e.g., 3UTR for 3'-UTR)
names(foreground_set) <- c(
  "NM_1_DUMMY|3UTR", "NM_2_DUMMY|3UTR", "NM_3_DUMMY|3UTR",
  "NM_4_DUMMY|3UTR", "NM_5_DUMMY|3UTR", "NM_6_DUMMY|3UTR",
  "NM_7_DUMMY|3UTR", "NM_8_DUMMY|3UTR", "NM_9_DUMMY|3UTR",
  "NM_10_DUMMY|3UTR", "NM_11_DUMMY|3UTR", "NM_12_DUMMY|3UTR",
  "NM_13_DUMMY|3UTR", "NM_14_DUMMY|3UTR"
)
# specific motifs, uncached
motifs <- get_motif_by_rbp("ELAVL1")
scores <- score_transcripts(foreground_set, motifs = motifs, cache = FALSE)
## Not run: 
# all Transite motifs, cached (writes scores to disk)
scores <- score_transcripts(foreground_set)
# all Transite motifs, uncached
scores <- score_transcripts(foreground_set, cache = FALSE)
foreground_df <- transite:::ge$foreground1_df
foreground_set <- foreground_df$seq
names(foreground_set) <- paste0(foreground_df$refseq, "|",
   foreground_df$seq_type)
scores <- score_transcripts(foreground_set)
## End(Not run)