score_transcripts | R Documentation |
This function counts the binding sites in a set of sequences for all or a subset of RNA-binding protein sequence motifs and returns the result in a data frame, which is subsequently used by calculate_motif_enrichment to obtain binding site enrichment scores.
score_transcripts(
  sequences,
  motifs = NULL,
  max_hits = 5,
  threshold_method = c("p_value", "relative"),
  threshold_value = 0.25^6,
  n_cores = 1,
  cache = paste0(tempdir(), "/sc/")
)
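The two threshold defaults in the signature above can be illustrated with base R. This is a sketch under assumptions, not the package's internals: it assumes that the vector default of threshold_method is resolved with the common match.arg idiom (so the first entry is used when nothing is passed), and it reads 0.25^6 as the probability of one specific 6-mer under a uniform background over four nucleotides. The helper resolve_method is hypothetical.

```r
# Hypothetical helper: shows how a vector default such as
# c("p_value", "relative") is commonly resolved in R.
# Whether score_transcripts does exactly this is an assumption.
resolve_method <- function(threshold_method = c("p_value", "relative")) {
  match.arg(threshold_method)
}

default_method <- resolve_method()            # no argument: first choice is used
explicit_method <- resolve_method("relative") # explicit choice is kept

# One reading of the default threshold_value: the chance of observing one
# specific k-mer when all four nucleotides are equally likely (k = 6).
k <- 6
p_single_kmer <- 0.25^k
n_possible_kmers <- 4^k

print(default_method)
print(explicit_method)
print(p_single_kmer)
print(n_possible_kmers)
```

Passing a value that is not one of the listed choices to match.arg raises an error, which is why this idiom is often used to validate such arguments.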
sequences |
character vector of named sequences (only containing upper case characters A, C, G, T), where the names are RefSeq identifiers and sequence type qualifiers ( |
motifs |
a list of motifs that is used to score the specified sequences.
If |
max_hits |
maximum number of putative binding sites per mRNA that are counted |
threshold_method |
either |
threshold_value |
semantics of the |
n_cores |
the number of cores that are used |
cache |
either logical or path to a directory where scores are cached. The scores of each motif are stored in a separate file that contains a hash table with RefSeq identifiers and sequence type qualifiers as keys and the number of putative binding sites as values.
If |
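The cache layout described for the cache argument (one file per motif, holding a hash table keyed by RefSeq identifier and sequence type qualifier) can be pictured with a small base R sketch. The file name, the directory name, the use of an environment as the hash table, and the RDS serialization are all assumptions made for illustration; the format the package actually writes is not stated here. The helper lookup_hits is hypothetical.

```r
# Hypothetical stand-in for one motif's cache file (name and format assumed).
cache_dir <- file.path(tempdir(), "sc_sketch")
dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)
cache_file <- file.path(cache_dir, "motif_example.rds")

# An environment behaves like a hash table:
# key = "<RefSeq id>|<sequence type>", value = number of putative binding sites.
hits <- new.env(hash = TRUE)
assign("NM_1_DUMMY|3UTR", 2L, envir = hits)
assign("NM_2_DUMMY|3UTR", 0L, envir = hits)

# Write the table to disk and read it back, as a cache would.
saveRDS(hits, cache_file)
cached <- readRDS(cache_file)

# Hypothetical lookup: return the cached count, or NA if the key is missing
# (a missing key is where a real cache would have to score the sequence).
lookup_hits <- function(key, table) {
  if (exists(key, envir = table, inherits = FALSE)) {
    get(key, envir = table, inherits = FALSE)
  } else {
    NA_integer_
  }
}

known <- lookup_hits("NM_1_DUMMY|3UTR", cached)
unknown <- lookup_hits("NM_3_DUMMY|3UTR", cached)
print(known)
print(unknown)
```

Keying on both the identifier and the region label keeps counts for, say, the 3'-UTR and the full mRNA of the same transcript from overwriting each other.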
A list with three entries:
(1) df: a data frame with the following columns:
motif_id | the motif identifier that is used in the original motif library |
motif_rbps | the gene symbol of the RNA-binding protein(s) |
absolute_hits | the absolute frequency of putative binding sites per motif in all transcripts |
relative_hits | the relative, i.e., absolute divided by total, frequency of binding sites per motif in all transcripts |
total_sites | the total number of potential binding sites |
one_hit , two_hits , ... | number of transcripts with one, two, three, ... putative binding sites |
(2) total_sites: a numeric vector with the total number of potential binding sites per transcript
(3) absolute_hits: a numeric vector with the absolute (not relative) number of putative binding sites per transcript
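The relationship between the returned quantities can be sketched with made-up numbers for a single motif. This is an illustration under assumptions, not the package's code: the per-transcript counts are invented, counts above max_hits are assumed to fall into the last bin, and the column names beyond one_hit and two_hits are guessed from the naming pattern.

```r
# Made-up per-transcript values for one motif.
absolute_hits <- c(0, 1, 3, 7, 1)   # putative binding sites per transcript
total_sites <- c(10, 8, 20, 25, 5)  # potential binding sites per transcript
max_hits <- 5

# Relative frequency over all transcripts: absolute divided by total.
relative_hits <- sum(absolute_hits) / sum(total_sites)

# Number of transcripts with one, two, ... hits.
# Assumption: counts above max_hits are capped into the last bin.
capped <- pmin(absolute_hits, max_hits)
hit_counts <- tabulate(capped, nbins = max_hits)  # zero-hit transcripts are ignored
names(hit_counts) <- c("one_hit", "two_hits", "three_hits",
                       "four_hits", "five_hits")

print(relative_hits)
print(hit_counts)
```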
Other matrix functions: calculate_motif_enrichment(), run_matrix_spma(), run_matrix_tsma(), score_transcripts_single_motif()
foreground_set <- c(
  "CAACAGCCUUAAUU", "CAGUCAAGACUCC", "CUUUGGGGAAU",
  "UCAUUUUAUUAAA", "AAUUGGUGUCUGGAUACUUCCCUGUACAU",
  "AUCAAAUUA", "AGAU", "GACACUUAAAGAUCCU",
  "UAGCAUUAACUUAAUG", "AUGGA", "GAAGAGUGCUCA",
  "AUAGAC", "AGUUC", "CCAGUAA"
)
# names are used as keys in the hash table (cached version only)
# ideally sequence identifiers (e.g., RefSeq ids) and region labels
# (e.g., 3UTR for 3'-UTR)
names(foreground_set) <- c(
  "NM_1_DUMMY|3UTR", "NM_2_DUMMY|3UTR", "NM_3_DUMMY|3UTR",
  "NM_4_DUMMY|3UTR", "NM_5_DUMMY|3UTR", "NM_6_DUMMY|3UTR",
  "NM_7_DUMMY|3UTR", "NM_8_DUMMY|3UTR", "NM_9_DUMMY|3UTR",
  "NM_10_DUMMY|3UTR", "NM_11_DUMMY|3UTR", "NM_12_DUMMY|3UTR",
  "NM_13_DUMMY|3UTR", "NM_14_DUMMY|3UTR"
)
# specific motifs, uncached
motifs <- get_motif_by_rbp("ELAVL1")
scores <- score_transcripts(foreground_set, motifs = motifs, cache = FALSE)
## Not run:
# all Transite motifs, cached (writes scores to disk)
scores <- score_transcripts(foreground_set)
# all Transite motifs, uncached
scores <- score_transcripts(foreground_set, cache = FALSE)
foreground_df <- transite:::ge$foreground1_df
foreground_set <- foreground_df$seq
names(foreground_set) <- paste0(foreground_df$refseq, "|",
  foreground_df$seq_type)
scores <- score_transcripts(foreground_set)
## End(Not run)