run_kmer_spma | R Documentation |
SPMA helps to illuminate the relationship between RBP binding evidence and the transcript sorting criterion, e.g., fold change between treatment and control samples.
run_kmer_spma(
sorted_transcript_sequences,
sorted_transcript_values = NULL,
transcript_values_label = "transcript value",
motifs = NULL,
k = 6,
n_bins = 40,
midpoint = 0,
x_value_limits = NULL,
max_model_degree = 1,
max_cs_permutations = 1e+07,
min_cs_permutations = 5000,
fg_permutations = 5000,
p_adjust_method = "BH",
p_combining_method = "fisher",
n_cores = 1
)
sorted_transcript_sequences |
character vector of ranked sequences,
either DNA
(only containing upper case characters A, C, G, T) or RNA (A, C, G, U).
The sequences in |
sorted_transcript_values |
vector of sorted transcript values, i.e.,
the fold change or signal-to-noise ratio or any other quantity that was used
to sort the transcripts that were passed to |
transcript_values_label |
label of transcript sorting criterion
(e.g., |
motifs |
a list of motifs that is used to score the specified sequences.
If |
k |
length of k-mer, either |
n_bins |
specifies the number of bins in which the sequences will be divided, valid values are between 7 and 100 |
midpoint |
for enrichment values the midpoint should be |
x_value_limits |
sets limits of the x-value color scale (used to
harmonize color scales of different spectrum plots), see |
max_model_degree |
maximum degree of polynomial |
max_cs_permutations |
maximum number of permutations performed in Monte Carlo test for consistency score |
min_cs_permutations |
minimum number of permutations performed in Monte Carlo test for consistency score |
fg_permutations |
numer of foreground permutations |
p_adjust_method |
see |
p_combining_method |
one of the following: Fisher (1932)
( |
n_cores |
number of computing cores to use |
In order to investigate how motif targets are distributed across a spectrum of transcripts (e.g., all transcripts of a platform, ordered by fold change), Spectrum Motif Analysis visualizes the gradient of RBP binding evidence across all transcripts.
The k-mer-based approach differs from the matrix-based approach by how the sequences are scored. Here, sequences are broken into k-mers, i.e., oligonucleotide sequences of k bases. And only statistically significantly enriched or depleted k-mers are then used to calculate a score for each RNA-binding protein, which quantifies its target overrepresentation.
A list with the following components:
foreground_scores | the result of run_kmer_tsma
for the binned data |
spectrum_info_df | a data frame with the SPMA results |
spectrum_plots | a list of spectrum plots, as generated by
score_spectrum |
classifier_scores | a list of classifier scores, as returned by
classify_spectrum
|
Other SPMA functions:
classify_spectrum()
,
run_matrix_spma()
,
score_spectrum()
,
subdivide_data()
Other k-mer functions:
calculate_kmer_enrichment()
,
check_kmers()
,
compute_kmer_enrichment()
,
count_homopolymer_corrected_kmers()
,
create_kmer_origin_list()
,
draw_volcano_plot()
,
estimate_significance()
,
estimate_significance_core()
,
generate_kmers()
,
generate_permuted_enrichments()
,
run_kmer_tsma()
# example data set
background_df <- transite:::ge$background_df
# sort sequences by signal-to-noise ratio
background_df <- dplyr::arrange(background_df, value)
# character vector of named and ranked (by signal-to-noise ratio) sequences
background_seqs <- gsub("T", "U", background_df$seq)
names(background_seqs) <- paste0(background_df$refseq, "|",
background_df$seq_type)
results <- run_kmer_spma(background_seqs,
sorted_transcript_values = background_df$value,
transcript_values_label = "signal-to-noise ratio",
motifs = get_motif_by_id("M178_0.6"),
n_bins = 20,
fg_permutations = 10)
## Not run:
results <- run_kmer_spma(background_seqs,
sorted_transcript_values = background_df$value,
transcript_values_label = "signal-to-noise ratio")
## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.