run_kmer_spma: _k_-mer-based Spectrum Motif Analysis
In kkrismer/transite: RNA-binding protein motif analysis

run_kmer_spma

R Documentation

k-mer-based Spectrum Motif Analysis

Description

SPMA helps to illuminate the relationship between RBP binding evidence and the transcript sorting criterion, e.g., fold change between treatment and control samples.

Usage

run_kmer_spma(
  sorted_transcript_sequences,
  sorted_transcript_values = NULL,
  transcript_values_label = "transcript value",
  motifs = NULL,
  k = 6,
  n_bins = 40,
  midpoint = 0,
  x_value_limits = NULL,
  max_model_degree = 1,
  max_cs_permutations = 1e+07,
  min_cs_permutations = 5000,
  fg_permutations = 5000,
  p_adjust_method = "BH",
  p_combining_method = "fisher",
  n_cores = 1
)

Arguments

`sorted_transcript_sequences`	character vector of ranked sequences, either DNA (only containing upper case characters A, C, G, T) or RNA (A, C, G, U). The sequences in `sorted_transcript_sequences` must be ranked (i.e., sorted). Commonly used sorting criteria are measures of differential expression, such as fold change or signal-to-noise ratio (e.g., between treatment and control samples in gene expression profiling experiments).
`sorted_transcript_values`	vector of sorted transcript values, i.e., the fold change or signal-to-noise ratio or any other quantity that was used to sort the transcripts that were passed to `run_matrix_spma` or `run_kmer_spma` (default value is `NULL`). These values are displayed as a semi-transparent area over the enrichment value heatmaps of spectrum plots.
`transcript_values_label`	label of transcript sorting criterion (e.g., `"log fold change"`, default value is `"transcript value"`), only shown if `!is.null(sorted_transcript_values)`
`motifs`	a list of motifs that is used to score the specified sequences. If `is.null(motifs)` then all Transite motifs are used.
`k`	length of k-mer, either `6` for hexamers or `7` for heptamers
`n_bins`	specifies the number of bins in which the sequences will be divided, valid values are between 7 and 100
`midpoint`	for enrichment values the midpoint should be `1`, for log enrichment values `0` (defaults to `0`)
`x_value_limits`	sets limits of the x-value color scale (used to harmonize color scales of different spectrum plots), see `limits` argument of `continuous_scale` (defaults to `NULL`, i.e., the data-dependent default scale range)
`max_model_degree`	maximum degree of polynomial
`max_cs_permutations`	maximum number of permutations performed in Monte Carlo test for consistency score
`min_cs_permutations`	minimum number of permutations performed in Monte Carlo test for consistency score
`fg_permutations`	numer of foreground permutations
`p_adjust_method`	see `p.adjust`
`p_combining_method`	one of the following: Fisher (1932) (`"fisher"`), Stouffer (1949), Liptak (1958) (`"SL"`), Mudholkar and George (1979) (`"MG"`), and Tippett (1931) (`"tippett"`) (see `p_combine`)
`n_cores`	number of computing cores to use

Details

In order to investigate how motif targets are distributed across a spectrum of transcripts (e.g., all transcripts of a platform, ordered by fold change), Spectrum Motif Analysis visualizes the gradient of RBP binding evidence across all transcripts.

The k-mer-based approach differs from the matrix-based approach by how the sequences are scored. Here, sequences are broken into k-mers, i.e., oligonucleotide sequences of k bases. And only statistically significantly enriched or depleted k-mers are then used to calculate a score for each RNA-binding protein, which quantifies its target overrepresentation.

Value

A list with the following components:

`foreground_scores`	the result of `run_kmer_tsma` for the binned data
`spectrum_info_df`	a data frame with the SPMA results
`spectrum_plots`	a list of spectrum plots, as generated by `score_spectrum`
`classifier_scores`	a list of classifier scores, as returned by `classify_spectrum`

Examples

# example data set
background_df <- transite:::ge$background_df
# sort sequences by signal-to-noise ratio
background_df <- dplyr::arrange(background_df, value)
# character vector of named and ranked (by signal-to-noise ratio) sequences
background_seqs <- gsub("T", "U", background_df$seq)
names(background_seqs) <- paste0(background_df$refseq, "|",
  background_df$seq_type)

results <- run_kmer_spma(background_seqs,
                         sorted_transcript_values = background_df$value,
                         transcript_values_label = "signal-to-noise ratio",
                         motifs = get_motif_by_id("M178_0.6"),
                         n_bins = 20,
                         fg_permutations = 10)

## Not run: 
results <- run_kmer_spma(background_seqs,
                         sorted_transcript_values = background_df$value,
                         transcript_values_label = "signal-to-noise ratio")
## End(Not run)

kkrismer/transite documentation built on July 13, 2024, 8:01 a.m.