find_candidates: Determining which SNPs are candidates for allele specific...

Description

This function identifies which genetic variants are candidates for allele-specific translation (i.e. may lead to differential translation of the two alleles), using RNA-Seq and ribosome profiling data from all heterozygous SNPs in the transcriptome of a given individual.

Usage

find_candidates(RNA.file, RIBO.file, output.csv, output.pdf, alpha = 0.05,
  pcounts = 50, c.threshold = 0.9, d.threshold = 0.1, num.sims = 1000)

Arguments

RNA.file

Filename of the RNA-Seq counts for each SNP. Each row holds either the maternal or the paternal allele of a SNP, with its number of RNA-Seq reads at each position of the window. The number of columns is set by the window size (for RNA-Seq, usually around 75 nucleotides).

RIBO.file

Filename of the Ribosome profiling counts for each SNP. See RNA.file for more details.

output.csv

Filename of the output csv file where the numerical output (effect sizes and p-values) will be stored.

output.pdf

Filename of the output pdf file where the graphical output (histograms of the RNA-Seq and ribosome profiling bootstrapped ratio distributions) will be stored.

alpha

Significance threshold for p-values. Default is 0.05.

pcounts

Pseudocounts added for regularization when calculating the maternal / total ratio (for each bootstrapped sample), so that the RNA-Seq and ribosome profiling ratios are comparable. Default is 50. Should be a value between ~30 (the window size for ribosome profiling) and ~75 (the window size for RNA-Seq).

c.threshold

Threshold at which Cliff's Delta values are considered significant. Default is 0.9.

d.threshold

Threshold at which the secondary effect size (a difference between the RNA-Seq and ribosome profiling results) is considered significant. Default is 0.1 (i.e. a minimum difference of 10% between the RNA-Seq and ribosome profiling counts).

num.sims

Number of times to bootstrap. Default is 1000.
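The numeric arguments above suggest a per-SNP workflow: bootstrap a maternal / total ratio for each data type, then compare the two bootstrapped distributions with Cliff's Delta and a secondary effect size. The sketch below illustrates that workflow under stated assumptions; it is not the package's implementation. Resampling window positions with replacement, adding pcounts to each allele, defining the secondary effect as the gap between median ratios, and the rule in the hypothetical helper is_candidate() are all assumptions. No p-value is computed, because this page does not say which test find_candidates uses. Only Cliff's Delta follows a standard definition: P(X > Y) - P(X < Y) over all pairs.

```r
# Sketch of the per-SNP statistics controlled by pcounts, num.sims,
# c.threshold, d.threshold and alpha.
# ASSUMPTIONS (not confirmed by this page): the resampling scheme, the
# pseudocount form, the secondary effect definition and the candidate rule.

# Bootstrap the maternal / total ratio by resampling window positions.
bootstrap_ratios <- function(mat, pat, num.sims = 1000, pcounts = 50) {
  stopifnot(length(mat) == length(pat))
  n <- length(mat)
  replicate(num.sims, {
    idx <- sample.int(n, n, replace = TRUE)  # same positions for both alleles
    m <- sum(mat[idx])
    (m + pcounts) / (m + sum(pat[idx]) + 2 * pcounts)  # pcounts on each allele
  })
}

# Cliff's delta: P(X > Y) - P(X < Y) over all (x, y) pairs; ranges -1 to 1.
cliffs_delta <- function(x, y) mean(sign(outer(x, y, "-")))

# Hypothetical secondary effect size: gap between the median ratios.
secondary_effect <- function(x, y) abs(median(x) - median(y))

# Hypothetical rule combining alpha, c.threshold and d.threshold.
is_candidate <- function(p_value, delta, d_effect,
                         alpha = 0.05, c.threshold = 0.9, d.threshold = 0.1) {
  p_value < alpha & abs(delta) >= c.threshold & d_effect >= d.threshold
}

# Toy counts: balanced alleles in RNA-Seq, maternal-skewed ribosome profiling.
set.seed(42)
rna_mat  <- rpois(75, 5); rna_pat  <- rpois(75, 5)  # ~75-nt RNA-Seq window
ribo_mat <- rpois(30, 8); ribo_pat <- rpois(30, 2)  # ~30-nt ribosome window
rna_ratios  <- bootstrap_ratios(rna_mat, rna_pat)
ribo_ratios <- bootstrap_ratios(ribo_mat, ribo_pat)

delta <- cliffs_delta(ribo_ratios, rna_ratios)
d_eff <- secondary_effect(ribo_ratios, rna_ratios)
is_candidate(p_value = 0.01, delta, d_eff)  # 0.01 is a made-up p-value
```

With these toy counts the ribosome profiling ratios should sit well above the balanced RNA-Seq ratios, so Cliff's Delta should be close to 1. Because the pseudocounts are added to both alleles, every bootstrapped ratio is shrunk toward 0.5, which matters most for the shorter ribosome profiling window.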

Value

Two files: 1) a .csv file with all candidate SNPs and their associated p-values, Cliff's Delta, and secondary effect size values (9 columns in total); 2) a .pdf file with a histogram of the bootstrapped ratio distributions for RNA-Seq (red) and ribosome profiling (blue) for each candidate SNP, which makes it easy to see how much the two distributions overlap.
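The overlaid red and blue histograms can be reproduced for inspection with base R graphics. The sketch below draws one page for one SNP; the helper name plot_ratio_histograms(), the 0.01 bin width and the beta-distributed toy ratios are assumptions for illustration, not the package's plotting code.

```r
# Overlay RNA-Seq (red) and ribosome profiling (blue) bootstrapped ratios
# for one SNP on a single PDF page. Hypothetical helper, base R only.
plot_ratio_histograms <- function(rna_ratios, ribo_ratios, snp_id, pdf_file) {
  breaks <- seq(0, 1, by = 0.01)            # ratios lie in [0, 1]
  h_rna  <- hist(rna_ratios,  breaks = breaks, plot = FALSE)
  h_ribo <- hist(ribo_ratios, breaks = breaks, plot = FALSE)
  ymax <- max(h_rna$counts, h_ribo$counts)  # keep both histograms in view
  red  <- rgb(1, 0, 0, alpha = 0.4)         # semi-transparent so overlap shows
  blue <- rgb(0, 0, 1, alpha = 0.4)
  pdf(pdf_file)
  on.exit(dev.off())                        # close the device even on error
  plot(h_rna, col = red, ylim = c(0, ymax), main = snp_id,
       xlab = "maternal / total ratio")
  plot(h_ribo, col = blue, add = TRUE)
  legend("topright", legend = c("RNA-Seq", "Ribosome profiling"),
         fill = c(red, blue))
  invisible(pdf_file)
}

# Toy bootstrapped ratios standing in for real output.
set.seed(7)
pdf_file <- tempfile(fileext = ".pdf")
plot_ratio_histograms(rbeta(1000, 50, 50), rbeta(1000, 70, 30), "snp1", pdf_file)
```

Computing both histograms before plotting fixes the y-axis to the taller of the two, so the second histogram is never clipped.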

Examples

find_candidates("RNA.all.csv", "RIBO.all.csv", "final.effsize.csv", "final.plots.pdf")
find_candidates("RNA.all.csv", "RIBO.all.csv", "final.effsize.csv", "final.plots.pdf", num.sims = 5000, c.threshold = 0.5)
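The examples above read RNA.all.csv and RIBO.all.csv, whose layout this page describes only in prose. The sketch below writes toy files following one reading of the RNA.file description: two rows per SNP (maternal, then paternal) and one column per window position, about 75 columns for RNA-Seq and about 30 for ribosome profiling. The row names, the absence of other columns, and the helper write_toy_counts() are assumptions, and whether find_candidates() accepts these files has not been checked.

```r
# ASSUMED layout: two consecutive rows per SNP (maternal, then paternal),
# one column per window position, SNP IDs stored as row names.
write_toy_counts <- function(path, n_snps, window, lambda_mat, lambda_pat) {
  counts <- matrix(0L, nrow = 2 * n_snps, ncol = window)
  for (i in seq_len(n_snps)) {
    counts[2 * i - 1, ] <- rpois(window, lambda_mat)  # maternal allele
    counts[2 * i, ]     <- rpois(window, lambda_pat)  # paternal allele
  }
  rownames(counts) <- paste0("snp", rep(seq_len(n_snps), each = 2),
                             c("_mat", "_pat"))
  write.csv(counts, path)
  invisible(counts)
}

# Write both toy files to a temporary directory rather than the working one.
set.seed(1)
rna_file  <- file.path(tempdir(), "RNA.all.csv")
ribo_file <- file.path(tempdir(), "RIBO.all.csv")
write_toy_counts(rna_file,  n_snps = 5, window = 75, lambda_mat = 5, lambda_pat = 5)
write_toy_counts(ribo_file, n_snps = 5, window = 30, lambda_mat = 8, lambda_pat = 2)
```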

angelalica/ASTranslation documentation built on May 10, 2019, 11:46 a.m.