test_DTU: Perform differential splicing

View source: R/run.R

test_DTU R Documentation

Perform differential splicing

Description

test_DTU tests for differential splicing, via differential transcript usage (DTU), between two or more groups. Parameters are inferred via Markov chain Monte Carlo (MCMC) techniques, and the DTU test is a multivariate Wald test on the posterior densities of the average relative abundance of transcripts. Warning: the samples in samples_design must have the same order as those in the 'path_to_eq_classes' parameter of the create_data function.
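As a rough illustration of what a multivariate Wald test on posterior draws can look like, the sketch below uses base R only. It is a generic textbook construction under stated assumptions, not necessarily the exact computation inside test_DTU: the draws are simulated toy values, and the helper draw_simplex and all variable names are hypothetical.

```r
set.seed(1)
n_draws <- 1000   # number of toy posterior draws
K <- 3            # number of transcripts in a hypothetical gene

# Hypothetical helper: simulate draws of relative abundances (each row sums to 1)
# by normalising independent gamma variates. These are stand-ins for MCMC output.
draw_simplex <- function(n, alpha) {
  g <- matrix(rgamma(n * length(alpha), shape = rep(alpha, each = n)), nrow = n)
  g / rowSums(g)
}

pi_A <- draw_simplex(n_draws, c(50, 30, 20))  # toy draws for group A
pi_B <- draw_simplex(n_draws, c(20, 30, 50))  # toy draws for group B

# Differences between groups; drop the last transcript because the
# proportions sum to 1, so only K - 1 differences are free.
d <- (pi_A - pi_B)[, -K, drop = FALSE]

m <- colMeans(d)                   # mean difference
S <- cov(d)                        # covariance of the differences
W <- drop(t(m) %*% solve(S, m))    # Wald statistic m' S^{-1} m
p_value <- pchisq(W, df = K - 1, lower.tail = FALSE)
p_value
```

A small p-value suggests that the relative transcript abundances differ between the two groups.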

Usage

test_DTU(
  BANDITS_data,
  precision = NULL,
  R = 10^4,
  burn_in = 2 * 10^3,
  samples_design,
  group_col_name = "group",
  n_cores = 1,
  gene_to_transcript,
  theshold_pval = 0.1
)

Arguments

BANDITS_data

a 'BANDITS_data' object.

precision

a vector with the mean and standard deviation of the log-precision parameter.

R

the number of iterations for the MCMC algorithm (after the burn-in). Minimum 10^4. Although simulation studies showed no difference when increasing 'R' above 10^4, we encourage users to use higher values of 'R' (e.g., 2*10^4) if the computational time allows it, particularly for comparisons between 3 or more groups.

burn_in

the length of the burn-in to be discarded (before convergence is reached). Minimum 2*10^3. Although simulation studies showed no difference when increasing 'burn_in' above 2*10^3, we encourage users to use higher values of 'burn_in' (e.g., double) if the computational time allows it.
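To make the relationship between 'R' and 'burn_in' concrete, here is a minimal base R sketch. The chain is a toy autoregressive process standing in for a sampler (it is not the BANDITS MCMC), and the variable names n_iter, chain and kept are illustrative assumptions.

```r
set.seed(42)
R <- 10^4             # iterations kept after the burn-in
burn_in <- 2 * 10^3   # initial iterations to discard
n_iter <- burn_in + R # total number of iterations that are run

# Toy AR(1) chain started far from its stationary mean of 0.
chain <- numeric(n_iter)
chain[1] <- 50
for (i in 2:n_iter) {
  chain[i] <- 0.99 * chain[i - 1] + rnorm(1)
}

# Discard the burn-in and keep the remaining R iterations.
kept <- chain[(burn_in + 1):n_iter]
length(kept)
```

Since 'R' counts iterations after the burn-in, the total run length in this sketch is burn_in + R.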

samples_design

a data.frame indicating the design of the experiment with one row for each sample: samples_design must contain a column with the sample id and one with the group id. Warning: the samples in samples_design must have the same order as those in the 'path_to_eq_classes' parameter of the create_data function.
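One way to guard against a sample-order mismatch is to derive the sample ids from the paths and reorder the design accordingly. The sketch below uses base R only; the directory layout (one folder per sample, named after the sample id) and the example paths are assumptions made for illustration.

```r
# Hypothetical paths, in the order they would be passed to create_data().
path_to_eq_classes <- c(
  "quant/sample1/aux_info/eq_classes.txt",
  "quant/sample2/aux_info/eq_classes.txt",
  "quant/sample3/aux_info/eq_classes.txt",
  "quant/sample4/aux_info/eq_classes.txt"
)

# A design whose rows are deliberately in a different order.
samples_design <- data.frame(
  sample_id = c("sample2", "sample1", "sample4", "sample3"),
  group     = c("A", "A", "B", "B")
)

# Assumed layout: the sample id is the folder two levels above the file.
ids_from_paths <- basename(dirname(dirname(path_to_eq_classes)))

# Reorder the design to follow the order of the paths, then verify.
samples_design <- samples_design[match(ids_from_paths, samples_design$sample_id), ]
stopifnot(identical(as.character(samples_design$sample_id), ids_from_paths))
samples_design
```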

group_col_name

the name of the column of 'samples_design' containing the group id. By default group_col_name = "group".

n_cores

the number of cores to parallelize the tasks on.

gene_to_transcript

a matrix or data.frame with a list of gene-to-transcript correspondences. The first column contains the gene id, and the second column contains the transcript id.
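A minimal sketch of the expected two-column layout, using base R only. The gene and transcript ids are made up, and the sanity checks (unique transcript ids, counting transcripts per gene) are illustrative assumptions rather than checks performed by the package.

```r
# Hypothetical gene-to-transcript table: gene id first, transcript id second.
gene_to_transcript <- data.frame(
  gene_id       = c("geneA", "geneA", "geneB", "geneB", "geneB"),
  transcript_id = c("txA1", "txA2", "txB1", "txB2", "txB3"),
  stringsAsFactors = FALSE
)

# Sanity checks: at least two columns, and each transcript listed only once.
stopifnot(ncol(gene_to_transcript) >= 2)
stopifnot(!anyDuplicated(gene_to_transcript[[2]]))

# Transcript usage can only differ for genes with at least two transcripts.
n_tx <- table(gene_to_transcript[[1]])
multi_tx_genes <- names(n_tx)[n_tx > 1]
multi_tx_genes
```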

theshold_pval

a threshold between 0 and 1; when running test_DTU, if the p.value of a gene is < theshold_pval, a second (independent) MCMC chain is run and the p.value is re-computed on the aggregation of the two chains. By default theshold_pval = 0.1; theshold_pval = 1 corresponds to running all chains twice, and theshold_pval = 0 means all chains will only run once.
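The selection rule described above can be illustrated with a few made-up first-chain p-values. This is only a sketch of the comparison against the threshold in base R; the gene names and p-values are hypothetical, and no MCMC chain is run here.

```r
theshold_pval <- 0.1

# Hypothetical p-values from the first chain.
p_values <- c(geneA = 0.003, geneB = 0.08, geneC = 0.45, geneD = 0.1)

# Genes with a p-value strictly below the threshold get a second chain.
rerun <- p_values < theshold_pval
names(p_values)[rerun]

# With a threshold of 1 every gene here is selected; with 0 none is.
sum(p_values < 1)
sum(p_values < 0)
```

In this toy example geneD is not selected, because its p-value equals the threshold and the comparison is strict.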

Value

A BANDITS_test object.

Author(s)

Simone Tiberi simone.tiberi@uzh.ch

See Also

create_data, BANDITS_data, BANDITS_test

Examples

# load gene_to_transcript matching:
data("gene_tr_id", package = "BANDITS")

# We define the design of the study
samples_design = data.frame(sample_id = paste0("sample", seq_len(4)),
                            group = c("A", "A", "B", "B"))

# load the pre-computed data:
data("input_data", package = "BANDITS")
input_data

# Filter lowly abundant genes:
input_data = filter_genes(input_data, min_counts_per_gene = 20)

# load the pre-computed precision estimates:
data(precision, package = "BANDITS")

## Test for DTU
set.seed(61217)
results = test_DTU(BANDITS_data = input_data,
                   precision = precision$prior,
                   samples_design = samples_design,
                   R = 10^4, burn_in = 2*10^3, n_cores = 2,
                   gene_to_transcript = gene_tr_id)
results


SimoneTiberi/BANDITS documentation built on Nov. 15, 2023, 2:35 p.m.