DifferentialRegulation: Discover differentially regulated genes from single-cell...
In SimoneTiberi/DifferentialRegulation: Differentially regulated genes from scRNA-seq data

DifferentialRegulation

R Documentation

Discover differentially regulated genes from single-cell RNA-seq data

Description

DifferentialRegulation identifies differentially regulated genes between two conditions (e.g., healthy vs. disease or treated vs. untreated) in each cluster of cells. Parameters are inferred via Markov chain Monte Carlo (MCMC) techniques, and differential testing is performed via a multivariate Wald test on the posterior densities of the group-level relative abundance of USA (Unspliced, Spliced and Ambiguous) counts.
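To give an intuition for the testing step, the following is a minimal sketch of a generic multivariate Wald test computed from posterior draws. It is not the package's internal code: the toy draws, the object names (`draws_A`, `draws_B`) and the choice to test only two of the three proportions (since they sum to 1) are assumptions made for illustration.

```r
set.seed(1)
n_draws <- 1500

# Toy posterior draws of (pi_S, pi_U) in each group (hypothetical values);
# pi_A is omitted because the three proportions sum to 1.
draws_A <- cbind(pi_S = rnorm(n_draws, 0.60, 0.02),
                 pi_U = rnorm(n_draws, 0.30, 0.02))
draws_B <- cbind(pi_S = rnorm(n_draws, 0.50, 0.02),
                 pi_U = rnorm(n_draws, 0.40, 0.02))

# Difference between the posterior means of the two groups
delta <- colMeans(draws_A) - colMeans(draws_B)

# Covariance of the difference, assuming the two groups are independent
V <- cov(draws_A) + cov(draws_B)

# Wald statistic, compared with a chi-squared distribution
# whose degrees of freedom equal the number of tested proportions
W <- as.numeric(t(delta) %*% solve(V) %*% delta)
p_val <- pchisq(W, df = length(delta), lower.tail = FALSE)

c(W = W, p_val = p_val)
```

A large statistic (small p-value) indicates that the group-level proportions differ between the two groups.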

Usage

DifferentialRegulation(
  PB_counts,
  n_cores = NULL,
  N_MCMC = 2000,
  burn_in = 500,
  undersampling_int = 10,
  traceplot = FALSE
)

Arguments

`PB_counts`	a `list`, computed via `compute_PB_counts`
`n_cores`	the number of cores to parallelize the tasks on. Since parallelization is at the cluster level (each cluster is parallelized on a thread), we suggest setting n_cores to the number of clusters (e.g., cell-types), as set by default if 'n_cores' is not specified.
`N_MCMC`	the number of iterations for the MCMC algorithm (including burn-in). Min 2*10^3. If our algorithm does not converge (according to Heidelberger and Welch's convergence diagnostic), we automatically double N_MCMC and burn_in, and run it a second time (a message will be printed on screen to inform users).
`burn_in`	the length of the burn-in; i.e., the initial part of the MCMC chain to be discarded (before convergence is reached). Min 500. If no convergence is reached, the 'burn_in' is automatically increased (up to N_MCMC/2) according to the convergence detected by Heidelberger and Welch's convergence diagnostic. If our algorithm does not converge even after increasing the burn-in, we automatically double N_MCMC and burn_in, and run it a second time (a message will be printed on screen to inform users).
`undersampling_int`	the undersampling of the latent variables. While model parameters are sampled at each iteration, RNA-seq counts are allocated to their transcript (and spliced/unspliced) version of origin only every 'undersampling_int' iterations. Increasing 'undersampling_int' will decrease the runtime, but may marginally affect performance. In our benchmarks, no differences in performance were observed for values up to 10.
`traceplot`	a logical value indicating whether to return the posterior chain of "pi_U" (i.e., the group-level relative abundance of unspliced reads) for both groups. If TRUE, the posterior chains are stored in the 'MCMC_U' object, and can be plotted via the 'plot_traceplot' function.
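The arithmetic implied by the MCMC arguments can be sketched as follows, using the default values from Usage. The variable names are hypothetical, and the exact schedule of latent-variable updates inside the package is an assumption; only the counts described above are illustrated.

```r
# Default settings from the Usage section
N_MCMC <- 2000
burn_in <- 500
undersampling_int <- 10

# N_MCMC includes the burn-in, so this many iterations are retained
n_retained <- N_MCMC - burn_in

# Approximate number of latent-variable updates over the whole chain
# (assumption: one update every 'undersampling_int' iterations)
n_latent_updates <- N_MCMC %/% undersampling_int

# Settings of the automatic second run if convergence is not reached,
# assuming the defaults above are the values being doubled
N_MCMC_rerun <- 2 * N_MCMC
burn_in_rerun <- 2 * burn_in

c(n_retained = n_retained, n_latent_updates = n_latent_updates,
  N_MCMC_rerun = N_MCMC_rerun, burn_in_rerun = burn_in_rerun)
```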

Value

A list of 4 data.frame objects. 'Differential_results' contains results from differential testing only; 'US_results' has results for the proportion of Spliced and Unspliced counts (Ambiguous counts are allocated 50:50 to Spliced and Unspliced); 'USA_results' includes results for the proportion of Spliced, Unspliced and Ambiguous counts (Ambiguous counts are reported separately from Spliced and Unspliced counts); 'Convergence_results' contains information about convergence of posterior chains.

Columns 'Gene_id' and 'Cluster_id' contain the gene and cell-cluster name, while 'p_val', 'p_adj.loc' and 'p_adj.glb' report the raw p-values, and the locally and globally adjusted p-values, obtained via Benjamini and Hochberg (BH) correction. In locally adjusted p-values ('p_adj.loc'), BH correction is applied to each cluster separately, while in globally adjusted p-values ('p_adj.glb'), BH correction is applied to the results from all clusters jointly.

Columns 'pi' and 'sd' indicate the proportion and its standard deviation, respectively; 'S', 'U' and 'A' refer to Spliced, Unspliced and Ambiguous counts, respectively; 'gr_A' and 'gr_B' refer to group A and B, respectively. For instance, columns 'pi_S-gr_A' and 'sd_S-gr_A' indicate the estimate and standard deviation (sd), respectively, of the proportion of Spliced counts (pi_S) in group A.
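The difference between locally and globally adjusted p-values can be illustrated with base R's `p.adjust`. This is a sketch on a toy table with made-up p-values, not output of the package; only the column names follow the description above.

```r
# Toy table of raw p-values (hypothetical values, not package output)
toy <- data.frame(
  Gene_id    = c("g1", "g2", "g3", "g1", "g2", "g3"),
  Cluster_id = rep(c("c1", "c2"), each = 3),
  p_val      = c(0.001, 0.02, 0.5, 0.04, 0.03, 0.9)
)

# Locally adjusted: BH correction applied within each cluster separately
toy$p_adj.loc <- ave(toy$p_val, toy$Cluster_id,
                     FUN = function(p) p.adjust(p, method = "BH"))

# Globally adjusted: BH correction applied to all clusters jointly
toy$p_adj.glb <- p.adjust(toy$p_val, method = "BH")

toy
```

Because the global correction accounts for all tests at once, it is typically the more conservative of the two when many clusters are analysed.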

Author(s)

Simone Tiberi simone.tiberi@unibo.it

Examples

# load the package:
library(DifferentialRegulation)

# locate the example data shipped with the package:
data_dir = system.file("extdata", package = "DifferentialRegulation")

# specify sample ids:
sample_ids = paste0("organoid", c(1:3, 16:18))
# set the input directory of each sample (obtained via alevin-fry):
base_dir = file.path(data_dir, "alevin-fry", sample_ids)
# check that all directories exist:
file.exists(base_dir)

# set paths to USA counts, cell id and gene id:
# Note that alevin-fry needs to be run with '--use-mtx' option
# to store counts in a 'quants_mat.mtx' file.
path_to_counts = file.path(base_dir, "alevin", "quants_mat.mtx")
path_to_cell_id = file.path(base_dir, "alevin", "quants_mat_rows.txt")
path_to_gene_id = file.path(base_dir, "alevin", "quants_mat_cols.txt")

# load USA counts:
sce = load_USA(path_to_counts,
               path_to_cell_id,
               path_to_gene_id,
               sample_ids)
 
# define the design of the study:
design = data.frame(sample = sample_ids,
                    group = c( rep("3 mon", 3), rep("6 mon", 3) ))
design

# cell types should be assigned to each cell;
# here we load pre-computed cell types:
path_to_DF = file.path(data_dir, "DF_cell_types.txt")
DF_cell_types = read.csv(path_to_DF, sep = "\t", header = TRUE)
matches = match(colnames(sce), DF_cell_types$cell_id)
sce$cell_type = DF_cell_types$cell_type[matches]

# set paths to EC counts and ECs:
path_to_EC_counts = file.path(base_dir, "alevin", "geqc_counts.mtx")
path_to_EC = file.path(base_dir, "alevin", "gene_eqclass.txt.gz")

# load EC counts:
EC_list = load_EC(path_to_EC_counts,
                  path_to_EC,
                  path_to_cell_id,
                  path_to_gene_id,
                  sample_ids)
                    
# compute pseudo-bulk (PB) counts:
PB_counts = compute_PB_counts(sce = sce,
                              EC_list = EC_list,
                              design = design,
                              sample_col_name = "sample",
                              group_col_name = "group",
                              sce_cluster_name = "cell_type",
                              min_cells_per_cluster = 100,
                              min_counts_per_gene_per_group = 20)

# to reduce memory usage, we can remove the EC_list object:
rm(EC_list)
  
# set the seed so that the MCMC results are reproducible,
# then run the differential testing:
set.seed(1609612)
results = DifferentialRegulation(PB_counts,
                                 n_cores = 2,
                                 traceplot = TRUE)
  
names(results)
  
# We visualize differential results:
head(results$Differential_results)

# plot top (i.e., most significant) result:
# plot USA proportions:
plot_pi(results,
        type = "USA",
        gene_id = results$Differential_results$Gene_id[1],
        cluster_id = results$Differential_results$Cluster_id[1])

# plot US proportions:
plot_pi(results,
        type = "US",
        gene_id = results$Differential_results$Gene_id[1],
        cluster_id = results$Differential_results$Cluster_id[1])
       
# plot the corresponding traceplot:
plot_traceplot(results,
               gene_id = results$Differential_results$Gene_id[1],
               cluster_id = results$Differential_results$Cluster_id[1])

SimoneTiberi/DifferentialRegulation documentation built on Aug. 22, 2024, 7:43 a.m.
