cedar: Testing cell type specific differential signals for specified...

View source: R/cedar.R

cedar    R Documentation

Testing cell type specific differential signals for a specified phenotype by considering DE/DM state correlation between cell types.

Description

This function provides the posterior probability that a feature is DE/DM (differentially expressed / differentially methylated) in a given cell type, given the observed bulk data.

Usage

cedar(Y_raw, prop, design.1, design.2=NULL, factor.to.test=NULL,
            pval = NULL, p.adj = NULL, tree = NULL, p.matrix.input = NULL,
            de.state = NULL, cutoff.tree = c('fdr', 0.01), 
            cutoff.prior.prob = c('pval', 0.01),
            similarity.function = NULL, parallel.core = NULL, corr.fig = FALSE, 
            run.time = TRUE, tree.type = c('single','full'))

Arguments

Y_raw

matrix of observed bulk data, with rows representing features and columns representing samples

prop

matrix of cell type composition of samples, with rows representing samples and columns representing cell types

design.1

covariates with cell type specific effect, with rows representing samples and columns representing covariates

design.2

covariates without cell type specific effect, with rows representing samples and columns representing covariates

factor.to.test

A phenotype name, e.g. "disease", or a vector of contrast terms, e.g. c("disease", "case", "control").

pval

matrix of p-values, with rows representing features and columns representing cell types. Column names must match those of prop.

p.adj

matrix of adjusted p-values, with rows representing features and columns representing cell types. Column names must match those of prop.

tree

tree structure between cell types: a matrix with rows representing layers and columns representing cell types (column names are required)

p.matrix.input

prior probability on each node of the tree structure. Only used when a tree structure has been specified; its dimension must be the same as that of the tree input.

de.state

DE/DM state of each feature in each cell type, with rows representing features and columns representing cell types (1: DE/DM, 0: non-DE/DM)

cutoff.tree

cutoff used to define the DE state when estimating the tree. The first element is the criterion, either 'fdr' or 'pval'; the second is the threshold. Default is c('fdr', 0.01). It is suggested to start with a restrictive cutoff and switch to a relatively loose value if the restrictive cutoff fails.

cutoff.prior.prob

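cutoff used to define the DE state when estimating the prior probabilities of the nodes on the tree. The first element is the criterion, either 'fdr' or 'pval'; the second is the threshold. Default, per the Usage above, is c('pval', 0.01). It is suggested to start with a restrictive cutoff and switch to a relatively loose value if the restrictive cutoff fails.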

similarity.function

custom function used to calculate the similarity between cell types for tree structure estimation. Its input is assumed to be a matrix of log transformed p-values, with dimension (number of selected genes) x (number of cell types).

parallel.core

number of cores for parallel running, default is NULL

corr.fig

a boolean value, whether to plot the correlation between cell types using function plotCorr()

run.time

a boolean value, whether to report running time in seconds

tree.type

tree type(s) used for inference: 'single' (single layer tree structure) and/or 'full' (multiple layer tree structure). Default is c('single','full').
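
The argument descriptions above place several naming and dimension constraints on the optional inputs (pval, tree, p.matrix.input). The following is a minimal sketch of how these could be checked before calling cedar(). The helper check_cedar_inputs() is hypothetical and not part of TOAST, and requiring the tree column names to equal those of prop exactly is an assumption based on the note in the Examples.

```r
# Hypothetical helper (NOT part of TOAST): checks the naming and dimension
# constraints stated in the argument descriptions before calling cedar().
check_cedar_inputs <- function(prop, pval = NULL, tree = NULL,
                               p.matrix.input = NULL) {
  cell.types <- colnames(prop)
  stopifnot(!is.null(cell.types))
  if (!is.null(pval)) {
    # colnames of pval must be the same as those of prop
    stopifnot(identical(colnames(pval), cell.types))
  }
  if (!is.null(tree)) {
    # column names are required for the tree matrix
    # (assumption: they must equal colnames(prop) exactly)
    stopifnot(identical(colnames(tree), cell.types))
  }
  if (!is.null(p.matrix.input)) {
    # prior probabilities need a tree of the same dimension
    stopifnot(!is.null(tree), identical(dim(p.matrix.input), dim(tree)))
  }
  invisible(TRUE)
}

# Toy inputs, simulated only for illustration
set.seed(1)
prop <- matrix(runif(30), ncol = 3)
prop <- prop / rowSums(prop)
colnames(prop) <- c("Neuron", "Astrocyte", "Microglia")
pval <- matrix(runif(15), ncol = 3,
               dimnames = list(paste0("gene", 1:5), colnames(prop)))
tree <- rbind(c(1, 1, 1), c(1, 2, 1), c(1, 2, 3))
colnames(tree) <- colnames(prop)

ok <- check_cedar_inputs(prop, pval = pval, tree = tree)
```

The helper stops with an error at the first violated constraint, so it can be placed directly in front of a cedar() call.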

Value

A list with the following components:

toast_res

If pval is NULL, the TOAST result from function csTest() is returned

tree_res

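posterior probability of each feature being DE/DM in each cell type. In the Examples, the posterior probability matrices are accessed per tree type as tree_res$single$pp, tree_res$full$pp, or tree_res$custom$pp.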

fig

If corr.fig = TRUE, a figure showing the DE/DM state correlation between cell types is returned

time_used

If run.time = TRUE, the running time (in seconds) of CeDAR is returned
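
As a sketch of how a posterior probability matrix might be used downstream, the block below thresholds a features x cell types matrix to call DE/DM features per cell type. The matrix pp is simulated here as a stand-in (in the Examples it would be read from, e.g., res$tree_res$full$pp), and the cutoff 0.95 is an arbitrary illustrative choice, not a package default.

```r
set.seed(42)
# Simulated stand-in for a posterior probability matrix
# (rows: features, columns: cell types).
pp <- matrix(runif(20 * 3), ncol = 3,
             dimnames = list(paste0("gene", 1:20),
                             c("Neuron", "Astrocyte", "Microglia")))

pp.cutoff <- 0.95                # illustrative threshold (assumption)
de.call <- pp > pp.cutoff        # logical matrix, TRUE = called DE/DM
n.de <- colSums(de.call)         # number of called features per cell type

# Names of the called features, one character vector per cell type
de.features <- lapply(colnames(pp),
                      function(ct) rownames(pp)[de.call[, ct]])
names(de.features) <- colnames(pp)
```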

Author(s)

Luxiao Chen <luxiao.chen@emory.edu>

Examples

N <- 300 # simulate a dataset with 300 samples
K <- 3 # 3 cell types
P <- 500 # 500 features

### simulate proportion matrix
Prop <- matrix(runif(N*K, 10,60), ncol=K)
Prop <- sweep(Prop, 1, rowSums(Prop), FUN="/")
colnames(Prop) <- c("Neuron", "Astrocyte", "Microglia")

### simulate phenotype (design) data and bulk data
design <- data.frame(disease=factor(sample(0:1,size = N,replace=TRUE)),
                     age=round(runif(N, 30,50)),
                     race=factor(sample(1:3, size = N,replace=TRUE)))
Y <- matrix(rnorm(N*P, N, P), ncol = N)
rownames(Y) <- paste0('gene',1:nrow(Y))
d1 <-  data.frame('disease' = factor(sample(0:1,size = N,replace=TRUE)))

### Only provide bulk data, proportion
res <- cedar(Y_raw = Y, prop = Prop,
             design.1 = design[,1:2],
             design.2 = design[,3],
             factor.to.test = 'disease',
             cutoff.tree = c('pval',0.1),
             corr.fig = TRUE,
             cutoff.prior.prob = c('pval',0.1) )

### result of toast (independent test)
str(res$toast_res)
### posterior probability of DE/DM of cedar with single layer tree structure
head(res$tree_res$single$pp)
### posterior probability of DE/DM of cedar with multiple layer tree structure
head(res$tree_res$full$pp)
### estimated tree structure of three cell types
head(res$tree_res$full$tree_structure)
### scatter plot of -log10(pval) showing DE/DM state correlation between cell types
res$fig


### Using custom similarity function to estimate tree structure
### In CeDAR, the input is assumed to be a matrix of log transformed p-values
### with rows representing genes and columns representing cell types

sim.fun <- function(log.pval){
  similarity.res <- sqrt((1 - cor(log.pval, method = 'spearman'))/2)
  return(similarity.res)
}

res <- cedar(Y_raw = Y, prop = Prop,
             design.1 = design[,1:2],
             design.2 = design[,3],
             factor.to.test = 'disease',
             cutoff.tree = c('pval',0.1),
             similarity.function = sim.fun,
             corr.fig = FALSE,
             cutoff.prior.prob = c('pval',0.1) )

### posterior probability of DE/DM of cedar with multiple layer tree structure
head(res$tree_res$full$pp) 
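
The custom function above can be checked in isolation before it is passed to cedar(). The sketch below repeats the definition of sim.fun so that it runs on its own and applies it to a simulated matrix of log transformed p-values (toy.log.pval is made up for illustration). Because the Spearman correlation is rank based, the base of the logarithm does not affect the result. The returned value sqrt((1 - r)/2) lies in [0, 1] and is 0 for perfectly correlated columns.

```r
# Same definition as in the example above, repeated so this block runs alone
sim.fun <- function(log.pval){
  sqrt((1 - cor(log.pval, method = 'spearman'))/2)
}

set.seed(7)
# Toy input: 50 genes x 3 cell types of log10 p-values (simulated)
toy.log.pval <- log10(matrix(runif(50 * 3), ncol = 3,
                             dimnames = list(NULL,
                               c("Neuron", "Astrocyte", "Microglia"))))

toy.sim <- sim.fun(toy.log.pval)
dim(toy.sim)               # one row and one column per cell type
isSymmetric(toy.sim)       # the matrix is symmetric
max(abs(diag(toy.sim)))    # diagonal entries are (numerically) zero
```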


### Using custom tree structure as input
### cell type 1 and cell type 3 are more similar
tree.input <- rbind(c(1,1,1),c(1,2,1),c(1,2,3))
### If column names are provided for the matrix, make sure they are the same as those of Prop
colnames(tree.input) <- c("Neuron", "Astrocyte", "Microglia")

res <- cedar(Y_raw = Y, prop = Prop,
             design.1 = design[,1:2],
             design.2 = design[,3],
             factor.to.test = 'disease',
             cutoff.tree = c('pval',0.1),
             tree = tree.input,
             corr.fig = FALSE,
             cutoff.prior.prob = c('pval',0.1) )

### posterior probability of DE/DM of cedar with custom tree structure
head(res$tree_res$custom$pp) 



### Using custom tree structure and prior probability of each node as input
### cell type 1 and cell type 3 are more similar
tree.input <- rbind(c(1,1,1),c(1,2,1),c(1,2,3))
colnames(tree.input) <- c("Neuron", "Astrocyte", "Microglia")

p.matrix.input <- rbind(c(0.2,0.2,0.2), c(0.5,0.25,0.5), c(0.5,1,0.5))
# marginally, each cell type has probability 0.05 of being DE for a randomly
# picked gene (cell 1: 0.2 * 0.5 * 0.5, cell 2: 0.2 * 0.25 * 1)
# about 50% of the DE genes in cell type 1 will overlap with cell type 3,
# while about 25% of the DE genes in cell type 1 will overlap with cell type 2

res <- cedar(Y_raw = Y, prop = Prop,
             design.1 = design[,1:2],
             design.2 = design[,3],
             factor.to.test = 'disease',
             cutoff.tree = c('pval',0.1),
             tree = tree.input,
             p.matrix.input = p.matrix.input,
             corr.fig = FALSE,
             cutoff.prior.prob = c('pval',0.1) )

### posterior probability of DE/DM of cedar with custom tree structure
head(res$tree_res$custom$pp) 
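
The arithmetic in the comments of the last example can be reproduced with base R. This is a sketch under an assumed reading of p.matrix.input: each entry is the probability that a node is "on" given that its parent node is on, and branches below a split are independent given their shared ancestors. The helper joint.prob() is hypothetical and not part of TOAST.

```r
tree.input <- rbind(c(1, 1, 1), c(1, 2, 1), c(1, 2, 3))
p.matrix.input <- rbind(c(0.2, 0.2, 0.2), c(0.5, 0.25, 0.5), c(0.5, 1, 0.5))
colnames(tree.input) <- colnames(p.matrix.input) <-
  c("Neuron", "Astrocyte", "Microglia")

# Marginal prior probability of DE per cell type: product down each column
marginal <- apply(p.matrix.input, 2, prod)   # approximately 0.05 for each

# Hypothetical helper: prior probability that cell types a and b are both DE.
# Nodes shared by a and b are counted once; the branches below the split
# are multiplied as independent (assumption stated above).
joint.prob <- function(a, b) {
  shared <- tree.input[, a] == tree.input[, b]
  prod(p.matrix.input[shared, a]) *
    prod(p.matrix.input[!shared, a]) *
    prod(p.matrix.input[!shared, b])
}

# Expected fraction of cell type 1 DE genes that are also DE in cell type 3 / 2
overlap.1.3 <- joint.prob("Neuron", "Microglia") / marginal[["Neuron"]]  # about 0.5
overlap.1.2 <- joint.prob("Neuron", "Astrocyte") / marginal[["Neuron"]]  # about 0.25
```

Under this reading, the values agree with the 0.05 marginal probability and the 50% and 25% overlaps stated in the example comments.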




ziyili20/TOAST documentation built on Aug. 28, 2022, 11:28 a.m.