sparsedc_cluster: Sparse Differential Clustering

Description Usage Arguments Value See Also Examples

Description

The main SparseDC function. This function clusters the samples from the two conditions and links the clusters across the conditions. It also identifies marker genes for each of the clusters. There are three types of marker gene which SparseDC identifies. Please see the original manuscript for further details.

Usage

1
2
sparsedc_cluster(pdat1, pdat2, ncluster, lambda1, lambda2, nitter = 20,
  nstarts = 50, init_iter = 5)

Arguments

pdat1

The centered data from condition 1, columns should be samples (cells) and rows should be features (genes).

pdat2

The centered data from condition 2, columns should be samples (cells) and rows should be features (genes). The number of genes should be the same as pdat1. as in pdat1.

ncluster

The number of clusters present in the data.

lambda1

The lambda 1 value to use in the SparseDC function. This value controls the number of marker genes detected for each of the clusters in the final result. This can be calculated using the "lambda1_calculator" function or supplied by the user.

lambda2

The lambda 2 value to use in the SparseDC function. This value controls the number of genes that show condition-dependent expression within each cell type. This can be calculated using the "lambda2_calculator" function or supplied by the user.

nitter

The max number of iterations for each of the start values, the default value is 20.

nstarts

The number of start values to use for SparseDC. The default value is 50.

init_iter

The number of iterations used to generate the starting center values.

Value

A list containing the clustering solution, cluster centers and the score of each of the starts.

See Also

lambda1_calculator lambda2_calculator update_c update_mu

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
set.seed(10)
# Select small dataset for example
data_test <- data_biase[1:100,]
# Split data into conditions 1 and 2
data_1 <- data_test[ , which(condition_biase == "A")]
data_2 <- data_test[ , which(condition_biase == "B")]
# Preprocess data (log transform and center)
pre_data <- pre_proc_data(data_1, data_2, norm = FALSE, log = TRUE,
center = TRUE)
# Calculate lambda 1 parameter
lambda1 <- lambda1_calculator(pdat1 = pre_data[[1]], pdat2 = pre_data[[2]],
 ncluster=3, alpha1 = 0.5, nboot1 = 1000)
# Calculate lambda 2 parameter
lambda2 <- lambda2_calculator(pdat1 = pre_data[[1]], pdat2 = pre_data[[2]],
 ncluster = 3, alpha2 = 0.5, nboot2 = 1000)
# Run sparse DC
sdc_res <- sparsedc_cluster(pdat1 = pre_data[[1]], pdat2 = pre_data[[2]], ncluster = 3,
lambda1 = lambda1, lambda2 = lambda2, nitter = 20, nstarts =50)
# Extract results
clusters_1 <- sdc_res$clusters1  # Clusters for condition 1 data
clusters_2 <- sdc_res$clusters2  # Clusters for condition 2 data
centers_1 <- sdc_res$centers1  # Centers for condition 1 data
centers_2 <- sdc_res$centers2  # Centers for condition 2 data
# View clusters
summary(as.factor(clusters_1))
summary(as.factor(clusters_2))
# View Marker genes
gene_names <- row.names(data_test)
m_gene_c1_up1 <- gene_names[which(centers_1[,1] > 0)]
m_gene_c1_up2 <- gene_names[which(centers_2[,1] > 0)]
m_gene_c1_down1 <- gene_names[which(centers_1[,1] < 0)]
m_gene_c1_down2 <- gene_names[which(centers_2[,1] < 0)]
m_gene_c2_cond <- gene_names[which(centers_1[,2] != centers_2[,2])]

# Can also run

pre_data <- pre_proc_data(data_1, data_2, norm = FALSE, log = TRUE,
center = TRUE)
pdata_A <- pre_data[[1]]
pdata_B <- pre_data[[2]]
lambda1 <- lambda1_calculator(pdat1 = pdata_A , pdat2 = pdata_B,
ncluster=3, alpha1 = 0.5, nboot1 = 1000)
lambda2 <- lambda2_calculator(pdat1 = pdata_A, pdat2 = pdata_B,
 ncluster = 3, alpha2 = 0.5, nboot2 = 1000)
# Run sparse DC
sdc_res <- sparsedc_cluster(pdat1 = pdata_A, pdat2 = pdata_B, ncluster = 3,
lambda1 = lambda1, lambda2 = lambda2, nitter = 20, nstarts =50)

SparseDC documentation built on May 2, 2019, 9:29 a.m.