Apply the Connectivity Scores to a K clustering result. More information can be found in the Details section below.
data: A gene expression matrix with the compounds in the columns.
clusterlabels: A vector of integers that represents the cluster grouping of the columns (compounds) in
type: Type of CS analysis (default=
  In the first two options, either MFA or PCA is used depending on the cluster size. If the query set only contains a single compound, the latter is used. Also note that if a cluster only contains a single compound, no Within-CS can be computed.
WithinABS: Boolean value to take the mean of the absolute values in the final step of the Within-Cluster CS (default=
BetweenABS: Boolean value to take the mean of the absolute values in the final step of the Between-Cluster CS (default=
FactorABS: Boolean value to take the absolute value of the query loadings when determining the best factor (= factor with highest query loadings) in a
verbose: Boolean value to output warnings and information about which factor is chosen in a CS analysis (if applicable).
Within: A vector for which cluster numbers the Within-Cluster CS should be computed. By default (=
Between: A vector for which cluster numbers the Between-Cluster CS (with the cluster as a query set) should be computed. By default (=
WithinSave: Boolean value to save the
BetweenSave: Boolean value to save the
...: Additional parameters given to
After applying cluster analysis on the additional data matrix, K clusters are obtained.
Each cluster will be seen as a potential query set (for CSanalysis) for which two connectivity score metrics can be computed: the Within-Cluster CS and the Between-Cluster CS.
Within-Cluster CS
This metric answers the question of whether the kth cluster is connected on a gene expression
level (in addition to the samples being similar based on the other data source). The
Within-Cluster CS for a cluster is computed as follows:
Repeatedly for the ith sample in the kth cluster, apply CSMFA with:
Query Set: All cluster samples excluding the ith sample.
Reference: All samples including the ith sample of the kth cluster.
Retrieve the CS of the ith sample in the cluster.
The Within-Cluster CS for cluster k is now defined as the average of all retrieved CS.
The concept of this metric is to investigate the connectivity for each compound with the cluster. The average of the 'leave-one-out' connectivity scores, the Within-Cluster CS, gives an indication of the gene expression connectivity of this cluster. A high Within-Cluster CS implies that the cluster is both similar on the external data source and on the gene expression level. A low score indicates that the cluster does not share a similar latent gene profile structure.
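The leave-one-out procedure above can be sketched in a few lines of R. This is a minimal illustration, not the package's implementation: `toy_cs` is a hypothetical stand-in scorer (correlation with the mean query profile) used in place of the MFA-based connectivity score, `within_cluster_cs` is an illustrative name, and the reference set is assumed to be every sample outside the query set, including the left-out one.

```r
# Stand-in scorer (an assumption, NOT the CSFA method): correlate every
# reference column with the mean expression profile of the query columns.
toy_cs <- function(query, reference) {
  query_profile <- rowMeans(query)
  as.vector(cor(reference, query_profile))
}

# Leave-one-out Within-Cluster CS for cluster k.
within_cluster_cs <- function(data, clusterlabels, k, score_fun = toy_cs) {
  members <- which(clusterlabels == k)
  # A cluster with a single compound has no Within-CS.
  if (length(members) < 2) return(NA_real_)
  loo <- vapply(members, function(i) {
    query_idx <- setdiff(members, i)                   # cluster samples without sample i
    ref_idx <- setdiff(seq_len(ncol(data)), query_idx) # everything else, including sample i
    scores <- score_fun(data[, query_idx, drop = FALSE],
                        data[, ref_idx, drop = FALSE])
    scores[ref_idx == i]                               # score of the left-out sample
  }, numeric(1))
  mean(loo)                                            # average of the leave-one-out scores
}

# Toy data: 50 genes x 8 compounds, three hypothetical clusters.
set.seed(1)
toy <- matrix(rnorm(50 * 8), nrow = 50)
labels <- c(1, 1, 1, 2, 2, 2, 3, 3)
within_cluster_cs(toy, labels, k = 1)
```

Passing the scorer as an argument keeps the loop independent of the CS method, so the same skeleton would apply whichever analysis type is chosen.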
Between-Cluster CS
In this stage of the analysis, we focus on the lth cluster and use all compounds in this
cluster as the query set. A CSMFA is performed in which all other clusters are the
reference set. Next, the connectivity scores are calculated for all reference compounds
and averaged over the clusters (=the between connectivity score).
A high Between-Cluster CS between the lth and jth clusters implies that, while the two
clusters are not similar based on the other data source, they do share a latent structure
when considering the gene expression data.
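The Between-Cluster CS can be sketched in the same spirit. Again this is only an illustration under stated assumptions: `toy_cs` is a hypothetical correlation-based stand-in for the MFA-based score, and `between_cluster_cs` is an illustrative name rather than a function of the package.

```r
# Stand-in scorer (an assumption, NOT the CSFA method): correlate every
# reference column with the mean expression profile of the query columns.
toy_cs <- function(query, reference) {
  query_profile <- rowMeans(query)
  as.vector(cor(reference, query_profile))
}

# Between-Cluster CS with cluster l as the query set.
between_cluster_cs <- function(data, clusterlabels, l, score_fun = toy_cs) {
  query_idx <- which(clusterlabels == l)  # all compounds of cluster l
  ref_idx <- which(clusterlabels != l)    # all other clusters form the reference
  scores <- score_fun(data[, query_idx, drop = FALSE],
                      data[, ref_idx, drop = FALSE])
  # Average the reference scores within each reference cluster.
  tapply(scores, clusterlabels[ref_idx], mean)
}

# Toy data: 50 genes x 8 compounds, three hypothetical clusters.
set.seed(1)
toy <- matrix(rnorm(50 * 8), nrow = 50)
labels <- c(1, 1, 1, 2, 2, 2, 3, 3)
between <- between_cluster_cs(toy, labels, l = 1)
between
```

The result holds one averaged score per reference cluster, which corresponds to the off-diagonal entries of one row of the score matrix.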
A list object with components:
CSmatrix: A K × K matrix containing the Within scores on the diagonal and the Between scores elsewhere, with the rows being the query set clusters (e.g. m_{13} = Between CS between cluster 1 (as query set) and cluster 3).
CSRankmatrix: The same as CSmatrix, but with connectivity ranking scores (if applicable).
clusterlabels: The provided clusterlabels.
Save: A list with components:
  Within: A list with a component for each cluster k that contains:
    LeaveOneOutCS: Each leave-one-out connectivity score for cluster k.
    LeaveOneOutCSRank: Each leave-one-out connectivity ranking score for cluster k (if applicable).
    factorselect: A vector containing which factors/BCs were selected in each leave-one-out CS analysis (if applicable).
    CS: A (columns (compounds) × size of cluster k) matrix that contains all the connectivity scores in a leave-one-out CS analysis for each left out compound.
    CSRank: The same as CS, but with connectivity ranking scores (if applicable).
  Between: A list with components:
    DataBetweenCS: A (columns (compounds) × clusters) matrix containing all compound connectivity scores for each query cluster set.
    DataBetweenCSRank: The same as DataBetweenCS, but with connectivity ranking scores (if applicable).
    queryindex: The column indices for each query set in all CS analyses.
    factorselect: A vector containing which factors/BCs were selected in each CS analysis (if applicable).
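To make the layout of CSmatrix concrete, the sketch below builds a small stand-in matrix and reads it the way the description suggests. The numbers are invented purely for illustration and `cs` is a hypothetical object, not output of the package.

```r
# Hypothetical 3 x 3 stand-in for a CSmatrix; rows are the query clusters.
# The values are made up for illustration only.
cs <- matrix(c(0.8, 0.1, 0.3,
               0.2, 0.7, 0.0,
               0.4, 0.1, 0.6),
             nrow = 3, byrow = TRUE)

# Within-Cluster CS of each cluster sits on the diagonal.
within <- diag(cs)

# Between-Cluster CS with cluster 1 as query set and cluster 3 as reference.
between_1_3 <- cs[1, 3]

# Locate the strongest Between score by masking the diagonal.
off <- cs
diag(off) <- NA
strongest <- which(off == max(off, na.rm = TRUE), arr.ind = TRUE)
strongest
```

Because the rows are the query clusters, the matrix need not be symmetric: the entry for row 1, column 3 can differ from the entry for row 3, column 1.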
Ewoud De Troyer
# Load the package that provides CScluster() and the example data
library(CSFA)
# Example Data Set
data("dataSIM",package="CSFA")
# Remove some no-connectivity compounds (column names containing "c-")
nosignal <- grepl("c-",colnames(dataSIM))
data <- dataSIM[,-which(nosignal)[1:250]]
# Toy example with random cluster assignment:
# Note: clusterlabels can be acquired through cutree(hclust(...))
set.seed(1) # fixed seed so the random assignment is reproducible
clusterlabels <- sample(1:10,size=ncol(data),replace=TRUE)
# Within- and Between-Cluster CS with two types of CS analysis
result1 <- CScluster(data,clusterlabels,type="CSmfa")
result2 <- CScluster(data,clusterlabels,type="CSzhang")
# Inspect the resulting score matrices
result1$CSmatrix
result1$CSRankmatrix
result2$CSmatrix
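The note in the example mentions cutree(hclust(...)) as a source of cluster labels. A minimal sketch of that step, assuming a hypothetical second data matrix `other_data` with one row per compound (simulated here, so the clusters carry no meaning):

```r
# Hypothetical additional data source: 20 compounds x 5 features (simulated).
set.seed(42)
other_data <- matrix(rnorm(20 * 5), nrow = 20)

# Hierarchical clustering of the compounds on the additional data.
tree <- hclust(dist(other_data), method = "average")

# Cut the tree into K = 4 clusters; the result is an integer vector
# with one label per compound.
clusterlabels <- cutree(tree, k = 4)
table(clusterlabels)
```

The labels must be in the same order as the columns (compounds) of the gene expression matrix before they are used as the clusterlabels argument.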