Description
Implementation of the EMVC algorithm. Takes an n-by-p data matrix and a c-by-p binary annotation matrix and generates an optimized (i.e., filtered) version of the annotation matrix. The optimization minimizes the entropy between each variable group and the categorical random variable representing the cluster membership of each variable. The clusters are produced either by k-means clustering or by horizontal cuts of a dendrogram built with agglomerative hierarchical clustering using correlation distance. Annotations are never added during optimization, only removed.
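The entropy objective can be illustrated with a small, self-contained sketch. The helper `group_entropy` below is hypothetical (it is not exported by EMVC): it scores one variable group by the Shannon entropy, in bits, of the cluster labels of the variables annotated to that group. Lower entropy means the group's variables are concentrated in fewer clusters. EMVC's exact objective, search strategy and stopping rule are not described on this page, so treat this only as an illustration of why removing an annotation can lower the entropy.

```r
# Hypothetical helper (not part of EMVC): Shannon entropy (bits) of the
# cluster labels of the variables annotated to one variable group.
group_entropy <- function(annotation_row, clusters) {
  members <- clusters[annotation_row == 1]
  if (length(members) == 0) return(0)
  p <- table(members) / length(members)   # cluster-label proportions
  -sum(p * log2(p))
}

# Toy example: 6 variables assigned to 2 clusters.
clusters <- c(1, 1, 1, 2, 2, 2)
# A group annotated to variables 1, 2, 3 and 4.
group <- c(1, 1, 1, 1, 0, 0)
group_entropy(group, clusters)   # labels 1,1,1,2 -> 0.8112781

# Removing the annotation on variable 4 (the lone cluster-2 member)
pruned <- group
pruned[4] <- 0
group_entropy(pruned, clusters)  # all members in cluster 1 -> 0
```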
Arguments
data: Input data matrix, observations-by-variables. Must be specified. Cannot contain missing values.
annotations: Binary annotation matrix, variable groups-by-variables. Must be specified.
bootstrap.iter: Number of bootstrap iterations. Defaults to 20. If set to 1, the function returns the results of a single optimization run on the input data matrix (i.e., no bootstrapping is performed).
clust.method: Method used to generate variable clusters, either "kmeans" or "hclust". Defaults to "kmeans".
k.range: Range of k-means k values or dendrogram cut sizes. Must be specified.
kmeans.nstart: Only relevant if clust.method is "kmeans". K-means nstart value. Defaults to 5.
kmeans.iter.max: Only relevant if clust.method is "kmeans". Maximum number of iterations for k-means. Defaults to 20.
hclust.method: Only relevant if clust.method is "hclust". Supplied as the "method" argument to the R function hclust.
hclust.cor.method: Only relevant if clust.method is "hclust". Supplied as the "method" argument to the R function cor.
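The clustering arguments can be read as follows. This is a minimal sketch, assuming that EMVC clusters the variables (the columns of data) once for every value in k.range. The function `cluster_variables` is hypothetical, and its `hclust.method` and `hclust.cor.method` defaults are arbitrary choices for this sketch, not necessarily EMVC's defaults. Only base R `stats` functions (`kmeans`, `cor`, `hclust`, `cutree`) are used.

```r
# Hypothetical sketch: cluster the variables (columns) of an
# observations-by-variables matrix once per value in k.range.
cluster_variables <- function(data, k.range, clust.method = "kmeans",
                              kmeans.nstart = 5, kmeans.iter.max = 20,
                              hclust.method = "average",
                              hclust.cor.method = "pearson") {
  if (clust.method == "kmeans") {
    # Transpose so that k-means groups variables, not observations
    lapply(k.range, function(k)
      kmeans(t(data), centers = k, nstart = kmeans.nstart,
             iter.max = kmeans.iter.max)$cluster)
  } else if (clust.method == "hclust") {
    # Correlation distance between variables: 1 - cor(x_i, x_j)
    d <- as.dist(1 - cor(data, method = hclust.cor.method))
    tree <- hclust(d, method = hclust.method)
    # Cut the same dendrogram at each requested number of clusters
    lapply(k.range, function(k) cutree(tree, k = k))
  } else {
    stop("clust.method must be \"kmeans\" or \"hclust\"")
  }
}

set.seed(1)
data <- matrix(rnorm(50 * 20), nrow = 50, ncol = 20)
km <- cluster_variables(data, 2:4)
hc <- cluster_variables(data, 2:4, clust.method = "hclust")
```

With "hclust", a single tree is built once and then cut at each size in k.range, whereas k-means is rerun for every k.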
Value
Optimized version of the annotation matrix. Each entry contains the average proportion of cluster sizes (the values in k.range) for which the corresponding annotation was kept during optimization. If bootstrapping is enabled, the optimized matrix contains these proportions averaged over all bootstrap-resampled datasets.
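The averaging described above can be sketched as follows, assuming that each optimization run yields a 0/1 matrix of kept annotations. `fake_optimize` is a hypothetical placeholder that keeps annotations at random (it is not EMVC's optimizer), and `average_kept` is likewise hypothetical: for each bootstrap sample it averages the kept indicators over k.range, then averages those proportions over all bootstrap samples. The final threshold step mimics what filterAnnotations appears to do in the example below; whether the package compares with `>=` or `>` is an assumption here.

```r
# Hypothetical placeholder for one optimization run at cluster size k:
# returns a 0/1 matrix shaped like `annotations`, 1 where an annotation
# was kept. It keeps each existing annotation at random and ignores
# `data` and `k`; it is NOT EMVC's entropy-based optimizer.
fake_optimize <- function(data, annotations, k) {
  annotations * rbinom(length(annotations), 1, 0.5)
}

# Hypothetical sketch of the averaging that produces the returned matrix.
average_kept <- function(data, annotations, k.range, bootstrap.iter = 20) {
  runs <- lapply(seq_len(bootstrap.iter), function(b) {
    # Resample observations (rows) with replacement; with a single
    # iteration, use the input data as-is (no bootstrapping).
    rows <- if (bootstrap.iter == 1) seq_len(nrow(data)) else
      sample(nrow(data), replace = TRUE)
    boot.data <- data[rows, , drop = FALSE]
    kept <- lapply(k.range, function(k) fake_optimize(boot.data, annotations, k))
    # Proportion of cluster sizes in which each annotation was kept
    Reduce(`+`, kept) / length(k.range)
  })
  # Average the proportions over all bootstrap samples
  Reduce(`+`, runs) / length(runs)
}

set.seed(42)
annotations <- matrix(rbinom(200, 1, 0.3), nrow = 10, ncol = 20)
data <- matrix(rnorm(50 * 20), nrow = 50, ncol = 20)
opt <- average_kept(data, annotations, k.range = 2:5, bootstrap.iter = 10)
# Keep annotations whose average proportion reaches the threshold
filtered <- (opt >= 0.9) * 1
```

Because each run can only drop annotations, every entry of the averaged matrix lies between 0 and 1 and is 0 wherever the input matrix had no annotation.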
Examples
library(EMVC)
## Create random sparse annotation matrix for 50 variable groups
## and 100 variables
annotations = matrix(rbinom(5000,1,.1), nrow=50, ncol=100)
## Number of initial annotations
sum(annotations)
## Create random gene expression matrix for 50 observations and 100 variables
data = matrix(rnorm(5000), nrow=50, ncol=100)
## Execute EMVC using k-means
EMVC.results = EMVC(data=data, annotations=annotations,
  bootstrap.iter=30, k.range=2:10, clust.method="kmeans",
  kmeans.nstart=3, kmeans.iter.max=10)
## Filter the results at .9 threshold
filtered.opt.annotations = filterAnnotations(EMVC.results, .9)
## Number of optimized annotations at .9 threshold; should be close to 0 since the
## variable groups and data are random (i.e., random annotations are removed
## by the optimization most of the time)
sum(filtered.opt.annotations)
## Filter the results at .1 threshold
filtered.opt.annotations = filterAnnotations(EMVC.results, .1)
## Number of optimized annotations at .1 threshold; should be close to
## the initial number of annotations since the variable groups and data are random
## (i.e., no random annotation is consistently removed by the EMVC algorithm)
sum(filtered.opt.annotations)
Output:
[1] 464
Bootstrap iteration 10: Sampling 50 values with replacement. Optimizing 464 true annotations out of 5000
Finished optimization: 180.444444444444 annotations out of 5000
Bootstrap iteration 20: Sampling 50 values with replacement. Optimizing 464 true annotations out of 5000
Finished optimization: 182 annotations out of 5000
Bootstrap iteration 30: Sampling 50 values with replacement. Optimizing 464 true annotations out of 5000
Finished optimization: 175.666666666667 annotations out of 5000
[1] 0
[1] 464