This is one of the functions for real data analysis. It clusters the data twice: once accounting for the environment and once ignoring it.
r_cluster_data(data, response, exposure, train_index, test_index,
  cluster_distance = c("corr", "corr0", "corr1", "tom", "tom0", "tom1",
  "diffcorr", "difftom", "fisherScore"), eclust_distance = c("fisherScore",
  "corScor", "diffcorr", "difftom"), measure_distance = c("euclidean",
  "maximum", "manhattan", "canberra", "binary", "minkowski"),
  minimum_cluster_size = 50, ...)
data | n x p matrix of data. Rows are samples, columns are genes or CpG sites. Should not contain the environment variable
response | numeric vector of length n
exposure | binary (0,1) numeric vector of length n giving the exposure status of the n samples
train_index | numeric vector indicating the indices of
test_index | numeric vector indicating the indices of
cluster_distance | character specifying which similarity matrix from the training set to use for clustering the genes. Must be one of the following: "corr", "corr0", "corr1", "tom", "tom0", "tom1", "diffcorr", "difftom", "fisherScore"
eclust_distance | character specifying which similarity matrix from the training set to use for clustering the genes based on the environment. See
measure_distance | one of "euclidean", "maximum", "manhattan", "canberra", "binary", "minkowski" to be passed to
minimum_cluster_size | the minimum cluster size. Only applicable if
... | arguments passed to the
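The measure_distance values coincide with the method options of base R's stats::dist. The following is a minimal sketch, not the package's implementation: it assumes that a similarity matrix is turned into distances between genes with stats::dist and then clustered hierarchically. The matrix sim and its values are hypothetical.

```r
# Toy 4 x 4 similarity matrix (hypothetical values, not taken from the package)
gene_names <- paste0("gene", 1:4)
sim <- matrix(c(1.0, 0.8, 0.1, 0.2,
                0.8, 1.0, 0.2, 0.1,
                0.1, 0.2, 1.0, 0.9,
                0.2, 0.1, 0.9, 1.0),
              nrow = 4, byrow = TRUE,
              dimnames = list(gene_names, gene_names))

# Distances between the rows of the similarity matrix for two of the
# measures listed under measure_distance
d_euclidean <- stats::dist(sim, method = "euclidean")
d_manhattan <- stats::dist(sim, method = "manhattan")

# Average-linkage hierarchical clustering on the Euclidean distances,
# cut into two groups (assumption: this mirrors method = "average")
hc <- stats::hclust(d_euclidean, method = "average")
groups <- stats::cutree(hc, k = 2)
print(groups)
```

Genes whose similarity profiles look alike (here gene1 with gene2, and gene3 with gene4) end up in the same group.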
This function clusters the data. Its results should then be passed to the r_prepare_data function, which outputs the appropriate X and Y matrices in the right format for regression packages such as mgcv, caret and glmnet.
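To make the two kinds of clustering input concrete, here is a rough base R sketch of similarity matrices computed on the training samples only. It rests on assumptions the page does not confirm: that a "corr"-style matrix is the correlation over all training subjects, and that a "diffcorr"-style matrix is the absolute difference between the correlations among exposed and unexposed training subjects. All data and variable names are simulated or hypothetical.

```r
set.seed(1)
n <- 200
p <- 5
x <- matrix(rnorm(n * p), nrow = n,
            dimnames = list(NULL, paste0("gene", seq_len(p))))
exposure <- rep(c(0, 1), each = n / 2)

# Make gene1 and gene2 strongly correlated among exposed samples only
exposed <- exposure == 1
x[exposed, "gene2"] <- x[exposed, "gene1"] + rnorm(sum(exposed), sd = 0.1)

# Hypothetical split: every other sample goes to the training set
train_index <- seq(1, n, by = 2)
x_train <- x[train_index, ]
e_train <- exposure[train_index]

# Similarity ignoring the environment: correlation over all training subjects
corr_all <- cor(x_train)

# Similarity based on the environment (assumed definition): absolute
# difference between exposure-specific correlation matrices
corr_e1 <- cor(x_train[e_train == 1, ])
corr_e0 <- cor(x_train[e_train == 0, ])
corr_diff <- abs(corr_e1 - corr_e0)

round(corr_diff, 2)
```

In this simulation the gene1/gene2 entry of corr_diff stands out, because that pair is correlated only under exposure.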
a list of length 8:
clustering results both based on the environment and ignoring the environment. See u_cluster_similarity for details
clustering results ignoring the environment. See u_cluster_similarity for details
vector of the exposure variable for the training set
the similarity matrix based on the argument specified in cluster_distance
the similarity matrix based on the argument specified in eclust_distance
a data.frame and data.table of the clustering membership for clustering results both based on the environment and ignoring the environment. As a result, each gene will show up twice in this table
a data.frame and data.table of the clustering membership for clustering results based on all subjects, i.e. ignoring the environment. Each gene will only show up once in this table
a data.frame and data.table of the clustering membership for clustering results accounting for the environment. Each gene will only show up once in this table
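The reason each gene appears twice in the combined membership table can be illustrated with a small toy example. The column names (gene, cluster, source) and the cluster labels below are hypothetical and are not the names returned by r_cluster_data.

```r
genes <- paste0("gene", 1:4)

# Hypothetical membership when clustering ignores the environment
membership_all <- data.frame(gene = genes,
                             cluster = c(1, 1, 2, 2),
                             source = "all")

# Hypothetical membership when clustering accounts for the environment
membership_env <- data.frame(gene = genes,
                             cluster = c(3, 4, 4, 3),
                             source = "environment")

# Stacking both tables gives one row per gene per clustering,
# so every gene shows up exactly twice
membership_combined <- rbind(membership_all, membership_env)
gene_counts <- table(membership_combined$gene)
print(gene_counts)
```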
data("tcgaov")
tcgaov[1:5,1:6, with = FALSE]
Y <- log(tcgaov[["OS"]])
E <- tcgaov[["E"]]
genes <- as.matrix(tcgaov[,-c("OS","rn","subtype","E","status"),with = FALSE])
trainIndex <- drop(caret::createDataPartition(Y, p = 0.5, list = FALSE, times = 1))
testIndex <- setdiff(seq_len(length(Y)),trainIndex)
## Not run:
cluster_res <- r_cluster_data(data = genes,
                              response = Y,
                              exposure = E,
                              train_index = trainIndex,
                              test_index = testIndex,
                              cluster_distance = "tom",
                              eclust_distance = "difftom",
                              measure_distance = "euclidean",
                              clustMethod = "hclust",
                              cutMethod = "dynamic",
                              method = "average",
                              nPC = 1,
                              minimum_cluster_size = 60)
# the number of clusters determined by the similarity matrices specified
# in the cluster_distance and eclust_distance arguments. This will always be larger
# than cluster_res$clustersAll$nclusters which is based on the similarity matrix
# specified in the cluster_distance argument
cluster_res$clustersAddon$nclusters
# the number of clusters determined by the similarity matrices specified
# in the cluster_distance argument only
cluster_res$clustersAll$nclusters
## End(Not run)