Takes as input the number of genes, the true beta vector, and the gene expression matrix created by the generate_blocks function, and returns a list containing the data matrix, the correlation matrices, the TOM matrices, cluster information, and the training and test data.
s_generate_data_mars(p, X, beta, binary_outcome = FALSE, truemodule, nActive,
  cluster_distance = c("corr", "corr0", "corr1", "tom", "tom0", "tom1",
  "diffcorr", "difftom", "corScor", "tomScor", "fisherScore"), n, n0,
  include_interaction = F, signal_to_noise_ratio = 1,
  eclust_distance = c("fisherScore", "corScor", "diffcorr", "difftom"),
  cluster_method = c("hclust", "protoclust"), cut_method = c("dynamic",
  "gap", "fixed"), distance_method = c("euclidean", "maximum", "manhattan",
  "canberra", "binary", "minkowski"), n_clusters,
  agglomeration_method = c("complete", "average", "ward.D2", "single",
  "ward.D", "mcquitty", "median", "centroid"), nPC = 1, K.max = 10,
  B = 10)
p
    number of genes in the design matrix
X
    gene expression matrix of size n x p, created using the generate_blocks function
beta
    true beta coefficient vector
binary_outcome
    Logical. Should a binary outcome be generated? Default is FALSE.
truemodule
    numeric vector of the true module membership used in the s_response function
nActive
    number of active genes in the response used in the s_response function
cluster_distance
    character string naming which matrix from the training set to use to cluster the genes. Must be one of "corr", "corr0", "corr1", "tom", "tom0", "tom1", "diffcorr", "difftom", "corScor", "tomScor", "fisherScore".
n
    total number of subjects
n0
    total number of subjects with E=0
include_interaction
    Logical. Should an interaction with the environment be generated as part of the response? Default is FALSE.
signal_to_noise_ratio
    signal-to-noise ratio. Default is 1.
eclust_distance
    character string naming which matrix from the training set to use to cluster the genes based on the environment. See cluster_distance for the available options.
cluster_method
    Cluster the data using hierarchical clustering ("hclust") or prototype clustering ("protoclust"). Defaults to "hclust".
cut_method
    the method used to cut the dendrogram.
distance_method
    one of "euclidean", "maximum", "manhattan", "canberra", "binary", "minkowski", to be passed to the dist function.
n_clusters
    number of clusters specified by the user. Only applicable when cut_method = "fixed".
agglomeration_method
    the agglomeration method to be used. This should be (an unambiguous abbreviation of) one of "ward.D", "ward.D2", "single", "complete", "average" (= UPGMA), "mcquitty" (= WPGMA), "median" (= WPGMC) or "centroid" (= UPGMC).
nPC
    number of principal components. Can be 1 or 2.
K.max
    the maximum number of clusters to consider; must be at least two. Only used if cut_method = "gap".
B
    integer, number of Monte Carlo ("bootstrap") samples. Only used if cut_method = "gap".
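The clustering arguments above (distance_method, agglomeration_method, n_clusters) map naturally onto base R's dist, hclust, and cutree. The following is a minimal sketch of that pipeline on toy data, assuming the distance is computed between rows of a similarity matrix; it is not the package's confirmed internals, and the toy data and all variable names are hypothetical.

```r
set.seed(7)

# Hypothetical toy data: 8 "genes" in two blocks, each block driven by its own factor
n  <- 100
f1 <- rnorm(n)
f2 <- rnorm(n)
X <- cbind(sapply(1:4, function(i) f1 + rnorm(n, sd = 0.5)),
           sapply(1:4, function(i) f2 + rnorm(n, sd = 0.5)))
colnames(X) <- paste0("Gene", 1:8)

# A correlation-based similarity matrix (stand-in for a "corr"-type matrix)
similarity <- cor(X)

# distance_method = "euclidean": distance between rows of the similarity matrix
d <- stats::dist(similarity, method = "euclidean")

# agglomeration_method = "average"
hc <- stats::hclust(d, method = "average")

# cut_method = "fixed" with n_clusters = 2 (assumed to correspond to cutree's k)
clusters <- stats::cutree(hc, k = 2)
print(clusters)
```

With a fixed cut, the number of clusters is chosen by the user; the "dynamic" and "gap" options instead pick it from the data.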
list of (in the following order)
a 1 column matrix containing the true beta coefficient vector
an object of class similarity which is the similarity matrix specified by the cluster_distance argument
an object of class similarity which is the similarity matrix specified by the eclust_distance argument
data.table of simulated data from the s_response function
the simulated response
the n0 x p design matrix for the unexposed subjects
the n1 x p design matrix for the exposed subjects
the training design matrix for all subjects
the test set design matrix for all subjects
the training set response
the test set response
the training response and training design matrix in a single data.frame object
the test response and test design matrix in a single data.frame object
a character vector of the active genes, i.e. the ones that are associated with the response
the number of clusters identified by using the similarity matrix specified by the cluster_distance argument
the number of clusters identified by using the similarity matrix specified by the eclust_distance argument
the sum of n_clusters_All and n_clusters_Eclust
the cluster membership of each gene based on the cluster_distance matrix
the cluster membership of each gene based on both the cluster_distance matrix and the eclust_distance matrix. Note that each gene will appear twice here
the cluster membership of each gene based on the eclust_distance matrix
cluster membership of each gene with a penalty factor used for the group lasso
cluster membership of each gene with a penalty factor used for the group lasso
the TOM matrix based on all training subjects
the absolute difference of the exposed and unexposed TOM matrices: |TOM_{E=1} - TOM_{E=0}|
the TOM matrix based on training exposed subjects only
the TOM matrix based on training unexposed subjects only
the Pearson correlation matrix based on all training subjects
the absolute difference of the exposed and unexposed Pearson correlation matrices: |Cor_{E=1} - Cor_{E=0}|
the Pearson correlation matrix based on training exposed subjects only
the Pearson correlation matrix based on training unexposed subjects only
the Fisher scoring matrix; see u_fisherZ for details
the correlation scoring matrix: |Cor_{E=1} + Cor_{E=0} - 2|
the MSE for the null model
the 10 training folds used for the stability measures
the 10 X training folds (the same as in DT_train_folds)
the 10 Y training folds (the same as in DT_train_folds)
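The correlation-based matrices in the list above follow directly from their formulas. As a rough illustration, the sketch below computes them with base R on hypothetical toy data; the variable names are made up for this example and are not the names of the returned list elements.

```r
set.seed(42)

# Hypothetical toy data: unexposed genes are independent,
# exposed genes share a common factor and are therefore correlated
n0 <- 50; n1 <- 50; p <- 6
X0 <- matrix(rnorm(n0 * p), n0, p)
common <- rnorm(n1)
X1 <- matrix(rnorm(n1 * p), n1, p) + common  # vector recycled down each column
colnames(X0) <- colnames(X1) <- paste0("Gene", 1:p)

corr_e0  <- cor(X0)              # Pearson correlation, unexposed (E = 0) only
corr_e1  <- cor(X1)              # Pearson correlation, exposed (E = 1) only
corr_all <- cor(rbind(X0, X1))   # Pearson correlation, all subjects

# |Cor_{E=1} - Cor_{E=0}|
corr_diff <- abs(corr_e1 - corr_e0)

# |Cor_{E=1} + Cor_{E=0} - 2|
cor_scor <- abs(corr_e1 + corr_e0 - 2)

print(round(corr_diff, 2))
```

Both derived matrices are symmetric with a zero diagonal, since every gene has correlation 1 with itself in each exposure group.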
library(magrittr)
library(eclust)  # assumed: the package that provides s_modules() and s_generate_data_mars()
# simulation parameters
rho <- 0.90; p <- 500; SNR <- 1; n <- 200; n0 <- n1 <- 100
nActive <- p * 0.10
cluster_distance <- "tom"; Ecluster_distance <- "difftom"
rhoOther <- 0.6; betaMean <- 2; alphaMean <- 1; betaE <- 3
distanceMethod <- "euclidean"; clustMethod <- "hclust"
cutMethod <- "dynamic"; agglomerationMethod <- "average"
# In this simulation, blocks 3 and 4 are the important ones.
# leaveOut: optional specification of modules that should be left out
# of the simulation; their genes are simulated as unrelated ("grey").
# This is useful when simulating several sets, in some of which a module
# is present while in others it is absent.
d0 <- s_modules(n = n0, p = p, rho = 0, exposed = FALSE,
                modProportions = c(0.15, 0.15, 0.15, 0.15, 0.15, 0.25),
                minCor = 0.01,
                maxCor = 1,
                corPower = 1,
                propNegativeCor = 0.3,
                backgroundNoise = 0.5,
                signed = FALSE,
                leaveOut = 1:4)
d1 <- s_modules(n = n1, p = p, rho = rho, exposed = TRUE,
                modProportions = c(0.15, 0.15, 0.15, 0.15, 0.15, 0.25),
                minCor = 0.4,
                maxCor = 1,
                corPower = 0.3,
                propNegativeCor = 0.3,
                backgroundNoise = 0.5,
                signed = FALSE)
truemodule1 <- d1$setLabels
X <- rbind(d0$datExpr, d1$datExpr) %>%
  magrittr::set_colnames(paste0("Gene", 1:p)) %>%
  magrittr::set_rownames(paste0("Subject", 1:n))
betaMainEffect <- vector("double", length = p)
# the first nActive/2 in the 3rd block are active
betaMainEffect[which(truemodule1 %in% 3)[1:(nActive/2)]] <- runif(
  nActive/2, betaMean - 0.1, betaMean + 0.1)
# the first nActive/2 in the 4th block are active
betaMainEffect[which(truemodule1 %in% 4)[1:(nActive/2)]] <- runif(
  nActive/2, betaMean + 2 - 0.1, betaMean + 2 + 0.1)
beta <- c(betaMainEffect, betaE)
result <- s_generate_data_mars(p = p, X = X,
                               beta = beta,
                               binary_outcome = FALSE,
                               truemodule = truemodule1,
                               nActive = nActive,
                               include_interaction = FALSE,
                               cluster_distance = cluster_distance,
                               n = n, n0 = n0,
                               eclust_distance = Ecluster_distance,
                               signal_to_noise_ratio = SNR,
                               distance_method = distanceMethod,
                               cluster_method = clustMethod,
                               cut_method = cutMethod,
                               agglomeration_method = agglomerationMethod,
                               nPC = 1)
names(result)
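
To see the layout of the beta vector built in the example without running the full simulation, here is a small self-contained sketch. It replaces d1$setLabels with a hypothetical module label vector (an assumption for illustration only; the labels produced by s_modules may differ) and checks that beta has p main effects followed by one environment effect.

```r
set.seed(1)
p <- 500; nActive <- p * 0.10; betaMean <- 2; betaE <- 3

# Hypothetical stand-in for d1$setLabels: five modules of 75 genes each,
# plus 125 unassigned genes labelled 0
truemodule1 <- rep(c(1:5, 0), times = c(75, 75, 75, 75, 75, 125))

betaMainEffect <- vector("double", length = p)

# the first nActive/2 genes of module 3 get effects near betaMean
idx3 <- which(truemodule1 %in% 3)[1:(nActive / 2)]
betaMainEffect[idx3] <- runif(nActive / 2, betaMean - 0.1, betaMean + 0.1)

# the first nActive/2 genes of module 4 get effects near betaMean + 2
idx4 <- which(truemodule1 %in% 4)[1:(nActive / 2)]
betaMainEffect[idx4] <- runif(nActive / 2, betaMean + 2 - 0.1, betaMean + 2 + 0.1)

# p main effects followed by the environment effect
beta <- c(betaMainEffect, betaE)

active <- which(betaMainEffect != 0)
print(length(beta))
print(table(truemodule1[active]))
```

All other genes keep a coefficient of zero, so only the selected genes in modules 3 and 4 contribute main effects to the response.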