This function clusters the columns of Y into K groups with the help of X.
Y | an n x p matrix of p variables and n observations. The columns of Y will be clustered into K groups.
X | an n x q matrix of q variables and n observations.
K | the number of clusters.
B | the number of iterations in the simulated annealing algorithm.
L | the temperature coefficient in the simulated annealing algorithm.
alpha | the coefficient of the elastic net penalty.
nlambdas | the number of tuning parameters in the elastic net.
sampling | if 'equal', the sampling probabilities are equal during the simulated annealing algorithm; if 'size', the probabilities are proportional to the sizes of the clusters in the current iteration.
ncv | the number of cross-validations in the elastic net.
dist | the type of distance metric used to construct the similarity matrix. Options are 'gaussian', 'euclidean' and 'correlation', the latter being the default.
sigma | the parameter of the Gaussian kernel distance; it is ignored unless 'gaussian' is chosen as the distance measure.
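The two settings of sampling can be illustrated with a small sketch. This is only one reading of the argument description, not the package's internal code: the helper cluster_sampling_probs and the 0/1 membership matrix C are hypothetical names introduced here for illustration.

```r
# Hypothetical helper (not part of the package): probability of drawing each
# cluster at one simulated annealing step, given a 0/1 membership matrix C
# with one row per column of Y and one column per cluster.
cluster_sampling_probs <- function(C, sampling = c("equal", "size")) {
  sampling <- match.arg(sampling)
  K <- ncol(C)
  sizes <- colSums(C)            # current cluster sizes
  if (sampling == "equal") {
    rep(1 / K, K)                # every cluster equally likely
  } else {
    sizes / sum(sizes)           # larger clusters are drawn more often
  }
}

# Toy membership matrix: three items in cluster 1, one item in cluster 2.
C <- cbind(c(1, 1, 1, 0), c(0, 0, 0, 1))
cluster_sampling_probs(C, "equal")
cluster_sampling_probs(C, "size")
```

With 'size', a cluster holding three of the four items is drawn three times as often as the singleton cluster.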
The algorithm minimizes a modified version of NCut through simulated annealing. The modified NCut uses the similarity matrix of the original data Y in the numerator and the similarity matrix of the prediction of Y using X in the denominator. The clusters correspond to partitions that minimize this objective function.
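A minimal sketch of such an objective is given below. It assumes that the numerator is the usual cut between a cluster and its complement and that the denominator is the cluster volume; the exact definition used by the package may differ. The function assisted_ncut and its arguments C, W_Y and W_hat are hypothetical names, not part of the package.

```r
# Hypothetical helper: NCut-style objective with an assisted denominator.
# C     : p x K 0/1 membership matrix (exactly one 1 per row)
# W_Y   : p x p similarity matrix of the columns of Y
# W_hat : p x p similarity matrix of the columns of the predicted Y
assisted_ncut <- function(C, W_Y, W_hat) {
  total <- 0
  for (k in seq_len(ncol(C))) {
    inside <- C[, k] == 1
    cut_k <- sum(W_Y[inside, !inside])  # similarity crossing the cluster boundary
    vol_k <- sum(W_hat[inside, ])       # assumed: volume measured on the predictions
    total <- total + cut_k / vol_k
  }
  total
}

# Toy similarity: items 1-2 and items 3-4 form two tight groups.
W <- matrix(0.1, 4, 4)
W[1:2, 1:2] <- 1
W[3:4, 3:4] <- 1
good <- cbind(c(1, 1, 0, 0), c(0, 0, 1, 1))  # respects the two groups
bad  <- cbind(c(1, 0, 1, 0), c(0, 1, 0, 1))  # splits both groups
assisted_ncut(good, W, W)  # smaller value
assisted_ncut(bad, W, W)   # larger value
```

In this toy case the partition that respects the block structure gives the smaller objective, which is the behaviour the simulated annealing search exploits.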
The external information of X is incorporated by using elastic net to predict Y.
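The prediction step can be sketched with the glmnet package. This is an illustration under assumptions, not the package's internal code: it fits one cross-validated elastic net per column of Y, and it builds the similarity of the predictions from absolute correlations. The names Y_hat, W_hat and lambdas are introduced here for illustration only.

```r
library(glmnet)  # assumed to be installed; provides cv.glmnet

set.seed(1)
n <- 30; q <- 5; p <- 3
X <- matrix(rnorm(n * q), n, q)
Y <- X %*% matrix(runif(q * p, 0.5, 1), q, p) + matrix(rnorm(n * p), n, p)

Y_hat <- matrix(NA_real_, n, p)
lambdas <- numeric(p)
for (j in seq_len(p)) {
  # alpha mixes the ridge (0) and lasso (1) penalties; nfolds plays the role
  # of the number of cross-validations.
  fit <- cv.glmnet(X, Y[, j], alpha = 0.5, nlambda = 100, nfolds = 3)
  lambdas[j] <- fit$lambda.min
  Y_hat[, j] <- as.numeric(predict(fit, newx = X, s = "lambda.min"))
}

# Similarity between the predicted columns (correlation-based, as an assumption).
W_hat <- abs(cor(Y_hat))
```

Columns of Y that depend on the same variables in X receive similar predictions, so their entries in W_hat are large even when the noise in Y hides the relationship.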
A list with the value of the objective function, the clusters and the lambda penalty chosen through cross-validation. It has the following components:
a vector of length N which contains the loss at each iteration of the simulated annealing algorithm.
a matrix representing the clustering result, of dimension p times K, where p is the number of columns of Y.
the optimal lambda chosen through cross-validation for the elastic net for predicting Y with X.
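The p times K clustering matrix can be turned into a vector of cluster labels. The sketch below assumes the matrix is a 0/1 indicator matrix with one row per column of Y, and uses a toy matrix Cx in place of a real result.

```r
# Toy 5 x 2 membership matrix standing in for the clustering component.
Cx <- rbind(c(1, 0), c(1, 0), c(0, 1), c(1, 0), c(0, 1))

# Cluster label of each column of Y: the position of the 1 in its row.
labels <- max.col(Cx, ties.method = "first")
labels

# Cluster sizes and the members of each cluster.
table(labels)
split(seq_len(nrow(Cx)), labels)
```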
Sebastian Jose Teran Hidalgo and Shuangge Ma. Maintainer: Sebastian Jose Teran Hidalgo. sebastianteranhidalgo@gmail.com.
Hidalgo, Sebastian J. Teran, Mengyun Wu, and Shuangge Ma. Assisted clustering of gene expression data using ANCut. BMC Genomics 18.1 (2017): 623.
#This sets up the initial parameters for the simulation.
library(MASS)#for mvrnorm
library(fields)
n=30 #Sample size
B=50 #Number of iterations in the simulated annealing algorithm.
L=10000 #Temperature coefficient.
p=50 #Number of columns of Y.
q=p #Number of columns of X.
h1=0.15
h2=0.25
S=matrix(0.2,q,q)
S[1:(q/2),(q/2+1):q]=0
S[(q/2+1):q,1:(q/2)]=0
S=S-diag(diag(S))+diag(q)
mu=rep(0,q)
W0=matrix(1,p,p)
W0[1:(p/2),1:(p/2)]=0
W0[(p/2+1):p,(p/2+1):p]=0
Denum=sum(W0)
B2=matrix(0,q,p)
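#Each of the first p/2 columns of Y depends on 6 of the first q/2 columns of X.
for (i in 1:(p/2)){
  B2[1:(q/2),i]=runif(q/2,h1,h2)
  in1=sample.int(q/2,6)
  B2[-in1,i]=0
}
#Each of the last p/2 columns of Y depends on 6 of the last q/2 columns of X.
for (i in (p/2+1):p){
  B2[(q/2+1):q,i]=runif(q/2,h1,h2)
  in2=sample(seq(q/2+1,q),6)
  B2[-in2,i]=0
}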
X=mvrnorm(n, mu, S)
Z=X%*%B2
Y=Z+matrix(rnorm(n*p,0,1),n,p)
#Our method
Res=ancut(Y=Y,X=X,B=B,L=L,alpha=0,ncv=3)
Cx=Res[[2]]
f11=matrix(Cx[,1],p,1)
f12=matrix(Cx[,2],p,1)
errorL=sum((f11%*%t(f11))*W0)/Denum+sum((f12%*%t(f12))*W0)/Denum
#This is the true error of the clustering solution: the share of column pairs
#from different true groups that were placed in the same cluster.
errorL
par(mfrow=c(1,2))
#Below is a plot of the simulated annealing path.
plot(Res[[1]],type='l',ylab='')
#Cluster found by ANCut
image.plot(Cx)