muncut: MuNCut Clusters the Columns of Data from 3 Different Sources.
In NCutYX: Clustering of Omics Data of Multiple Types with a Multilayer Network Representation

Description

Clusters the columns of Z, Y and X into K clusters by representing each data type as one layer of a multilayer network. The Z layer is modeled as depending on Y, and the Y layer as depending on X. Optionally (model = TRUE), the elastic net is fitted before clustering, and the predictions of Z and Y are used instead of their observed values, which can improve the clustering results. The function returns K clusters of the columns of Z, Y and X.
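The idea of clustering predictions instead of observed values can be illustrated with a minimal base-R sketch. It is not the package's implementation: it swaps the cross-validated elastic net for ridge regression with a fixed penalty, and the helper `ridge_predict()` and the value `lambda = 1` are hypothetical choices made only for illustration.

```r
# Hypothetical helper (not part of NCutYX): replace each column of Y by its
# ridge-regression fit on X, using a fixed penalty lambda.
ridge_predict <- function(X, Y, lambda = 1) {
  Xs <- scale(X)                    # standardize the predictors
  Yc <- scale(Y, scale = FALSE)     # center the responses
  # Closed-form ridge coefficients: (X'X + lambda * I)^(-1) X'Y
  Bhat <- solve(crossprod(Xs) + lambda * diag(ncol(Xs)), crossprod(Xs, Yc))
  fitted <- Xs %*% Bhat
  # Add the response means back so the fits are on the original scale
  sweep(fitted, 2, attr(Yc, "scaled:center"), "+")
}

# Toy three-layer data: Y depends on X, Z depends on Y
set.seed(1)
n <- 50; p <- 10
X <- matrix(rnorm(n * p), n, p)
Y <- X %*% matrix(runif(p * p, 0, 0.5), p, p) + matrix(rnorm(n * p, 0, 0.5), n, p)
Z <- Y %*% matrix(runif(p * p, 0, 0.5), p, p) + matrix(rnorm(n * p, 0, 0.5), n, p)

Y_hat <- ridge_predict(X, Y)   # Y layer as predicted from X
Z_hat <- ridge_predict(Y, Z)   # Z layer as predicted from the observed Y
dim(Y_hat)
```

Y_hat and Z_hat would then take the place of Y and Z when the similarity matrices are built; in `muncut` itself this step is controlled by `model`, `alpha`, `ncv` and `nlambdas`.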

Usage

muncut(Z, Y, X, K = 2, B = 3000, L = 1000, alpha = 0.5, ncv = 3,
  nlambdas = 100, scale = FALSE, model = FALSE, gamma = 0.5,
  sampling = "equal", dist = "gaussian", sigma = 0.1)

Arguments

`Z`	is an n x q matrix of q variables and n observations.
`Y`	is an n x p matrix of p variables and n observations.
`X`	is an n x r matrix of r variables and n observations.
`K`	is the number of column clusters.
`B`	is the number of iterations in the simulated annealing algorithm.
`L`	is the temperature coefficient in the simulated annealing algorithm.
`alpha`	is the tuning parameter in the elastic net penalty; only used when model = TRUE.
`ncv`	is the number of cross-validation folds used to choose the tuning parameter lambda in the elastic net penalty; only used when model = TRUE.
`nlambdas`	is the number of values of the tuning parameter lambda tried during cross-validation; only used when model = TRUE.
`scale`	when TRUE, Z, Y and X are scaled to have mean 0 and standard deviation 1.
`model`	when TRUE, the relationships between Z and Y and between Y and X are modeled with the elastic net, and the predictions of Z and Y from these models are used in the clustering algorithm instead of the observed values.
`gamma`	is the tuning parameter of the clustering penalty. Larger values give more importance to within-layer effects and less to across-layer effects.
`sampling`	if 'equal', the sampling distribution is discrete uniform over the clusters; if 'size', the sampling probabilities are inversely proportional to the size of each cluster.
`dist`	is the type of distance measure used in the similarity matrix. Options are 'gaussian' (the default) and 'correlation'.
`sigma`	is the bandwidth parameter used when dist = 'gaussian'.
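The `dist` and `sigma` arguments control how similarities between columns are computed. The sketch below assumes a Gaussian kernel of the form exp(-d^2 / (2 * sigma^2)) on Euclidean distances between columns, and the absolute Pearson correlation for 'correlation'; the exact formulas used inside `muncut` may differ, and `similarity()` is a hypothetical helper.

```r
# Hypothetical helper (not part of NCutYX): similarity between the columns of M.
similarity <- function(M, dist = c("gaussian", "correlation"), sigma = 0.1) {
  dist <- match.arg(dist)
  if (dist == "gaussian") {
    # Squared Euclidean distances between columns, then an assumed Gaussian kernel
    D2 <- as.matrix(stats::dist(t(M)))^2
    W <- exp(-D2 / (2 * sigma^2))
  } else {
    # Assumed: absolute Pearson correlation between columns
    W <- abs(stats::cor(M))
  }
  diag(W) <- 0   # no self-similarity
  unname(W)
}

set.seed(2)
M <- matrix(rnorm(30), 10, 3)
M <- cbind(M, M[, 1])            # column 4 is a copy of column 1
Wg <- similarity(M, "gaussian", sigma = 1)
Wc <- similarity(M, "correlation")
round(Wg, 3)
```

Under this assumed kernel, smaller values of sigma make the similarity decay faster with distance, so only very close columns remain strongly linked.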

Details

The algorithm minimizes a modified version of the normalized cut (NCut) objective through simulated annealing; the returned clusters correspond to the partition that minimizes this objective. The external information in X is incorporated by using penalized regression (the elastic net, see model) to predict Y.
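To make the Details more concrete, here is a sketch of simulated annealing on the standard (unmodified) NCut of a single similarity matrix. It does not reproduce MuNCut's multilayer objective or its use of `gamma` and `L`; the cooling schedule `temp0 / log(b + 1)`, the helper names `ncut()` and `anneal_ncut()`, and the reading of `sampling` as the distribution of the destination cluster for a randomly chosen column are all assumptions.

```r
# Standard normalized cut of the partition `cl` (labels 1..K) of the nodes of a
# symmetric, nonnegative similarity matrix W: sum over k of cut(A_k) / vol(A_k).
ncut <- function(W, cl, K) {
  total <- 0
  for (k in seq_len(K)) {
    in_k <- cl == k
    if (!any(in_k)) return(Inf)                 # empty clusters are not allowed
    vol_k <- sum(W[in_k, , drop = FALSE])       # total similarity attached to cluster k
    cut_k <- sum(W[in_k, !in_k, drop = FALSE])  # similarity leaving cluster k
    total <- total + cut_k / vol_k
  }
  total
}

# Hypothetical sketch: simulated annealing over single-column moves.
anneal_ncut <- function(W, K, B = 3000, temp0 = 0.05,
                        sampling = c("equal", "size")) {
  sampling <- match.arg(sampling)
  m <- nrow(W)
  cl <- sample(rep_len(seq_len(K), m))          # random balanced starting partition
  cur <- ncut(W, cl, K)
  trace <- numeric(B)
  for (b in seq_len(B)) {
    i <- sample.int(m, 1)                       # column to move
    sizes <- tabulate(cl, nbins = K)
    # Destination: uniform ('equal') or inversely proportional to size ('size')
    prob <- if (sampling == "equal") rep(1, K) else 1 / pmax(sizes, 1)
    prob[cl[i]] <- 0                            # the column must change cluster
    proposal <- cl
    proposal[i] <- sample.int(K, 1, prob = prob)
    cand <- ncut(W, proposal, K)
    temp <- temp0 / log(b + 1)                  # assumed logarithmic cooling schedule
    # Always accept improvements; accept worse partitions with prob exp(-delta / temp)
    if (cand <= cur || runif(1) < exp(-(cand - cur) / temp)) {
      cl <- proposal
      cur <- cand
    }
    trace[b] <- cur
  }
  list(trace = trace, cluster = cl)
}

# Demo: two blocks of five nodes, strongly linked within and weakly across
W <- matrix(0.01, 10, 10)
W[1:5, 1:5] <- 1
W[6:10, 6:10] <- 1
diag(W) <- 0
set.seed(3)
res <- anneal_ncut(W, K = 2)
res$cluster
```

On two well-separated blocks like these, the annealer should settle on the partition that separates them, and `res$trace` plays a role similar to the objective trace plotted in the example.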

References

Sebastian J. Teran Hidalgo and Shuangge Ma. Clustering Multilayer Omics Data using MuNCut. (Revise and resubmit.)

Examples

library(NCutYX)
library(MASS)   # for mvrnorm
library(fields) # for image.plot

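# Simulation parameters
set.seed(777)
n   <- 50   # number of observations
p   <- 50   # number of columns in each layer
h   <- 0.5  # upper bound for the nonzero regression coefficients
rho <- 0.5  # baseline correlation between the columns of X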

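# W0[i, j] = 1 when columns i and j belong to different true clusters.
# Each layer has four true clusters of columns: 1:(p/5), (p/5+1):(3*p/5),
# (3*p/5+1):(4*p/5) and (4*p/5+1):p, shared across the Z, Y and X layers.
W0 <- matrix(1, p, p)
W0[1:(p/5), 1:(p/5)] <- 0
W0[(p/5+1):(3*p/5), (p/5+1):(3*p/5)] <- 0
W0[(3*p/5+1):(4*p/5), (3*p/5+1):(4*p/5)] <- 0
W0[(4*p/5+1):p, (4*p/5+1):p] <- 0
W0 <- cbind(W0, W0, W0)
W0 <- rbind(W0, W0, W0)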

# Covariance of X: correlation 2*rho within the first three clusters,
# rho elsewhere, and unit variances
Sigma <- matrix(rho, p, p)
Sigma[1:(p/5), 1:(p/5)] <- 2*rho
Sigma[(p/5+1):(3*p/5), (p/5+1):(3*p/5)] <- 2*rho
Sigma[(3*p/5+1):(4*p/5), (3*p/5+1):(4*p/5)] <- 2*rho
diag(Sigma) <- 1

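X <- mvrnorm(n, rep(0, p), Sigma)

# Sparse block-diagonal coefficients for X -> Y (B1) and Y -> Z (B2):
# about 20% of the entries in the first three diagonal blocks are nonzero,
# drawn from Uniform(h/2, h); the fourth cluster has no cross-layer links.
B1 <- matrix(0, p, p)
B2 <- matrix(0, p, p)

B1[1:(p/5), 1:(p/5)] <- runif((p/5)^2, h/2, h)*rbinom((p/5)^2, 1, 0.2)
B1[(p/5+1):(3*p/5), (p/5+1):(3*p/5)] <- runif((2*p/5)^2, h/2, h)*rbinom((2*p/5)^2, 1, 0.2)
B1[(3*p/5+1):(4*p/5), (3*p/5+1):(4*p/5)] <- runif((p/5)^2, h/2, h)*rbinom((p/5)^2, 1, 0.2)

B2[1:(p/5), 1:(p/5)] <- runif((p/5)^2, h/2, h)*rbinom((p/5)^2, 1, 0.2)
B2[(p/5+1):(3*p/5), (p/5+1):(3*p/5)] <- runif((2*p/5)^2, h/2, h)*rbinom((2*p/5)^2, 1, 0.2)
B2[(3*p/5+1):(4*p/5), (3*p/5+1):(4*p/5)] <- runif((p/5)^2, h/2, h)*rbinom((p/5)^2, 1, 0.2)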

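# Y depends on X and Z depends on Y, each with Gaussian noise (sd = 0.5);
# Y2 and Z2 are the noise-free signals and are not used below
Y  <- X %*% B1 + matrix(rnorm(n*p, 0, 0.5), n, p)
Y2 <- X %*% B1

Z  <- Y %*% B2 + matrix(rnorm(n*p, 0, 0.5), n, p)
Z2 <- Y %*% B2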

# Run MuNCut on the simulated data with K = 4 column clusters
clust <- muncut(Z,
                Y,
                X,
                K        = 4,
                B        = 10000,
                L        = 500,
                sampling = 'size',
                alpha    = 0.5,
                ncv      = 3,
                nlambdas = 20,
                sigma    = 10,
                scale    = TRUE,
                model    = FALSE,
                gamma    = 0.1)

# Co-membership matrix: clust[[2]] has one indicator column per cluster, so
# A[i, j] = 1 when columns i and j share a cluster (the sum of the
# per-cluster outer products, written as a single matrix product)
A <- clust[[2]] %*% t(clust[[2]])

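# Proportion of column pairs grouped together although they belong to
# different true clusters (lower is better)
errorK <- sum(A * W0)/(3*p)^2
errorK
plot(clust[[1]], type = 'l')
image.plot(A)   # heat map of the co-membership matrix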
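The error measure in the example can be checked by hand on a toy partition. The base-R sketch below, with a hypothetical `membership()` helper, builds the co-membership matrix `A` and the mismatch indicator `W0` for five columns and computes the same proportion as `errorK`.

```r
# Hypothetical helper (not part of NCutYX): 0/1 membership matrix with one
# column per cluster, built from a vector of labels in 1..K.
membership <- function(labels, K) {
  M <- matrix(0, length(labels), K)
  M[cbind(seq_along(labels), labels)] <- 1
  M
}

truth <- c(1, 1, 2, 2, 2)    # true clusters of five toy columns
est   <- c(1, 1, 1, 2, 2)    # estimated clusters: column 3 is misassigned

M  <- membership(est, 2)
A  <- M %*% t(M)                      # A[i, j] = 1 if columns i and j share a cluster
W0 <- 1 * outer(truth, truth, "!=")   # 1 if columns i and j are in different true clusters
errorK <- sum(A * W0) / length(truth)^2
errorK   # 4 of the 25 ordered pairs are wrongly grouped: 0.16
```

Because W0 is zero on the diagonal, self-pairs never count as errors; with K equal to the true number of clusters, errorK is 0 only when every estimated cluster coincides with a true cluster.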

NCutYX documentation built on May 2, 2019, 3:15 a.m.
