pwncut: Cluster the Columns of X into K Clusters by Giving a Weighted...

Description Usage Arguments Details References Examples

Description

This function will output K channels of variables.

Usage

1
2
3
pwncut(X, K = 2, B = 3000, L = 1000, scale = TRUE, lambda = 1,
  epsilon = 0, nstarts = 3, start = "default", dist = "gaussian",
  sigma = 0.1, beta = 1)

Arguments

X

is a n x p matrix of p variables and n observations.

K

is the number of clusters.

B

is the number of iterations in the simulated annealing algorithm.

L

is the temperature coefficient in the simulated annealing algorithm.

scale

equals TRUE if data X is to be scaled with mean 0 and variance 1.

lambda

the tuning parameter of the penalty. Larger values shrink the weighted cluster membership closer together (default = 1).

epsilon

values in the similarity matrix less than epsilon are set to 0 (default = 0).

nstarts

the number of starting values also corresponding how many times simulated annealing is run. Larger values provide better results but takes longer.

start

if it equals 'default' then the starting value for all weights is 1/K. If 'random' then weights are sampled from a uniform distribution and then scaled to sum 1 per variable.

dist

specifies the distance metric used for constructing the similarity matrix. Options are 'gaussian', 'correlation' and 'euclidean' (default = 'gaussian').

sigma

is the bandwidth parameter when the dist metric chosen is 'gaussian' (default = 0.1).

beta

when dist='correlation', beta is the exponent applied to each entry of the similarity matrix.

Details

The algorithm minimizes a modified version of NCut through simulated annealing. The clusters correspond to partitions that minimize this objective function.

References

Sebastian J. Teran Hidalgo, Mengyun Wu and Shuangge Ma. Penalized and weighted clustering of gene expression data using PWNCut. (Submitted.)

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
# This sets up the initial parameters for the simulation.
n <- 100 # Sample size
p <- 100 # Number of columns of Y.
K <- 3

C0            <- matrix(0,p,K)
C0[1:25,1]    <- matrix(1,25,1)
C0[26:75,1:3] <- matrix(1/3,50,3)
C0[76:100,3]  <- matrix(1,25,1)

A0 <- C0[ ,1]%*%t(C0[ ,1]) + C0[ ,2]%*%t(C0[ ,2]) +
      C0[ ,3]%*%t(C0[ ,3])
A0 <- A0 - diag(diag(A0)) + diag(p)

Z1 <- rnorm(n,0,2)
Z2 <- rnorm(n,0,2)
Z3 <- rnorm(n,0,2)

Y <- matrix(0,n,p)
Y[ ,1:25]   <-  matrix(rnorm(n*25, 0, 2), n, 25) + matrix(Z1, n, 25, byrow=FALSE)
Y[ ,26:75]  <-  matrix(rnorm(n*50, 0, 2), n, 50) + matrix(Z1, n, 50, byrow=FALSE) +
                matrix(Z2, n, 50, byrow=FALSE) + matrix(Z3, n, 50, byrow=FALSE)
Y[ ,76:100] <-  matrix(rnorm(n*25, 0, 2), n, 25) + matrix(Z3, n, 25, byrow=FALSE)

trial <- pwncut(Y,
                K       = 3,
                B       = 10000,
                L       = 1000,
                lambda  = 1.5,
                start   = 'default',
                scale   = TRUE,
                nstarts = 1,
                epsilon = 0,
                dist    = 'correlation',
                sigma   = 10)

A1 <- trial[[2]][ ,1]%*%t(trial[[2]][ ,1]) +
      trial[[2]][ ,2]%*%t(trial[[2]][ ,2]) +
      trial[[2]][ ,3]%*%t(trial[[2]][ ,3])

A1 <- A1 - diag(diag(A1)) + diag(p)

plot(trial[[1]], type='l')
errorL <- sum(abs(A0-A1))/p^2
errorL

Seborinos/NCutYX documentation built on May 9, 2019, 1:20 p.m.