1. Independent Component Analysis (ICA)

Introduction

In this vignette, we consider approximating a data matrix as a product of multiple low-rank matrices (a.k.a., factor matrices).

Test data is available from toyModel.

library("iTensor")
data1 <- iTensor::toyModel("ICA_Type1")
data2 <- iTensor::toyModel("ICA_Type2")
data3 <- iTensor::toyModel("ICA_Type3")
data4 <- iTensor::toyModel("ICA_Type4")
data5 <- iTensor::toyModel("ICA_Type5")
dim(data1$X_observed)
dim(data2$X_observed)
dim(data3$X_observed)
dim(data4$X_observed)
dim(data5$gene) # N < P systems

Summary of these data is as follows:

  1. ICA with time-independent sub-Gaussian data
  2. ICA with time-independent super-Gaussian data
  3. ICA with data mixed with signals having no time dependence and different kurtosis
  4. ICA with time-dependent data
  5. IPCA in N < P systems

You will see that the stuctures of these data as follows.

pairs(data1$X_observed, main="data1")
pairs(data2$X_observed, main="data2")
pairs(data3$X_observed, main="data3")
pairs(data4$X_observed, main="data4")
dim(data5$gene)

To perform and visualize all the ICA at once, here we write some functions as follows.

allICA <- function(X, J){
    # Classic ICA Methods
    out.FastICA <- ICA(X=X, J=J, algorithm="FastICA")
    out.InfoMax <- ICA(X=X, J=J, algorithm="InfoMax")
    out.ExtInfoMax <- ICA(X=X, J=J, algorithm="ExtInfoMax")
    # Modern ICA Method
    out.JADE <- ICA2(X=X, J=J, algorithm="JADE")
    # Output
    list(out.FastICA=out.FastICA, out.InfoMax=out.InfoMax,
      out.ExtInfoMax=out.ExtInfoMax, out.JADE=out.JADE)
}

CorrIndexAllICA <- function(S, out){
    tmp <- lapply(out, function(x, S){CorrIndex(cor(S, Re(x$S)))}, S=S)
    Name <- gsub("out.", "", names(tmp))
    Value <- round(unlist(tmp), 3)
    names(Value) <- NULL
    cbind(Name, Value)
    tmp <- as.data.frame(cbind(Name, Value))
    colnames(tmp) <- c("Method", "CorrIndex")
    knitr::kable(tmp)
}

Note that we select only four ICA algorithms for the demonstration but in iTensor 12 ICA algorithms are available. For the details, check ?ICA and ?ICA2.

1. ICA with time-independent sub-Gaussian data

In this data, according to the independence between estimated source signal, an upright square means that the calculation is successful and other cases such as a rhombus rotated 45 degrees from a square means that the calculation is fail.

set.seed(123456)
out1 <- allICA(X=data1$X_observed, J=3)

Here we use CorrIndex, which is a performance index to evaluate ICA results [@corrindex]. CorrIndex imply that FastICA, ExtInfoMax, and JADE performed well (the closer to 0, the better the performance).

CorrIndexAllICA(data1$S_true, out1)

We can see the details of source signals as follows.

pairs(out1$out.FastICA$S, main="FastICA")
pairs(out1$out.InfoMax$S, main="InfoMax")
pairs(out1$out.ExtInfoMax$S, main="ExtInfoMax")
pairs(Re(out1$out.JADE$S), main="JADE")

2. ICA with time-independent super-Gaussian data

In this data, if the outliers are distributed along the lines of x = 0 and y = 0, the calculation is successful and if they are of the cross-shape, it is a failure.

set.seed(123456)
out2 <- allICA(X=data2$X_observed, J=3)

CorrIndex imply that all the algorithms performed well.

CorrIndexAllICA(data2$S_true, out2)

We can see the details of source signals as follows.

pairs(out2$out.FastICA$S, main="FastICA")
pairs(out2$out.InfoMax$S, main="InfoMax")
pairs(out2$out.ExtInfoMax$S, main="ExtInfoMax")
pairs(Re(out2$out.JADE$S), main="JADE")

3. ICA with data mixed with signals having no time dependence and different kurtosis

In this data, the uniform distribution should be extracted in such a way that it is not rhombic, and furthermore the normal distribution and the t-distribution with 1 degree of freedom should be extracted independently. Sometimes, it is difficult ot determine visually though.

set.seed(123456)
out3 <- allICA(X=data3$X_observed, J=3)

CorrIndex imply that FastICA, ExtInfoMax, and JADE performed well.

CorrIndexAllICA(data3$S_true, out3)

We can see the details of source signals as follows.

pairs(out3$out.FastICA$S, main="FastICA")
pairs(out3$out.InfoMax$S, main="InfoMax")
pairs(out3$out.ExtInfoMax$S, main="ExtInfoMax")
pairs(Re(out3$out.JADE$S), main="JADE")

4. ICA with time-dependent data

In this data, if the mesh pattern is reproduced, the calculation is successful.

set.seed(123456)
out4 <- allICA(X=data4$X_observed, J=3)

CorrIndex imply that FastICA, ExtInfoMax, and JADE performed well.

CorrIndexAllICA(data4$S_true, out4)

We can see the details of source signals as follows.

pairs(out4$out.FastICA$S, main="FastICA")
pairs(out4$out.InfoMax$S, main="InfoMax")
pairs(out4$out.ExtInfoMax$S, main="ExtInfoMax")
pairs(Re(out4$out.JADE$S), main="JADE")

5. IPCA in N < P systems

This kind of data is an thin and long matrix, which is a very common data structure in omics data. In iTensor, we re-implemented the IPCA function of mixOmics package.

library("mixOmics")
set.seed(123456)

# mixOmics's IPCA
res.ipca <- ipca(data5$gene, ncomp=3, mode="deflation")

# iTensor's IPCA
out5 <- ICA2(X=as.matrix(data5$gene), J=3, algorithm="IPCA")

We can see the details of source signals as follows. In this data, we can confirm that the IPCA of mixOmics and iTensor have the same results, except for the order and constant-fold relationships.

pairs(res.ipca$x, main="IPCA (mixOmics)", col=data5$treatment[,"Treatment.Group"], pch=16)
pairs(out5$S, main="IPCA (iTensor)", col=data5$treatment[,"Treatment.Group"], pch=16)

Session Information {.unnumbered}

sessionInfo()

References



Try the iTensor package in your browser

Any scripts or data that you put into this service are public.

iTensor documentation built on April 28, 2023, 9:11 a.m.