EMPCA notes"

Compare nipals/empca with no missing Compare nipals/empca with missing, equal weight Compare nipals/empca with missing, unequal weight

Complete data

# Python - Coeff (scores)
[[-2.809  0.097  0.244  0.050]
 [-1.834  0.286  0.010 -0.135]
 [-0.809  0.963 -0.341  0.078]
 [-0.155 -1.129  0.548  0.026]
 [0.707  -0.723 -0.736 -0.024]
 [1.830  -0.290 -0.157  0.030]
 [3.070   0.796  0.431 -0.026]]

# m1e <- empca(x=B1, w=B1wt, ncomp=4)
# Un-sweep the eigenvalues to compare to python results
# R round( sweep( m1e$scores, 2, m1e$eig, "*"), 3)
      PC1    PC2    PC3    PC4
G1 -2.809  0.097 -0.244  0.050
G2 -1.834  0.286 -0.010 -0.135
G3 -0.809  0.963  0.341  0.078
G4 -0.155 -1.129 -0.548  0.026
G5  0.707 -0.723  0.736 -0.024
G6  1.830 -0.290  0.157  0.030
G7  3.070  0.796 -0.431 -0.026

# Matlab - P (scores)
  0.5590   0.0517   0.2210   0.2910
  0.3650   0.1520   0.0095  -0.7840
  0.1610   0.5120  -0.3080   0.4530
  0.0309  -0.6010   0.4950   0.1510
 -0.1410  -0.3850  -0.6640  -0.1380
 -0.3650  -0.1540  -0.1420   0.1760
-0.6110   0.4230   0.3890  -0.1490

# R round(m1e$scores, 3)
      PC1    PC2    PC3    PC4
G1 -0.559 -0.052  0.221 -0.291
G2 -0.365 -0.152  0.009  0.784
G3 -0.161 -0.512 -0.308 -0.453
G4 -0.031  0.601  0.495 -0.151
G5  0.141  0.385 -0.664  0.138
G6  0.365  0.154 -0.142 -0.176
G7  0.611 -0.423  0.389  0.149

Missing data

# Python with initial Identity matrix

[[2.791 0.125 0.325 -0.035]
 [1.528 -0.989 -0.211 0.172]
 [0.990 -0.651 -0.117 -0.186]
 [0.159 1.463 0.530 0.020]
 [-0.628 0.862 -0.730 -0.032]
 [-1.738 0.406 -0.139 -0.071]
 [-2.917 -0.712 0.520 -0.047]]

Eigvec (loadings)
[[-0.309 -0.839 -0.298 0.300]
 [-0.502 0.014 0.154 -0.615]
 [-0.470 -0.086 0.766 0.219]
 [-0.441 0.521 -0.236 0.615]
 [-0.487 0.128 -0.496 -0.326]]


# R 

R> m2e <- empca(x=B2, w=B2wt, ncomp=4, seed=NULL)
# # Un-sweep the eigenvalues to compare to python results
R> round( sweep( m2e$scores, 2, m2e$eig, "*"), 3)
      PC1    PC2    PC3    PC4
G1 -2.791  0.216 -0.356  0.066
G2 -1.528 -0.942  0.187 -0.150
G3 -0.990 -0.620  0.101  0.207
G4 -0.159  1.472 -0.522 -0.032
G5  0.628  0.844  0.744  0.021
G6  1.738  0.351  0.161  0.050
G7  2.917 -0.808 -0.493  0.019
R> round( m2e$loadings, 3)
     PC1    PC2    PC3    PC4
E1 0.309 -0.839  0.298 -0.300
E2 0.502  0.014 -0.154  0.615
E3 0.470 -0.086 -0.766 -0.219
E4 0.441  0.521  0.236 -0.615
E5 0.487  0.128  0.496  0.326

Python

Python code by @bailey2012principal, retrieved 1 Mar 2019 from https://github.com/sbailey/empca .

The Python code is difficult to read in places for a person [like me] not well-versed with Python. Three examples:

  1. It is not clear what values k takes in for k in range(self.nvec).
  2. Gram-Schmidt orthogonalization is accomplished with a pair of nested for loops instead of a function.
  3. The Model class structure makes it a bit tricky to figure out what objects have actually been modified inside a function.

The Python code iterates these two EM steps:

  1. Calculate the coefficient matrix C.
  2. Calculate ALL ncomp principal components P simultaneously (iterate each to convergence). Orthogonalize P.

For a complete-data problem, python and R give similar results. Note the Coeff matrix in python does NOT have eigenvalues swept out of the columns.

For the missing-data problem, the python results are somewhat different from R.

Matlab

Matlab code by Vicente Parot, retrieved 1 Mar 2019 from https://www.mathworks.com/matlabcentral/fileexchange/45353-empca.

The Matlab code feels similar to R.

The Matlab code calculates principal components sequentially, one at time. For each principal component, the algorithm iterates between these two steps:

  1. Calculate C[,h]
  2. Calculate P[,h]

While this is a type of EM algorithm, it is not the algorithm described by Bailey (2012) and is considered further.



Try the nipals package in your browser

Any scripts or data that you put into this service are public.

nipals documentation built on Jan. 24, 2020, 9:06 a.m.