One example for flash and how to use greedy and backfitting. For the speed, I just use backfitting with few runs of flash in each iteration. I also compare with PMD, these are all better than PMD.
library("flash") sim_K = function(K, N, P, SF, SL, signal,noise){ E = matrix(rnorm(N*P,0,noise),nrow=N) Y = E L_true = array(0, dim = c(N,K)) F_true = array(0, dim = c(P,K)) for(k in 1:K){ lstart = rnorm(N, 0, signal) fstart = rnorm(P, 0, signal) index = sample(seq(1:N),(N*SL)) lstart[index] = 0 index = sample(seq(1:P),(P*SF)) fstart[index] = 0 L_true[,k] = lstart F_true[,k] = fstart Y = Y + lstart %*% t(fstart) } return(list(Y = Y, L_true = L_true, F_true = F_true, Error = E)) } data = sim_K(K=10,N=100, P=300, SF = 0.95, SL = 0.8, signal = 1,noise = 1) Y = data$Y L_true = data$L_true F_true = data$F_true E = data$Error svdl = svd(Y)$u svdf = svd(Y)$v svdd = svd(Y)$d plot(svdd) svdK = 10 sqrt(mean(((Y - svdl[,1:svdK]%*% diag(svdd[1:svdK]) %*% t(svdf[,1:svdK]) )- E)^2)) / sqrt(mean(((Y - 0)- E)^2)) #sqrt(mean(((Y - svdd[1] * svdl[,1] %*% t(svdf[,1]) )- E)^2)) / sqrt(mean(((Y - 0)- E)^2)) ggreedy = greedy(Y,K=30) # initial gl and gf as svd gl = ggreedy$l gf = ggreedy$f dim(gl) sqrt(mean(((Y - gl%*%t(gf) )- E)^2)) / sqrt(mean(((Y - 0)- E)^2)) # proportion in eigenvalue = rep(0,dim(gl)[2]) for(i in 1:length(eigenvalue)){ eigenvalue[i] = (sqrt(mean((gl[,i]%*%t(gf[,i]))^2))) } eigenvalue = eigenvalue / sqrt(mean(Y^2)) plot(eigenvalue) gback = backfitting(Y,gl,gf,tautol = 100, numtau = 5) gbl = gback$l gbf = gback$f dim(gbl) sqrt(mean(((Y - gbl%*%t(gbf) )- E)^2)) / sqrt(mean(((Y - 0)- E)^2))
The comparing criterion is $$\frac{RMSE - RMSE_{opt}}{RMSE_{naive}}$$
If we know the variance, we can do better based on our model.
we can see that if we known the variance matrix, we can do better than using flash with constant variance. I have also compared those with PMD, and the result is flash_hd better then flash which is better then PMD.
sim_hd = function(N, P, SF, SL, signal, a = rchisq(N,3),b = rchisq(P,1),mu = 0){ E = matrix(rep(0,N*P),nrow=N) sig2_true = matrix(rep(0,N*P),nrow=N) for(i in 1:N){ for(j in 1:P){ sig2_true[i,j] = mu + a[i] + b[j] E[i,j] = rnorm(1,0,sqrt(mu + a[i] + b[j])) } } K=1 lstart = rnorm(N, 0, signal) fstart = rnorm(P, 0, signal) index = sample(seq(1:N),(N*SL)) lstart[index] = 0 index = sample(seq(1:P),(P*SF)) fstart[index] = 0 Y = lstart %*% t(fstart) + E return(list(Y = Y, L_true = lstart, F_true = fstart, Error = E,sig2_true = sig2_true)) } library("flash") N = 100 P = 200 SF = 0.5 SL = 0.5 signal = 1 data = sim_hd(N, P, SF, SL, signal, a = rchisq(N,3),b = rchisq(P,1),mu = 0) sigmae2_true = data$sig2_true Y = data$Y E = data$Error ghd = flash_hd(Y,partype = "known",sigmae2 = sigmae2_true) l = ghd$l f = ghd$f sqrt(mean(((Y - l %*% t(f))-E)^2))/sqrt(mean((Y - E)^2)) ghd = flash_hd(Y,partype = "constant") l = ghd$l f = ghd$f sqrt(mean(((Y - l %*% t(f))-E)^2))/sqrt(mean((Y - E)^2)) gvem = flash(Y) l = gvem$l f = gvem$f sqrt(mean(((Y - l %*% t(f))-E)^2))/sqrt(mean((Y - E)^2))
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.