rBMF-package: Boolean Matrix Factorization

Description

Provides four Boolean matrix factorization (BMF) methods. BMF has many applications, such as data mining and categorical data analysis. BMF is also known as Boolean matrix decomposition (BMD) and is an NP-hard (non-deterministic polynomial-time hard) problem. The currently implemented methods are 'Asso', P. Miettinen and others (2008) <doi:10.1109/TKDE.2008.53>; 'GreConD', R. Belohlavek and V. Vychodil (2010) <doi:10.1016/j.jcss.2009.05.002>; 'GreConDPlus', R. Belohlavek and V. Vychodil (2010) <doi:10.1016/j.jcss.2009.05.002>; and 'topFiberM', A. Desouki, M. Roeder and A. Ngonga (2019) <arXiv:1903.10326>.
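
To make the idea of a Boolean factorization concrete, here is a minimal sketch in base R. It does not call any function from this package: the helper bool_prod and the matrices A, B and X are hypothetical, made-up illustrations. It forms the Boolean product of two small factor matrices and counts how many cells differ from a target matrix.

```r
# Hypothetical helper (not part of rBMF): Boolean matrix product.
# Entry (i, j) is TRUE when A[i, k] and B[k, j] are both 1 for some k.
bool_prod <- function(A, B) (A %*% B) > 0

# Two illustrative 0/1 factor matrices with r = 2 factors.
A <- matrix(c(1, 0,
              1, 1,
              0, 1), nrow = 3, byrow = TRUE)
B <- matrix(c(1, 1, 0,
              0, 1, 1), nrow = 2, byrow = TRUE)

# Made-up target matrix to be approximated.
X <- matrix(c(1, 1, 0,
              1, 1, 1,
              0, 1, 0), nrow = 3, byrow = TRUE) == 1

X_hat <- bool_prod(A, B)   # Boolean reconstruction of X
err <- sum(X != X_hat)     # number of mismatching cells
print(X_hat)
print(err)
```

A BMF method searches for factors A and B with a given number of columns and rows r that keep this kind of mismatch small.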

Author(s)

Abdelmoneim Amer Desouki

References

topFiberM: Desouki, A. A., Röder, M., & Ngomo, A. C. N. (2019). topFiberM: Scalable and Efficient Boolean Matrix Factorization. arXiv preprint arXiv:1903.10326.

Asso: Miettinen, P., Mielikäinen, T., Gionis, A., Das, G., & Mannila, H. (2008). The discrete basis problem. IEEE Transactions on Knowledge and Data Engineering, 20(10), 1348-1362.

GreConD, GreConDPlus: Belohlavek, R., & Vychodil, V. (2010). Discovery of optimal factors in binary data via a novel method of matrix decomposition. Journal of Computer and System Sciences, 76(1), 3-20.

See Also

topFiberM, Asso_approximate, GreConD, GreConDPlus

Examples

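library(rBMF)
library(Matrix)  # sparse-matrix classes used below ('TsparseMatrix')

data(DBLP)
X <- DBLP
r <- 7
Xb <- X == 1                      # convert to Boolean
tempX <- as(X, 'TsparseMatrix')   # triplet form; @i and @j are 0-based
# 1-based (row, column) positions of the known facts (entries equal to 1)
li <- tempX@i[tempX@x == 1] + 1
lj <- tempX@j[tempX@x == 1] + 1

stats <- NULL
for (tP in c(0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 1)) {
  Res <- topFiberM(Xb, r = r, tP = tP, SR = 100, verbose = 1)

  # Reconstruct the matrix from the two factors
  X_ <- as(Res$A %*% Res$B, 'TsparseMatrix')

  # Calculate metrics
  tp <- sum(X_[cbind(li, lj)] > 0)   # known facts that are reconstructed
  fn <- sum(X) - tp                  # same as sum(!X_[cbind(li, lj)])
  fp <- sum(X_@x > 0) - tp           # reconstructed entries that are not known facts
  cv <- 1 - (fp + fn) / (tp + fn)
  stats <- rbind(stats, cbind(tP, tp, fn, fp, cv,
                              P = tp / (tp + fp), R = tp / (tp + fn)))
}

# Plot recall, precision and their harmonic mean against tP
plot(stats[, 'tP'], stats[, 'R'], type = 'b', col = 'red', lwd = 2,
     main = sprintf('topFiberM, dataset: %s,\n#Known facts: %d', 'DBLP', sum(X)),
     ylab = "", xlab = 'tP', xlim = c(0, 1), ylim = c(0, 1))
HM <- 2 / (1 / stats[, 'P'] + 1 / stats[, 'R'])
points(stats[, 'tP'], stats[, 'P'], col = 'blue', lwd = 2, type = 'b')
points(stats[, 'tP'], HM, col = 'green', lwd = 2, type = 'b')
grid(nx = 10, lty = "dotted", lwd = 2)
legend(legend = c('Recall', 'Precision', 'Harmonic mean'),
       col = c('red', 'blue', 'green'),
       x = 0.6, y = 0.2, pch = 1, cex = 0.75, lwd = 2)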
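
The metrics computed in the loop above can be checked by hand on a small dense example. The following sketch uses base R only and does not call topFiberM; the matrices X_true and X_hat are made-up values chosen for illustration, and the formulas follow the same definitions of tp, fn, fp, cv, P and R as the example.

```r
# Made-up ground truth and reconstruction (illustration only).
X_true <- matrix(c(1, 1, 0,
                   1, 1, 1,
                   0, 1, 0), nrow = 3, byrow = TRUE) == 1
X_hat  <- matrix(c(1, 1, 0,
                   1, 0, 1,
                   0, 1, 1), nrow = 3, byrow = TRUE) == 1

tp <- sum(X_true & X_hat)     # ones present in both matrices
fn <- sum(X_true & !X_hat)    # ones of X_true missing from X_hat
fp <- sum(!X_true & X_hat)    # ones of X_hat absent from X_true

cv <- 1 - (fp + fn) / (tp + fn)   # coverage, as defined in the example
P  <- tp / (tp + fp)              # precision
R  <- tp / (tp + fn)              # recall
HM <- 2 / (1 / P + 1 / R)         # harmonic mean of precision and recall

print(c(tp = tp, fn = fn, fp = fp, cv = cv, P = P, R = R, HM = HM))
```

With dense logical matrices the three counts come directly from element-wise logic; the example above obtains the same counts from sparse index slots, which avoids materializing a dense matrix for larger data.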

rBMF documentation built on Jan. 16, 2021, 5:31 p.m.