View source: R/RescalReconstructBack.R
Description

Reconstructs a predicate (or a set of predicates) from a RESCAL factorization result by returning the top-scoring triples in each predicate.
Arguments

R: core tensor resulting from the RESCAL factorization (r by r by m).

A: embedding matrix resulting from the RESCAL factorization.

cnt_prd: vector whose length is the number of predicates, giving the number of triples in each predicate. When 0, the predicate is not processed.

scale_fact: scale factor; the number of triples generated for a predicate s is scale_fact * cnt_prd[s].

verbose: level of messages to be displayed; 0 is minimal.

ncores: number of cores used to run in parallel; 0 means no parallelism.

OS_WIN: TRUE when the operating system is Windows; used to decide whether forking can be used when running in parallel.

pve: positive value: the smallest allowed score for a reconstructed triple.

grpLen: length of one group of iterations. When iterations run in parallel, results from all iterations are collected and summarized after the last iteration, which requires more memory. To avoid that, iterations are divided into groups, with summaries calculated for each group. Default 15.

ChnkLen: number of rows in one chunk. Default 1000.

generateLog: save output to a log file in the current directory when running in parallel.

saveRes: optionally save the result of each predicate.

dsname: optional: name of the dataset.

rmx: optionally give the maximum absolute value in A, to avoid recalculating it over multiple calls.

readjustChnkLen: automatically increase the chunk length when possible.

TotalChnkSize: instead of defining ChnkLen, define the number of pairs of entities to be processed in every chunk; equal to ChnkLen * N, where N is the number of entities (i.e. nrow(A)).

chTrpCntTol: tolerance in the number of triples returned in one chunk; the excess is eliminated at the end of the group.
Details

The multiplication A %*% R[[p]] %*% t(A) can be infeasible on a typical 16 GB RAM machine when the number of entities exceeds 50K (it requires about 25 GB of RAM). To deal with that, the multiplication is carried out in chunks and the top scores are obtained chunk by chunk: the required number of triples is taken from each chunk, and the top scores are then chosen again from among those candidates.
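The chunked top-k idea described above can be sketched in a few lines of base R. This is an illustration only, not the package's implementation: the sizes (n, r, k, ChnkLen) are made up, and the per-chunk top-k followed by a global top-k is exact because every globally top-k score is necessarily among the top k of its own chunk.

```r
set.seed(1)
n <- 200; r <- 5; k <- 50; ChnkLen <- 40       # toy sizes (assumptions)
A  <- matrix(rnorm(n * r), n, r)               # embedding matrix
Rp <- matrix(rnorm(r * r), r, r)               # core slice R[[p]]

top_i <- integer(0); top_j <- integer(0); top_v <- numeric(0)
for (start in seq(1, n, by = ChnkLen)) {
  rows <- start:min(start + ChnkLen - 1, n)
  # score block for this chunk: length(rows) x n, never the full n x n matrix
  S <- A[rows, , drop = FALSE] %*% Rp %*% t(A)
  # linear indices (column-major) of the chunk's top-k scores
  ord <- order(S, decreasing = TRUE)[seq_len(min(k, length(S)))]
  top_i <- c(top_i, rows[(ord - 1) %% length(rows) + 1])  # global subject index
  top_j <- c(top_j, (ord - 1) %/% length(rows) + 1)       # object index
  top_v <- c(top_v, S[ord])
}
keep <- order(top_v, decreasing = TRUE)[seq_len(k)]  # global top-k candidates
```

After the loop, `top_i[keep]`, `top_j[keep]`, and `top_v[keep]` hold the same top-k triples that scoring the full matrix at once would give, but with peak memory proportional to ChnkLen * n instead of n * n.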
Value

A list of three components:

ijk: a data frame containing the reconstructed triples, using indexes of entities.

val: a vector containing the scores of the reconstructed triples.

act_thr: the minimum score in each predicate (the minimum score of a returned triple, i.e. the effective threshold).
Author(s)

Abdelmoneim Amer Desouki
References

- Maximilian Nickel, Volker Tresp, Hans-Peter Kriegel, "Factorizing YAGO: Scalable Machine Learning for Linked Data", WWW 2012, Lyon, France.
- Abdelmoneim Amer Desouki et al., "SynthG: Mimicking RDF Graphs Using Tensor Factorization", IEEE ICSC 2021.
Examples

## Not run:
library(RDFTensor)
data('umls_tnsr')
ntnsr=umls_tnsr
#Calculate Factorization
tt0=proc.time()
tt=rescal(ntnsr$X,rnk=10,ainit='nvecs',verbose=2,lambdaA=0,epsilon=1e-4,lambdaR=0)
ttq1=proc.time()
A=tt$A
R=tt$R
# reconstruct second predicate (slice) in tensor
p=2
prd_cnt=rep(0,length(ntnsr$X))#Zero counts will not be reconstructed
prd_cnt[p]=sum(ntnsr$X[[p]])
tmpRes=inv_rescal_sf_prd_chnkgrp(R,A,cnt_prd=prd_cnt,ChnkLen=50,grpLen=6,OS_WIN=TRUE,ncores=1,
chTrpCntTol=1000, TotalChnkSize=1e4)
ijk=tmpRes[[1]]
ix=which(ntnsr$X[[p]]==1,arr.ind=TRUE)
oijk=cbind(ix[,1],p,ix[,2])#Original
flag= paste(oijk[,1],oijk[,2],oijk[,3]) %in% paste(ijk[,1],ijk[,2],ijk[,3])
print(table( flag))#True positives
pTrp=cbind(ntnsr$SO[ijk[,1]],ntnsr$P[ijk[,2]],ntnsr$SO[ijk[,3]])
## End(Not run)
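The `flag` table printed in the example above counts true positives. The same paste()-based matching can be turned into precision and recall figures; here is a self-contained toy sketch (the triples are made up, not from umls_tnsr):

```r
# Toy data: three original triples and three reconstructed ones (i, p, j)
orig  <- rbind(c(1, 2, 3), c(4, 2, 7), c(5, 2, 9))
recon <- rbind(c(1, 2, 3), c(4, 2, 7), c(8, 2, 2))

# Same matching trick as the example: encode each triple as a string key
flag <- paste(recon[, 1], recon[, 2], recon[, 3]) %in%
        paste(orig[, 1],  orig[, 2],  orig[, 3])

precision <- mean(flag)             # fraction of reconstructed triples that are original
recall    <- sum(flag) / nrow(orig) # fraction of original triples recovered
```

With scale_fact = 1 the number of reconstructed triples equals the number of originals, so precision and recall coincide.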