This function calculates the similarity between the documents of two document matrices, using a dictionary that maps features to scores.
docMatrix1 |
Document matrix whose columns each hold the feature vector of one document. This matrix must satisfy the following: rownames(docMatrix1) denote feature names, colnames(docMatrix1) denote document names, and every element is numeric. |
docMatrix2 |
Document matrix with the same layout as docMatrix1: rownames(docMatrix2) denote feature names, colnames(docMatrix2) denote document names, and every element is numeric. |
scoreDict |
Dictionary matrix that maps features to numeric scores. This must be a k * 2 matrix: the 1st column holds features and the 2nd column holds the corresponding scores. Similarity is calculated from these scores. |
breaks |
Vector of break points for the frequency distribution of scores. Elements must be in ascending order. |
norm |
Whether to normalize the similarity matrix. |
method |
Method used to calculate similarity (passed to simil from the proxy package). |
scoreFunc |
Function used to compute a feature's score from its dictionary entries. |
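How scoreDict, scoreFunc and breaks interact can be sketched in base R. This is a minimal illustration, not the package's code: the dictionary contents and the word list are made up, and scoreDict is assumed to be held as a two-column data frame so that the score column stays numeric.

```r
# Hypothetical dictionary: feature in column 1, score in column 2.
scoreDict <- data.frame(
  feature = c("good", "good", "bad", "okay"),
  score   = c(0.8, 0.6, -0.7, 0.1),
  stringsAsFactors = FALSE
)

# Hypothetical feature names; "unknown" has no dictionary entry.
words <- c("bad", "good", "okay", "unknown")
scoreFunc <- mean

# One score per word: aggregate all matching dictionary rows,
# or NA when the word is not in the dictionary.
wordScores <- vapply(words, function(w) {
  value <- scoreDict[scoreDict[, 1] == w, 2]
  if (length(value) == 0) NA_real_ else scoreFunc(value, na.rm = TRUE)
}, numeric(1))

# Bin the scores into the intervals defined by breaks.
breaks <- seq(-1, 1, length.out = 11)
wordClass <- cut(wordScores, breaks)
names(wordClass) <- words
print(wordClass)
```

Words without a dictionary entry keep an NA class, so they fall into no interval of the frequency distribution.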
Similarity matrix whose rows represent the documents of docMatrix1 and whose columns represent the documents of docMatrix2. This is an n * m matrix, where n=ncol(docMatrix1) and m=ncol(docMatrix2), that satisfies the following: rownames(returnValue)=colnames(docMatrix1), colnames(returnValue)=colnames(docMatrix2).
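The shape and naming of the returned matrix can be illustrated with a small base R sketch. It assumes the "cosine" method and uses a hand-written helper, cosineCross (a hypothetical name), as a stand-in for the similarity computation; the frequency matrices and their row labels are invented for the example.

```r
# Hypothetical frequency matrices: rows are score classes, columns are documents.
docFreq1 <- matrix(c(1, 0, 2,
                     0, 3, 1), nrow = 3,
                   dimnames = list(c("low", "mid", "high"), c("docA", "docB")))
docFreq2 <- matrix(c(1, 0, 2,
                     2, 2, 0,
                     0, 1, 1), nrow = 3,
                   dimnames = list(c("low", "mid", "high"),
                                   c("docX", "docY", "docZ")))

# Cosine similarity between every column of a and every column of b.
cosineCross <- function(a, b) {
  dots <- crossprod(a, b)            # t(a) %*% b, an ncol(a) x ncol(b) matrix
  normsA <- sqrt(colSums(a^2))
  normsB <- sqrt(colSums(b^2))
  dots / outer(normsA, normsB)
}

sim <- cosineCross(docFreq1, docFreq2)
print(dim(sim))
print(dimnames(sim))
```

Here sim has one row per column of docFreq1 and one column per column of docFreq2, matching the n * m layout described above.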
Masaaki TAKADA
## The function is currently defined as
function (docMatrix1, docMatrix2, scoreDict, breaks = seq(-1,
    1, length = 11), norm = FALSE, method = "cosine", scoreFunc = mean)
{
    library("proxy")
    words <- unique(rbind(matrix(rownames(docMatrix1)), matrix(rownames(docMatrix2))))
    words <- words[order(words)]
    wordScores <- rep(NA, length(words))
    for (i in 1:length(words)) {
        cond <- (scoreDict[, 1] == words[i])
        value <- scoreDict[cond, 2]
        if (length(value) != 0) {
            wordScores[i] <- scoreFunc(value, na.rm = TRUE)
        }
    }
    names(breaks) <- cut(breaks, breaks)
    wordClass <- cut(wordScores, breaks)
    names(wordClass) <- words
    docFreq1 <- conv2Freq(docMatrix1, wordClass, breaks)
    docFreq2 <- conv2Freq(docMatrix2, wordClass, breaks)
    colnames(docFreq1) <- paste("r_", colnames(docMatrix1), sep = "")
    colnames(docFreq2) <- paste("c_", colnames(docMatrix2), sep = "")
    sim <- as.matrix(simil(t(cbind(docFreq1, docFreq2)), method = method))[colnames(docFreq1),
        colnames(docFreq2)]
    rownames(sim) <- colnames(docMatrix1)
    colnames(sim) <- colnames(docMatrix2)
    if (norm) {
        sim <- normalize(sim)
    }
    return(sim)
}
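The definition above relies on a helper, conv2Freq, whose source is not shown here. The sketch below is only a guess at what such a step might do: it sums each document's feature counts per score class. The function name conv2FreqSketch, its two-argument signature, and the sample data are all hypothetical.

```r
# Hypothetical stand-in for conv2Freq: classes in rows, documents in columns.
conv2FreqSketch <- function(docMatrix, wordClass) {
  # Class of every feature (row) of docMatrix; NA when the feature is unscored.
  classes <- as.character(wordClass[rownames(docMatrix)])
  # Indicator matrix: one row per class level, one column per feature.
  indicator <- outer(levels(wordClass), classes, "==")
  indicator[is.na(indicator)] <- FALSE
  # Sum the counts of the features that fall into each class.
  freq <- (indicator * 1) %*% docMatrix
  rownames(freq) <- levels(wordClass)
  freq
}

# Made-up scores and classes for three features.
scores <- c(bad = -0.7, good = 0.7, okay = 0.1)
wordClass <- cut(scores, breaks = c(-1, 0, 1))
names(wordClass) <- names(scores)

# Made-up document matrix: features in rows, documents in columns.
docMatrix <- matrix(c(2, 1, 0, 5,
                      0, 3, 1, 2), nrow = 4,
                    dimnames = list(c("bad", "good", "okay", "unknown"),
                                    c("doc1", "doc2")))

freq <- conv2FreqSketch(docMatrix, wordClass)
print(freq)
```

In this sketch, features without a score class (such as "unknown") contribute to no row of the result; whether the real helper behaves the same way is an assumption.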