Document Similarity

Description

This function calculates the pairwise similarity between the documents of two document matrices.

Usage

simDoc(docMatrix1, docMatrix2, norm = FALSE, method = "cosine")

Arguments

docMatrix1

Document matrix whose columns are the feature vectors of individual documents. The matrix must satisfy the following: rownames(docMatrix1) are feature names, colnames(docMatrix1) are document names, and every element is numeric.

docMatrix2

Document matrix whose columns are the feature vectors of individual documents. The matrix must satisfy the following: rownames(docMatrix2) are feature names, colnames(docMatrix2) are document names, and every element is numeric.

norm

Whether to normalize the similarity matrix. The default is FALSE.

method

Method used to calculate similarity; it is passed to simil() from the proxy package. The default is "cosine".

Value

Similarity matrix whose rows correspond to the documents of docMatrix1 and whose columns correspond to the documents of docMatrix2. It is an n x m matrix, where n = ncol(docMatrix1) and m = ncol(docMatrix2), and it satisfies rownames(returnValue) = colnames(docMatrix1) and colnames(returnValue) = colnames(docMatrix2).
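
The shape and naming of the return value can be illustrated with a small base R sketch. It does not call simDoc; the helper cosine_columns and the toy matrices are hypothetical, and the sketch assumes that documents are stored in columns and that both matrices already share the same feature rows.

```r
# Toy matrices: rows are features, columns are documents.
# All names and values are made up for illustration.
docMatrix1 <- matrix(c(1, 0, 2,
                       0, 3, 1),
                     nrow = 3,
                     dimnames = list(c("apple", "banana", "cherry"),
                                     c("docA", "docB")))
docMatrix2 <- matrix(c(1, 0, 2,
                       1, 1, 1,
                       0, 0, 4),
                     nrow = 3,
                     dimnames = list(c("apple", "banana", "cherry"),
                                     c("docX", "docY", "docZ")))

# Hypothetical helper: cosine similarity between the columns of a and b.
cosine_columns <- function(a, b) {
  # Scale every column to unit Euclidean length ...
  a_unit <- sweep(a, 2, sqrt(colSums(a^2)), "/")
  b_unit <- sweep(b, 2, sqrt(colSums(b^2)), "/")
  # ... so that the cross product holds the cosines.
  crossprod(a_unit, b_unit)
}

sim <- cosine_columns(docMatrix1, docMatrix2)
dim(sim)             # 2 3, i.e. ncol(docMatrix1) by ncol(docMatrix2)
rownames(sim)        # "docA" "docB"
colnames(sim)        # "docX" "docY" "docZ"
sim["docA", "docX"]  # 1, because docA and docX hold the same vector
```

The result has one row per document of docMatrix1 and one column per document of docMatrix2, which matches the layout described for the return value.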

Author(s)

Masaaki TAKADA

Examples

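## The function is currently defined as
function (docMatrix1, docMatrix2, norm = FALSE, method = "cosine")
{
    # simil() is provided by the proxy package
    library("proxy")
    # uniform() is expected to return both matrices with matching rows,
    # which cbind() below requires
    exDocMatrix <- uniform(docMatrix1, docMatrix2)
    exDocMatrix1 <- exDocMatrix[[1]]
    exDocMatrix2 <- exDocMatrix[[2]]
    # Prefix the document names so the two sets cannot collide
    colnames(exDocMatrix1) <- paste("r_", colnames(docMatrix1), 
        sep = "")
    colnames(exDocMatrix2) <- paste("c_", colnames(docMatrix2), 
        sep = "")
    # Compute all pairwise similarities between documents (columns), then
    # keep only the docMatrix1-by-docMatrix2 block
    sim <- as.matrix(simil(t(cbind(exDocMatrix1, exDocMatrix2)), 
        method = method))[colnames(exDocMatrix1), colnames(exDocMatrix2)]
    # Restore the original document names
    rownames(sim) <- colnames(docMatrix1)
    colnames(sim) <- colnames(docMatrix2)
    # Optionally normalize the similarity matrix
    if (norm) {
        sim <- normalize(sim)
    }
    return(sim)
}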
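
The definition relies on uniform(), whose source is not shown on this page. Because its two results are combined with cbind(), they must have the same number of rows. As a rough sketch only, and assuming that this step amounts to giving both matrices the union of their feature rows with missing features set to zero, the alignment could look like the hypothetical helper align_features below. It is not the package's implementation.

```r
# Hypothetical stand-in for the alignment step; the real uniform() may differ.
align_features <- function(m1, m2) {
  # Union of all feature names found in either matrix
  features <- union(rownames(m1), rownames(m2))
  pad <- function(m) {
    # Start from zeros, then copy the known feature rows by name
    out <- matrix(0, nrow = length(features), ncol = ncol(m),
                  dimnames = list(features, colnames(m)))
    out[rownames(m), ] <- m
    out
  }
  list(pad(m1), pad(m2))
}

# Toy input with partly different features (made-up values)
m1 <- matrix(c(1, 2), nrow = 2,
             dimnames = list(c("apple", "banana"), "docA"))
m2 <- matrix(c(3, 4), nrow = 2,
             dimnames = list(c("banana", "cherry"), "docX"))

aligned <- align_features(m1, m2)
rownames(aligned[[1]])   # "apple" "banana" "cherry"
aligned[[1]][, "docA"]   # apple 1, banana 2, cherry 0
aligned[[2]][, "docX"]   # apple 0, banana 3, cherry 4
```

After such a step both matrices have identical row names, so they can be bound column-wise and compared document by document.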