simhash: Simhash computation

View source: R/simhash.R

Description

The simhash worker uses the keyword extraction worker to find the keywords of the input, then applies the simhash algorithm to those keywords to compute a simhash fingerprint. dict, hmm, idf and stop_word should be provided when initializing the jiebaR worker.
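The idea behind the algorithm can be sketched in base R. This is a minimal illustration under stated assumptions, not jiebaR's implementation: toy_hash, to_bits, toy_simhash and toy_distance are hypothetical helpers, the hash is a toy 16-bit polynomial hash chosen only so that the arithmetic stays exact in R, and the keyword weights are made-up numbers standing in for the weights that keyword extraction would supply.

```r
# Toy 16-bit polynomial string hash. An assumption for illustration only;
# it is NOT the hash function used by jiebaR.
toy_hash <- function(word, bits = 16L) {
  h <- 0
  for (code in utf8ToInt(word)) {
    h <- (h * 31 + code) %% 2^bits
  }
  h
}

# Convert a non-negative number below 2^bits into a vector of 0/1 bits,
# least significant bit first.
to_bits <- function(x, bits = 16L) {
  (x %/% 2^(0:(bits - 1L))) %% 2
}

# Simhash of weighted keywords: every keyword votes +weight on the bit
# positions where its hash has a 1 and -weight where it has a 0.
# The fingerprint keeps a 1 wherever the total vote is positive.
toy_simhash <- function(words, weights, bits = 16L) {
  votes <- numeric(bits)
  for (i in seq_along(words)) {
    b <- to_bits(toy_hash(words[i], bits), bits)
    votes <- votes + ifelse(b == 1, weights[i], -weights[i])
  }
  as.integer(votes > 0)
}

# Hamming distance: the number of bit positions where two fingerprints differ.
toy_distance <- function(a, b) sum(a != b)

fp1 <- toy_simhash(c("hello", "world"), c(11.7, 11.7))
fp2 <- toy_simhash(c("hello", "world"), c(11.7, 11.7))
toy_distance(fp1, fp2)  # identical inputs give distance 0
```

Because every keyword only nudges the per-bit vote totals, texts that share most of their heavily weighted keywords tend to produce fingerprints that differ in few bits, which is why a Hamming distance is a natural way to compare them.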

Usage

simhash(code, jiebar)

vector_simhash(code, jiebar)

Arguments

code

For simhash, a Chinese sentence or the path of a text file. For vector_simhash, a character vector of segmented words.

jiebar

A jiebaR worker, for example one created with worker("simhash").

Details

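The <= operator is provided as shorthand for this function: with a simhash worker on the left-hand side, simhasher <= code computes the simhash of code.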
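The two call forms can be compared side by side. The following is a sketch that assumes the jiebaR package is installed (the code is skipped otherwise) and that the operator simply forwards to simhash(), as this section suggests; res_fun, res_op and res_vec are illustrative variable names.

```r
if (requireNamespace("jiebaR", quietly = TRUE)) {
  library(jiebaR)

  # Simhash worker that keeps only the single top keyword
  simhasher <- worker("simhash", topn = 1)

  # Explicit function call: the text first, then the worker
  res_fun <- simhash("hello world", simhasher)

  # Operator form: the worker goes on the left-hand side
  res_op <- simhasher <= "hello world"

  # vector_simhash() takes words that have already been segmented
  res_vec <- vector_simhash(c("hello", "world"), simhasher)

  print(res_fun)
  print(res_op)
  print(res_vec)
}
```

The operator form is convenient at the console, while the explicit call makes the argument order visible in scripts.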

Author(s)

Qin Wenfeng

References

Charikar, M. S. (2002). Similarity Estimation Techniques from Rounding Algorithms. Proceedings of the 34th Annual ACM Symposium on Theory of Computing (STOC).

See Also

<=.simhash, worker

Examples

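## Not run: 
### Simhash
library(jiebaR)
words <- "hello world"
# Create a simhash worker that keeps only the top keyword
simhasher <- worker("simhash", topn = 1)
# Compute the simhash of words using the <= shorthand
simhasher <= words
# Hamming distance between the simhashes of two texts
distance("hello world", "hello world!", simhasher)

## End(Not run)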

Example output

Loading required package: jiebaRD
$simhash
[1] "3804341492420753273"

$keyword
11.7392 
"hello" 

$distance
[1] 0

$lhs
11.7392 
"hello" 

$rhs
11.7392 
"hello" 

jiebaR documentation built on Dec. 16, 2019, 1:19 a.m.