sentenceSimil: Compute distance between sentences


View source: R/sentenceSimil.R

Description

Compute pairwise distances between sentences using the idf-modified cosine measure from "LexRank: Graph-based Lexical Centrality as Salience in Text Summarization" (Erkan and Radev, 2004). The output can be used as input to lexRankFromSimil.
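To make the measure concrete, here is a minimal base-R sketch of the idf-modified cosine as the paper defines it (a similarity, where higher values mean more alike). The helper name idfModifiedCosine and the toy idf weights are assumptions for illustration; this is not the package's internal code, and the package's exact idf weighting may differ.

```r
# Hypothetical helper, not part of lexRankr.
# tokensX, tokensY: character vectors of tokens for two sentences.
# idf: named numeric vector of idf weights; it must cover every token used.
idfModifiedCosine <- function(tokensX, tokensY, idf) {
  tfX <- table(tokensX)  # term frequencies in sentence x
  tfY <- table(tokensY)  # term frequencies in sentence y
  shared <- intersect(names(tfX), names(tfY))

  # Numerator: sum over shared words of tf_x * tf_y * idf^2
  numerator <- sum(as.numeric(tfX[shared]) * as.numeric(tfY[shared]) * idf[shared]^2)

  # Denominator: product of the tf-idf vector norms of each sentence
  normX <- sqrt(sum((as.numeric(tfX) * idf[names(tfX)])^2))
  normY <- sqrt(sum((as.numeric(tfY) * idf[names(tfY)])^2))

  if (normX == 0 || normY == 0) return(0)
  numerator / (normX * normY)
}

# Toy idf weights (made up for illustration)
idf <- c(i = 1.0, ran = 0.5, jane = 1.0)
sim <- idfModifiedCosine(c("i", "ran"), c("jane", "ran"), idf)
print(sim)  # 0.2
```

Only the shared token "ran" contributes to the numerator, so a low idf weight on a common word keeps the score small even when sentences overlap.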

Usage

sentenceSimil(sentenceId, token, docId = NULL, sentencesAsDocs = FALSE)

Arguments

sentenceId

A character vector of sentence IDs corresponding to the docId and token arguments.

token

A character vector of tokens corresponding to the docId and sentenceId arguments.

docId

A character vector of document IDs corresponding to the sentenceId and token arguments. Can be NULL if sentencesAsDocs is TRUE.

sentencesAsDocs

TRUE or FALSE, indicating whether to treat sentences as documents when calculating tf-idf scores. If TRUE, inverse document frequency is calculated as inverse sentence frequency (useful for single-document extractive summarization).
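The effect of sentencesAsDocs can be sketched in base R by changing which ID defines a "document" when counting document frequency. The helper idfByUnit is hypothetical, and the formula idf = log(N / n_w) follows the paper; the package's internal weighting is an assumption here and may differ.

```r
# Hypothetical helper, not part of lexRankr.
# Computes idf = log(N / n_w), where N is the number of units and
# n_w is the number of units containing token w.
idfByUnit <- function(token, unitId) {
  # Count each token at most once per unit
  pairs <- unique(data.frame(token = token, unit = unitId, stringsAsFactors = FALSE))
  docFreq <- table(pairs$token)
  nUnits <- length(unique(unitId))
  idf <- log(nUnits / as.numeric(docFreq))
  names(idf) <- names(docFreq)
  idf
}

token      <- c("i", "ran", "jane", "ran", "jane", "slept")
sentenceId <- c("d1_1", "d1_1", "d1_2", "d1_2", "d2_1", "d2_1")
docId      <- c("d1", "d1", "d1", "d1", "d2", "d2")

idfDocs  <- idfByUnit(token, docId)       # analogue of sentencesAsDocs = FALSE
idfSents <- idfByUnit(token, sentenceId)  # analogue of sentencesAsDocs = TRUE
```

With documents as the unit, "jane" appears in both documents and gets an idf of 0; with sentences as the unit, it appears in two of three sentences and keeps a positive weight. With a single document every token would appear in that one document, which is why sentence-level frequencies are the useful choice there.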

Value

A 3-column data frame of pairwise distances between sentences. Columns: sent1 (sentence ID), sent2 (sentence ID), and dist (distance between sent1 and sent2).
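A long-format result of this shape can be reshaped into a symmetric matrix for inspection. The sketch below uses a mock data frame with made-up values rather than real sentenceSimil output, and it assumes each unordered sentence pair appears once.

```r
# Mock data in the documented shape (values are invented for illustration)
pairs <- data.frame(sent1 = c("d1_1", "d1_1", "d1_2"),
                    sent2 = c("d1_2", "d2_1", "d2_1"),
                    dist  = c(0.2, 0.0, 0.5),
                    stringsAsFactors = FALSE)

ids <- sort(unique(c(pairs$sent1, pairs$sent2)))
m <- matrix(NA_real_, nrow = length(ids), ncol = length(ids),
            dimnames = list(ids, ids))

# Fill both triangles; the diagonal is left as NA
m[cbind(pairs$sent1, pairs$sent2)] <- pairs$dist
m[cbind(pairs$sent2, pairs$sent1)] <- pairs$dist
print(m)
```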

References

http://www.cs.cmu.edu/afs/cs/project/jair/pub/volume22/erkan04a-html/erkan04a.html

Examples

library(lexRankr)

# Two single-sentence documents that share the token "ran"
sentenceSimil(docId = c("d1", "d1", "d2", "d2"),
              sentenceId = c("d1_1", "d1_1", "d2_1", "d2_1"),
              token = c("i", "ran", "jane", "ran"))

AdamSpannbauer/lexRankr documentation built on Feb. 4, 2018, 12:12 p.m.