RtextSummary: Summarizes Text by Extracting Relevant Sentences

Build a text summary by extracting relevant sentences from your text. The training dataset should consist of several documents, each document should have sentences separated by a period. While fitting the model, the 'term frequency - inverse document frequency' (TF-IDF) matrix that reflects how important a word is to a document is calculated first. Then vector representations for words are obtained from the 'global vectors for word representation' algorithm (GloVe). While applying the model on new data, the GloVe word vectors for each word are weighted by their TF-IDF weights and averaged to give a sentence vector or a document vector. The magnitude of this sentence vector gives the importance of that sentence within the document. Another way to obtain the importance of the sentence is to calculate cosine similarity between the sentence vector and the document vector. The output can either be at the sentence level (sentences and weights are returned) or at a document level (the summary for each document is returned). It is useful to first get a sentence level output and get quantiles of the sentence weights to determine a cutoff threshold for the weights. This threshold can then be used in the document level output. This method is a variation of the TF-IDF extractive summarization method mentioned in a review paper by Gupta (2010) <doi:10.4304/jetwi.2.3.258-268>.

Getting started

Package details

AuthorSuryavanshi Abhijit [aut, cre]
MaintainerSuryavanshi Abhijit <abhi.surya@gmail.com>
LicenseGPL-3
Version0.1.0
Package repositoryView on CRAN
Installation Install the latest version of this package by entering the following in R:
install.packages("RtextSummary")

Try the RtextSummary package in your browser

Any scripts or data that you put into this service are public.

RtextSummary documentation built on June 7, 2019, 9:03 a.m.