Description

Parse a character vector of documents into both sentences and a clean vector of tokens. The resulting output includes document and sentence IDs for use in other lexRankr functions.
Usage

sentenceTokenParse(text, docId = "create", removePunc = TRUE,
  removeNum = TRUE, toLower = TRUE, stemWords = TRUE,
  rmStopWords = TRUE)
Arguments

text         A character vector of documents to be parsed into sentences and tokenized.

docId        A character vector of document IDs the same length as text.

removePunc   Logical; whether to remove punctuation from the text while tokenizing.

removeNum    Logical; whether to remove numbers from the text while tokenizing.

toLower      Logical; whether to convert tokens to lowercase.

stemWords    Logical; whether to stem the tokens.

rmStopWords  Logical; whether to remove stop words while tokenizing.
Value

A list of two dataframes. The first element of the list returned is the sentences dataframe, with columns docId, sentenceId, and sentence (the actual text of the sentence). The second element is the tokens dataframe, with columns docId, sentenceId, and token (the actual text of the token).
Examples

sentenceTokenParse(c("Bill is trying to earn a Ph.D.", "You have to have a 5.0 GPA."),
                   docId = c("d1", "d2"))
$sentences
docId sentenceId sentence
1 d1 d1_1 Bill is trying to earn a Ph.D.
2 d2 d2_1 You have to have a 5.0 GPA.
$tokens
docId sentenceId token
1 d1 d1_1 bill
2 d1 d1_1 earn
3 d1 d1_1 phd
4 d2 d2_1 gpa
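The cleaning flags control what survives tokenization: in the output above, stop words, punctuation, and the number 5.0 have all been stripped, and "Ph.D." has been reduced to "phd". A minimal sketch (assuming the lexRankr package is installed) of relaxing some of those defaults:

```r
library(lexRankr)

# Keep numeric tokens and unstemmed word forms by turning off two cleaning flags
res <- sentenceTokenParse("You have to have a 5.0 GPA.",
                          docId = "d1",
                          removeNum = FALSE,
                          stemWords = FALSE)

# res$sentences holds the sentence-level rows; res$tokens the token-level rows
print(res$tokens)
```

Note that how a token like "5.0" is rendered also depends on removePunc, which is left at its default here.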