View source: R/tokenize_sents.R
tokenize_sents (R Documentation)
This function turns a corpus of texts into a quanteda tokens object of sentences.
tokenize_sents(corpus, model = "en_core_web_sm")
corpus: A quanteda corpus object, or the output of contentmask().
model: The spacy model to use. The default is "en_core_web_sm".
The function first splits each text into paragraphs at newline markers, and then uses spacy to tokenize each paragraph into sentences. The function accepts a plain-text corpus as input, or the output of contentmask(). This function is necessary to prepare the data for lambdaG().
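As a sketch of how this fits into the workflow described above (the object names here are hypothetical, and the contentmask() step is optional, as the function also accepts a plain-text corpus directly):

```r
library(quanteda)

# Hypothetical plain-text corpus of two short documents;
# "\n" marks a paragraph boundary within a document
txt <- corpus(c("The cat sat on the mat. It did not move.\nA new paragraph.",
                "Another short document."))

# Optional: mask content words first; the output of contentmask()
# is also a valid input to tokenize_sents()
# masked <- contentmask(txt, model = "en_core_web_sm")

# One token per sentence, ready to pass to lambdaG()
sents <- tokenize_sents(txt, model = "en_core_web_sm")
```

Running this requires the spacy model named in `model` to be installed and available to R.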
A quanteda tokens object where each token is a sentence.
## Not run:
toy.pos <- corpus("the N was on the N . he did n't move \n N ; \n N N")
tokenize_sents(toy.pos)
## End(Not run)