tokenize_sents: Tokenize to sentences

View source: R/tokenize_sents.R

tokenize_sents {idiolect}	R Documentation

Tokenize to sentences

Description

This function turns a corpus of texts into a quanteda tokens object of sentences.

Usage

tokenize_sents(corpus, model = "en_core_web_sm")

Arguments

corpus

A quanteda corpus object, typically the output of the create_corpus() function or the output of contentmask().

model

The spacy model to use. The default is "en_core_web_sm".

Details

The function first splits each text into paragraphs at newline markers and then uses spacy to tokenize each paragraph into sentences. It accepts either a plain-text corpus or the output of contentmask(). This function is necessary to prepare the data for lambdaG().
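The two-step process described above can be sketched with quanteda and spacyr directly. This is a minimal illustration of the idea, not the package's actual implementation; it assumes spacy and the "en_core_web_sm" model are installed and uses a made-up example text.

```r
# A hypothetical sketch of paragraph-then-sentence tokenization,
# assuming spacyr and the en_core_web_sm spacy model are available.
library(quanteda)
library(spacyr)

spacy_initialize(model = "en_core_web_sm")

txt <- "The cat sat. He slept.\nA new paragraph here."

# Step 1: split the text into paragraphs at newline markers
paragraphs <- unlist(strsplit(txt, "\n", fixed = TRUE))

# Step 2: let spacy segment each paragraph into sentences
sents <- unlist(spacy_tokenize(paragraphs, what = "sentence"),
                use.names = FALSE)

# Rebuild a quanteda tokens object where each token is a sentence
toks <- as.tokens(list(doc1 = sents))

spacy_finalize()
```

Splitting at newlines first prevents spacy from merging material across paragraph boundaries into a single sentence.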

Value

A quanteda tokens object where each token is a sentence.

Examples

## Not run: 
# a toy content-masked corpus, in the format produced by contentmask()
toy.pos <- corpus("the N was on the N . he did n't move \n N ; \n N N")
tokenize_sents(toy.pos)

## End(Not run)


idiolect documentation built on Sept. 11, 2024, 5:34 p.m.