get_sentences: Function extracting sentences from raw text

get_sentences R Documentation

Function extracting sentences from raw text

Description

This function tokenizes text at the sentence level.

Usage

get_sentences(
  text,
  language,
  lem = FALSE,
  remove_no = FALSE,
  remove_punct = FALSE,
  tolower = FALSE,
  verbose = FALSE,
  n_cores = 1
)

Arguments

text

Character vector of texts to be tokenized.

language

Language model that is used for tokenization. See language models at https://github.com/bnosac/udpipe.

lem

Logical parameter for also extracting a lemmatized version of each sentence. Default is FALSE.

remove_no

Logical parameter for removing numbers. Default is FALSE.

remove_punct

Logical parameter for removing punctuation. Default is FALSE.

tolower

Logical parameter for transforming strings to lower case. Default is FALSE.

verbose

Logical parameter for displaying extended information on the processed data. Only takes effect when processing on a single core. Default is FALSE.

n_cores

Numeric parameter for the number of cores to be used for processing. Default is 1.
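
To make the effect of remove_no, remove_punct, and tolower more concrete, the following base R sketch applies comparable cleaning steps to a single sentence. It is only an illustration under assumptions: clean_sentence is a hypothetical helper that is not part of sentenceR, and the package may implement these options differently (for example, on udpipe tokens rather than with regular expressions).

```r
# Hypothetical helper (NOT part of sentenceR): approximates the three
# cleaning flags with base R regular expressions.
clean_sentence <- function(x,
                           remove_no = FALSE,
                           remove_punct = FALSE,
                           tolower = FALSE) {
  # Drop runs of digits when remove_no is TRUE
  if (remove_no) x <- gsub("[[:digit:]]+", "", x)
  # Drop punctuation characters when remove_punct is TRUE
  if (remove_punct) x <- gsub("[[:punct:]]+", "", x)
  # Lower-case the text; base:: avoids confusion with the argument name
  if (tolower) x <- base::tolower(x)
  # Collapse whitespace left behind by the removals and trim the ends
  trimws(gsub("[[:space:]]+", " ", x))
}

cleaned <- clean_sentence(
  "In 2022, Prague hosted 3 events.",
  remove_no = TRUE, remove_punct = TRUE, tolower = TRUE
)
print(cleaned)
```

With all three flags left at FALSE, the helper returns the input unchanged apart from whitespace normalization, which mirrors the defaults listed above.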

Examples

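# Illustrative call; "english" is assumed to be an available udpipe model name
get_sentences(text = "This is one sentence. This is another.", language = "english")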

mmochtak/sentenceR documentation built on Aug. 25, 2022, 9:31 a.m.