split_segments: Split a character string or corpus into segments


split_segments    R Documentation

Split a character string or corpus into segments

Description

Split a character string or corpus into segments of roughly segment_size words, taking punctuation into account where possible.

Usage

split_segments(obj, segment_size = 40, segment_size_window = NULL)

## S3 method for class 'character'
split_segments(obj, segment_size = 40, segment_size_window = NULL)

## S3 method for class 'Corpus'
split_segments(obj, segment_size = 40, segment_size_window = NULL)

## S3 method for class 'corpus'
split_segments(obj, segment_size = 40, segment_size_window = NULL)

## S3 method for class 'tokens'
split_segments(obj, segment_size = 40, segment_size_window = NULL)

Arguments

obj

a character string, a quanteda or tm corpus object, or a tokens object (see the S3 methods under Usage)

segment_size

target segment size, in words

segment_size_window

window around the target segment size in which to look for the best splitting point
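
To make the interplay between segment_size and segment_size_window more concrete, here is a minimal base-R sketch of the general idea: aim to cut at segment_size words, but if a word ending in punctuation lies within the window around that target, cut there instead. The helper split_near_punct, its window argument, and its tie-breaking rules are hypothetical and written only for illustration; this is not rainette's actual implementation.

```r
# Hypothetical helper illustrating the idea; NOT rainette's actual algorithm.
split_near_punct <- function(text, segment_size = 40, window = 5) {
  # Split on whitespace and drop empty strings
  words <- strsplit(text, "\\s+")[[1]]
  words <- words[nzchar(words)]
  segments <- character(0)
  while (length(words) > 0) {
    if (length(words) <= segment_size) {
      # Remaining text fits in one segment
      cut <- length(words)
    } else {
      # Candidate cut positions inside the window around the target size
      lo <- max(1, segment_size - window)
      hi <- min(length(words), segment_size + window)
      candidates <- lo:hi
      # Keep candidates whose word ends with a punctuation mark
      punct <- candidates[grepl("[.,;:!?]$", words[candidates])]
      if (length(punct) > 0) {
        # Choose the punctuation mark closest to the target size
        cut <- punct[which.min(abs(punct - segment_size))]
      } else {
        # No punctuation in the window: cut at the target size
        cut <- segment_size
      }
    }
    segments <- c(segments, paste(words[seq_len(cut)], collapse = " "))
    words <- words[-seq_len(cut)]
  }
  segments
}

txt <- "The cat sat on the mat. It was warm, and the sun was out. Later it rained."
segs <- split_near_punct(txt, segment_size = 5, window = 2)
print(segs)
```

In this sketch every segment ends on a punctuation mark because one always falls inside the window; with a narrower window, some cuts would fall back to exactly segment_size words.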

Value

If obj is a tm or quanteda corpus object, the result is a quanteda corpus.

Examples


library(quanteda)
library(rainette)
# Split each inaugural address into segments of about 40 words (the default)
split_segments(data_corpus_inaugural)
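
The call above relies on the defaults. As a further sketch, the same corpus can be split with explicit values for both arguments from the Usage section. The values 50 and 10 are arbitrary and chosen only for illustration; the example assumes the quanteda and rainette packages are installed.

```r
library(quanteda)
library(rainette)

# Aim for roughly 50-word segments, searching a window of 10 words around
# the target for a good splitting point (illustrative values only)
segments <- split_segments(
  data_corpus_inaugural,
  segment_size = 50,
  segment_size_window = 10
)

# Per the Value section, corpus input yields a quanteda corpus,
# so quanteda helpers such as ndoc() apply to the result
ndoc(segments)
```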


rainette documentation built on March 31, 2023, 6:43 p.m.