removePunctuation: Remove Punctuation Marks from a Text Document

View source: R/transform.R

removePunctuation    R Documentation

Remove Punctuation Marks from a Text Document

Description

Remove punctuation marks from a character vector or a text document.

Usage

## S3 method for class 'character'
removePunctuation(x,
                  preserve_intra_word_contractions = FALSE,
                  preserve_intra_word_dashes = FALSE,
                  ucp = FALSE, ...)
## S3 method for class 'PlainTextDocument'
removePunctuation(x, ...)
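A minimal sketch of calling the character method directly on a plain character vector, assuming the tm package is installed. Each element of the vector is processed independently, and every mark in the [:punct:] class is dropped.

```r
library(tm)

# A character vector dispatches to the 'character' method.
x <- c("Hello, world!", "Is this (really) it?")
removePunctuation(x)
# [1] "Hello world"       "Is this really it"
```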

Arguments

x

a character vector or text document.

preserve_intra_word_contractions

a logical specifying whether intra-word contractions (apostrophes inside a word, as in “don't”) should be kept.

preserve_intra_word_dashes

a logical specifying whether intra-word dashes (hyphens inside a word, as in “state-of-the-art”) should be kept.
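The two preserve_* flags only protect marks that sit inside a word; a sketch assuming tm is installed, where “inside a word” is taken to mean between two letters or digits. The free-standing "--" below is removed either way, which leaves a double space behind.

```r
library(tm)

s <- "Don't over-think it -- it's fine."

# Default: apostrophes and hyphens are removed like any other [:punct:] mark.
removePunctuation(s)
# [1] "Dont overthink it  its fine"

# Keep intra-word apostrophes and hyphens; the isolated "--" still goes.
removePunctuation(s,
                  preserve_intra_word_contractions = TRUE,
                  preserve_intra_word_dashes = TRUE)
# [1] "Don't over-think it  it's fine"
```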

ucp

a logical specifying whether to use Unicode character properties for determining punctuation characters. If FALSE (default), the characters in the ASCII [:punct:] class are treated as punctuation; if TRUE, the characters with Unicode general category P (Punctuation) are used instead.
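The two settings of ucp are not nested: several ASCII characters in [:punct:] (such as +, =, and $) are Unicode symbols (category S), not punctuation (category P). A sketch assuming tm is installed shows that ucp = TRUE keeps them while the default removes them.

```r
library(tm)

x <- "1+1=2, costs $5!"

# ucp = FALSE: the POSIX [:punct:] class, which includes + = $ , !
removePunctuation(x)
# [1] "112 costs 5"

# ucp = TRUE: only general category P; + and = (Sm) and $ (Sc) survive.
removePunctuation(x, ucp = TRUE)
# [1] "1+1=2 costs $5"
```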

...

arguments to be passed to or from methods; in particular, from the PlainTextDocument method to the character method.
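A sketch of the PlainTextDocument method, assuming tm is installed: the extra argument is forwarded to the character method, and the result is still a PlainTextDocument whose text can be read back with content().

```r
library(tm)

doc <- PlainTextDocument("State-of-the-art isn't cheap!")

# preserve_intra_word_dashes travels through '...' to the character method.
out <- removePunctuation(doc, preserve_intra_word_dashes = TRUE)

class(out)    # still a PlainTextDocument
content(out)  # dashes kept; the apostrophe and "!" removed
# [1] "State-of-the-art isnt cheap"
```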

Value

The character vector or text document x with punctuation marks removed. Intra-word contractions (‘'’) are kept if preserve_intra_word_contractions is TRUE, and intra-word dashes (‘-’) are kept if preserve_intra_word_dashes is TRUE.
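One way to obtain the value described above is to protect the intra-word marks with placeholder characters, strip punctuation, and then restore them. The base R sketch below is an illustration only, not tm's source: remove_punct_sketch() is a hypothetical helper, and it assumes the input contains no ASCII control characters \001 or \002.

```r
# Hypothetical helper (illustration only, not part of tm).
remove_punct_sketch <- function(x, keep_contractions = FALSE, keep_dashes = FALSE) {
  # Swap intra-word marks for control characters, which [:punct:] does not match.
  # Lookarounds check the neighbours without consuming them, so "a-b-c" works.
  if (keep_contractions)
    x <- gsub("(?<=\\w)'(?=\\w)", "\001", x, perl = TRUE)
  if (keep_dashes)
    x <- gsub("(?<=\\w)-(?=\\w)", "\002", x, perl = TRUE)
  # Remove every remaining ASCII punctuation mark.
  x <- gsub("[[:punct:]]+", "", x)
  # Put the protected marks back.
  x <- gsub("\001", "'", x, fixed = TRUE)
  gsub("\002", "-", x, fixed = TRUE)
}

remove_punct_sketch("Don't over-think it.")
# [1] "Dont overthink it"
remove_punct_sketch("Don't over-think it.", keep_contractions = TRUE, keep_dashes = TRUE)
# [1] "Don't over-think it"
```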

See Also

getTransformations to list available transformation (mapping) functions.

regex describes the [:punct:] class of punctuation characters.

https://unicode.org/reports/tr44/#General_Category_Values.
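To compare the two character sets referenced above, this base R sketch (tm not needed) lists the printable ASCII characters in [:punct:] and then the subset that Unicode classifies as symbols rather than punctuation, that is, the ASCII characters that ucp = TRUE would leave in place.

```r
# Printable ASCII characters, one per element.
ascii <- rawToChar(as.raw(33:126), multiple = TRUE)

# Characters matched by the POSIX [:punct:] class (32 of them).
posix_punct <- ascii[grepl("[[:punct:]]", ascii)]
length(posix_punct)

# Those that are NOT in Unicode general category P.
symbols_only <- posix_punct[!grepl("\\p{P}", posix_punct, perl = TRUE)]
paste(symbols_only, collapse = "")
# [1] "$+<=>^`|~"
```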

Examples

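library("tm")
data("crude")
# Original document
inspect(crude[[14]])
# All punctuation removed
inspect(removePunctuation(crude[[14]]))
# Intra-word apostrophes and dashes kept
inspect(removePunctuation(crude[[14]],
                          preserve_intra_word_contractions = TRUE,
                          preserve_intra_word_dashes = TRUE))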
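The same transformation can be applied to every document in a corpus with tm_map(), which passes additional arguments on to the transformation. A sketch assuming tm is installed; the check on document 14 only looks at common ASCII marks, since apostrophes are deliberately kept here.

```r
library(tm)
data("crude")

# removePunctuation is one of the transformations tm lists.
"removePunctuation" %in% getTransformations()

# Apply it to all 20 documents, keeping intra-word contractions.
crude_clean <- tm_map(crude, removePunctuation,
                      preserve_intra_word_contractions = TRUE)
inspect(crude_clean[[14]])
```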

tm documentation built on Sept. 11, 2024, 6:47 p.m.