
limpiar_non_ascii

R Documentation

Remove non-ASCII characters except those with latin accents

Description

The function uses a simple regular expression to keep basic ASCII characters while attempting to retain characters with Latin accents. If you want to remove everything, including accented characters, use limpiar_alphanumeric instead.
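The exact pattern used by limpiar_non_ascii is not reproduced on this page. As a rough illustration only, the sketch below assumes that "Latin accents" means the accented letters of the Unicode Latin-1 Supplement block (U+00C0 to U+00FF, excluding the multiplication and division signs). keep_ascii_and_latin1 is a hypothetical helper, not part of LimpiaR; it uses base R's gsub() with a PCRE character class.

```r
# Hypothetical helper, NOT LimpiaR's implementation: keep ASCII plus the
# accented letters of the Latin-1 Supplement block, drop everything else.
keep_ascii_and_latin1 <- function(x) {
  x <- enc2utf8(x)  # make sure PCRE sees UTF-8 so \x{..} means code points
  # Allowed: U+0001-U+007F (ASCII), U+00C0-U+00FF minus U+00D7 and U+00F7
  pattern <- "[^\\x{01}-\\x{7F}\\x{C0}-\\x{D6}\\x{D8}-\\x{F6}\\x{F8}-\\x{FF}]"
  gsub(pattern, "", x, perl = TRUE)
}

x <- c("Simple text 123", "caf\u00e9 \U0001F60A", "\u00a3\u20ac\u00a5")
keep_ascii_and_latin1(x)
```

Under this assumed range, emojis, CJK characters and currency symbols such as £, € and ¥ are removed, while é, ñ and ü survive. Inverted punctuation (¿, ¡) lies below U+00C0 and would also be dropped, as would letters from other Latin blocks (e.g. ł) and accents written as separate combining marks.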

Usage

limpiar_non_ascii(data, text_var = mention_content)

Arguments

`data`	A data frame or tibble.
`text_var`	The text column to clean, given either as a quoted string or as a bare symbol. It must refer to a column inside `data`. Defaults to `mention_content`.
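Accepting a column as either a string or a symbol is a common pattern in R. How LimpiaR handles this internally is not documented here; the hypothetical helper resolve_text_col below is a base R sketch of one way to do it with substitute(). This simple approach only works when the argument is passed directly, not forwarded from another function; tidy-eval tools such as rlang::ensym() handle that case more robustly.

```r
# Hypothetical helper, NOT part of LimpiaR: turn a column argument given
# as "text" or as text into the column name "text".
resolve_text_col <- function(data, text_var) {
  expr <- substitute(text_var)  # capture the unevaluated argument
  # A quoted argument arrives as a character string, a bare name as a symbol
  col <- if (is.character(expr)) expr else deparse(expr)
  if (!col %in% names(data)) {
    stop("Column '", col, "' not found in `data`.", call. = FALSE)
  }
  col
}

df <- data.frame(text = c("a", "b"))
resolve_text_col(df, text)    # "text"
resolve_text_col(df, "text")  # "text"
```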

Value

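The data frame with the text variable cleaned in place, i.e. the existing column is overwritten rather than a new column being added. As with other R functions, the object passed as `data` is not modified; assign the result to keep the changes.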

Examples

library(LimpiaR)

test_df <- data.frame(
  text = c(
    "Simple text 123",              # Basic ASCII only
    "Hello! How are you? 😊 🌟",    # ASCII + punctuation + emojis
    "café München niño",            # Latin-1 accented characters
    "#special@chars&(~)|[$]",       # Special characters and symbols
    "混合汉字と日本語 → ⌘ £€¥"      # CJK characters + symbols + arrows
  )
)

# The text column can be given as a symbol or as a string
limpiar_non_ascii(test_df, text)
limpiar_non_ascii(test_df, "text")

jpcompartir/LimpiaR documentation built on Dec. 9, 2024, 9:43 p.m.