View source: R/limpiar_alphanumeric.R
limpiar_alphanumeric | R Documentation |
A simple regex that retains only a-z, A-Z, 0-9 and white space characters, including new lines. This function removes accented characters, any other non-English characters, punctuation, etc., so it is a heavy-duty approach to cleaning and should be used prudently. If you know that you need to keep accents, try limpiar_non_ascii first, before avoiding these functions altogether.
limpiar_alphanumeric(data, text_var = mention_content)
data |
Name of your Data Frame or Tibble object |
text_var |
Name of your text variable. Can be given as a 'string' or a symbol; it should refer to a column inside data |
Data frame with the text variable changed in place
test_df <- data.frame(
  text = c(
    "Simple text 123",            # Basic ASCII only
    "Hello! How are you? 😊 🌟",   # ASCII + punctuation + emojis
    "café München niño",          # Latin-1 accented characters
    "#special@chars&(~)|[$]",     # Special characters and symbols
    "混合汉字と日本語 → ⌘ £€¥"      # CJK characters + symbols + arrows
  )
)
limpiar_alphanumeric(test_df, text)
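The description refers to a regex that keeps only a-z, A-Z, 0-9 and white space. As a rough illustration of that idea, here is a minimal base-R sketch. The helper name keep_alphanumeric and the exact pattern are assumptions made for illustration; the package's internal implementation may differ.

```r
# Hypothetical helper, not part of the package: approximates the behaviour
# described above using base R only.
keep_alphanumeric <- function(x) {
  # Drop every character that is not a-z, A-Z, 0-9 or white space
  # (perl = TRUE allows \s inside the character class).
  gsub("[^a-zA-Z0-9\\s]", "", x, perl = TRUE)
}

# Accented letters and symbols are removed outright, not transliterated.
keep_alphanumeric(c("Simple text 123", "#special@chars&(~)|[$]"))
```

Because the removed characters are simply deleted, a word such as "café" would lose its final letter entirely, which is why the description recommends trying limpiar_non_ascii first when accents matter.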