clean_invalid_characters: Clean Invalid Characters in a Data Frame or Tibble

View source: R/clean_invalid_characters.R

clean_invalid_characters    R Documentation

Clean Invalid Characters in a Data Frame or Tibble

Description

This function cleans invalid characters in a data frame or tibble. It converts every character column to UTF-8 encoding, replaces invalid characters with a substitution string (sub), and removes all occurrences of an unwanted pattern (pattern, e.g., <a0>). The returned object has the same type as the input (data frame or tibble).
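The per-column steps described above can be sketched in base R as follows. This is a minimal illustration, not the package's source: it assumes the conversion is done with iconv() treating the input as UTF-8, and that pattern is matched literally (fixed = TRUE) rather than as a regular expression. The helper name clean_vector_sketch is hypothetical.

```r
# Hypothetical helper (not part of lamisc): clean a single character vector.
# Assumptions: conversion uses base R iconv() with the input treated as UTF-8,
# and `pattern` is matched literally rather than as a regular expression.
clean_vector_sketch <- function(x, sub = " ", pattern = "<a0>") {
  # Re-encode as UTF-8; each byte that is not valid UTF-8 is replaced by `sub`
  x <- iconv(x, from = "UTF-8", to = "UTF-8", sub = sub)
  # Delete every literal occurrence of `pattern`
  gsub(pattern, "", x, fixed = TRUE)
}

x <- c("valid", "invalid<a0>text", "another<a0>value")
clean_vector_sketch(x)
```

Under these assumptions, iconv() substitutes sub once for each offending byte, so a multi-byte invalid sequence produces several copies of sub.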

Usage

clean_invalid_characters(data, sub = " ", pattern = "<a0>")

Arguments

data

A data frame or tibble containing the data to be cleaned.

sub

A string used to replace invalid characters. The default, " ", is a single space.

pattern

A character string giving the pattern to remove from every character column. Default is "<a0>".

Value

An object of the same type as data (a data frame or a tibble) with its character columns cleaned. The structure of the input object is preserved.
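One way to return the same type as the input is to assign the cleaned columns back into data[...] instead of building a new object. The sketch below is hypothetical (clean_frame_sketch is not part of lamisc) and repeats the same assumptions as before: iconv() from UTF-8 and a literal pattern. It uses a base data frame so that it runs without extra packages.

```r
# Hypothetical wrapper (not part of lamisc) showing how the input type can be kept.
# Assumptions: iconv() from UTF-8 and a literal `pattern`, as in a sketch only.
clean_frame_sketch <- function(data, sub = " ", pattern = "<a0>") {
  # Identify the character columns; other column types are left untouched
  is_chr <- vapply(data, is.character, logical(1))
  # Assigning into data[...] keeps the object's class and attributes
  data[is_chr] <- lapply(data[is_chr], function(x) {
    x <- iconv(x, from = "UTF-8", to = "UTF-8", sub = sub)
    gsub(pattern, "", x, fixed = TRUE)
  })
  data
}

df <- data.frame(
  id = 1:2,
  txt = c("a<a0>b", "c"),
  stringsAsFactors = FALSE
)
out <- clean_frame_sketch(df)
class(out)
out$txt
```

Because the assignment modifies the existing object, its class is kept. Tibble's [<- method likewise returns a tibble, so the same assignment also works for tibble input.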

Examples

# Example data frame
df <- data.frame(
  col1 = c("valid", "invalid<a0>text"),
  col2 = c("another<a0>value", "valid"),
  stringsAsFactors = FALSE
)

# Example tibble
library(tibble)
tbl <- tibble(
  col1 = c("valid", "invalid<a0>text"),
  col2 = c("another<a0>value", "valid")
)

# Clean the data frame and the tibble
cleaned_df <- clean_invalid_characters(df)
cleaned_tbl <- clean_invalid_characters(tbl)


emilelatour/lamisc documentation built on March 29, 2025, 1:23 p.m.