enc_check2: Find and replace invalid UTF-8 bytes.

Description

enc_check2() detects invalid UTF-8 bytes using stringi::stri_enc_isutf8(). NOTE: This function is intended for use only on UTF-8 systems. If in doubt about your system encoding, run Sys.getlocale().
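The locale check mentioned in the note can be sketched in base R as follows. This is a minimal sketch, not part of the package: the variable name is_utf8 and the warning text are illustrative assumptions, and the exact locale string printed varies by platform.

```r
# Inspect the character-type locale of the current session.
# The returned string is platform dependent.
print(Sys.getlocale("LC_CTYPE"))

# l10n_info() is a base R function that reports, among other things,
# whether the session is running in a UTF-8 locale.
is_utf8 <- isTRUE(l10n_info()[["UTF-8"]])

# Illustrative guard (an assumption, not package behaviour).
if (!is_utf8) {
  warning("This session does not appear to use a UTF-8 locale.")
}
```

If is_utf8 is FALSE, the results of enc_check2() should be treated with caution, since the function is intended only for UTF-8 systems.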

Usage

enc_check2(dataset)

Arguments

dataset

A data.frame. Untested on newer data.frame-like objects from the dplyr, tibble, and data.table packages.

Details

This function generates no false positives. By contrast, check_column_encoding passes a character vector of regular expressions that match invalid bytes to pattern-matching functions, which then search through dataset. enc_check2 instead relies on stringi::stri_enc_isutf8() and processes the results into a usable format.
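The underlying check can be illustrated directly with stringi::stri_enc_isutf8(). The sketch below is not the implementation of enc_check2; the data frame df, its columns, and the helper names char_cols and invalid are hypothetical, and the output shape is chosen for illustration only.

```r
library(stringi)

# Hypothetical data. The second name ends in the lone byte 0xE9
# (Latin-1 "e" with acute accent), which is not valid UTF-8 on its own.
bad <- rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xe9)))
df <- data.frame(
  id = 1:3,
  name = c("apple", bad, "na\u00efve"),
  stringsAsFactors = FALSE
)

# Only character columns can contain invalid bytes.
char_cols <- vapply(df, is.character, logical(1))

# For each character column, record the row indices whose strings
# fail the UTF-8 validity check.
invalid <- lapply(df[char_cols], function(x) which(!stri_enc_isutf8(x)))

print(invalid)
```

Here invalid is a named list with one integer vector of row indices per character column; an empty vector means every string in that column is valid UTF-8.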

Value

Provisional: A matrix whose column names are those of dataset, or else a list with a single element if only one column of dataset contains invalid UTF-8 bytes.


jkroes/FixEncoding documentation built on May 19, 2019, 12:44 p.m.