validUTF8: Check if a Character Vector is Validly Encoded

validUTF8

R Documentation

Description

Check if each element of a character vector is valid in its implied encoding.

Usage

validUTF8(x)
validEnc(x)

Arguments

`x`	a character vector.

Details

These use similar checks to those used by functions such as grep.

validUTF8 ignores any marked encoding (see Encoding) and checks directly whether the bytes in each string are valid UTF-8. (For the validity of ‘noncharacters’ see the help for intToUtf8.)
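
The following is a minimal sketch of what ignoring the marked encoding means in practice. The strings s and u are illustrative and not part of the original help page; the sketch assumes only base R.

```r
# "\xfc" is u-umlaut in Latin-1, but a lone 0xfc byte is not valid UTF-8.
s <- "Z\xfcrich"
Encoding(s) <- "latin1"      # declare the string as Latin-1

# The same word written as UTF-8 bytes and left unmarked.
u <- "Z\xc3\xbcrich"

s_ok <- validUTF8(s)         # FALSE: the Latin-1 mark is ignored, the raw bytes are checked
u_ok <- validUTF8(u)         # TRUE

# Converting the marked string to UTF-8 yields bytes that pass the check.
s_utf8 <- iconv(s, from = "latin1", to = "UTF-8")
s_utf8_ok <- validUTF8(s_utf8)   # TRUE
```

So a FALSE from validUTF8 does not by itself mean the string is corrupt; it may simply be stored in another encoding.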

validEnc regards a character string as validly encoded unless it is marked as UTF-8, or it is unmarked and the R session is in a UTF-8 or other multi-byte locale; only in those cases are its bytes checked. (The checks in other multi-byte locales depend on the OS and, as with iconv, not every invalid input may be detected.)
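
As a rough illustration of the rule above, the same bytes can give different validEnc results depending on how they are marked. The variable names are hypothetical; the byte sequence is the one used in the Examples section.

```r
bad <- "\xfa\xb4\xbf\xbf\x9f"        # not valid UTF-8

as_utf8 <- bad
Encoding(as_utf8) <- "UTF-8"         # marked as UTF-8, so the bytes are checked

as_latin1 <- bad
Encoding(as_latin1) <- "latin1"      # marked as Latin-1, so regarded as valid

enc_utf8   <- validEnc(as_utf8)      # FALSE
enc_latin1 <- validEnc(as_latin1)    # TRUE
raw_latin1 <- validUTF8(as_latin1)   # FALSE: validUTF8 ignores the mark

# For the unmarked string the answer depends on the session locale,
# so no expected value is given here.
enc_unmarked <- validEnc(bad)
```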

Value

A logical vector of the same length as x. NA elements are regarded as validly encoded.
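
A short sketch of the return value, using a made-up vector v that mixes an ASCII string, a missing value and an invalid byte:

```r
v <- c("abc", NA, "\xff")   # NA is coerced to NA_character_

res <- validUTF8(v)         # TRUE TRUE FALSE

# One result per element; the NA element counts as validly encoded.
same_length <- length(res) == length(v)

# Keep only the elements that passed the check (the NA is retained).
clean <- v[res]
```

Because NA elements are reported as TRUE, a filter such as v[validUTF8(v)] keeps missing values; use is.na separately if they should be dropped too.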

Note

It would be possible to check for the validity of character strings in a Latin-1 encoding, but extensions such as CP1252 are widely accepted as ‘Latin-1’ and 8-bit encodings rarely need to be checked for validity.

Examples

x <-
  ## from example(text)
c("Jetz", "no", "chli", "z\xc3\xbcrit\xc3\xbc\xc3\xbctsch:",
  "(noch", "ein", "bi\xc3\x9fchen", "Z\xc3\xbc", "deutsch)",
   ## from a CRAN check log
   "\xfa\xb4\xbf\xbf\x9f")
validUTF8(x)
validEnc(x) # depends on the locale
Encoding(x) <- "UTF-8"
validEnc(x) # typically the last, x[10], is invalid

## Maybe advantageous to declare it "unknown":
G <- x ; Encoding(G[!validEnc(G)]) <- "unknown"
try( substr(x, 1,1) ) # gives 'invalid multibyte string' error in a UTF-8 locale
try( substr(G, 1,1) ) # works in a UTF-8 locale
nchar(G) # fine, too
## but it is not "more valid" typically:
all.equal(validEnc(x),
          validEnc(G)) # typically TRUE
