encodings: Conversion between corpus and native encoding.

encodings    R Documentation

Conversion between corpus and native encoding.

Description

Utility functions to convert the encoding of a character vector between the native encoding and the encoding of the corpus.

Usage

as.utf8(x, from)

as.nativeEnc(x, from)

as.corpusEnc(x, from = encoding(), corpusEnc)

Arguments

x

A character vector to be converted.

from

A character vector describing the encoding of the input character vector.

corpusEnc

A character vector describing the target encoding, i.e. the encoding of the corpus (usually "latin1" or "UTF-8").

Details

The encoding of a corpus and the encoding of the terminal (the native encoding) may differ. If no conversion is carried out between the two, results may be garbled or wrong. The functions as.nativeEnc() and as.corpusEnc() are auxiliary functions that assist with this conversion. The functions as.nativeEnc() and as.utf8() deliberately remove the explicit statement of the encoding from their result, to avoid warnings that may occur with character vector columns in a data.table object.
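
The round trip described above can be sketched with base R alone. This is an illustration under stated assumptions, not the package's implementation: it uses iconv() and Encoding() from base R rather than as.corpusEnc(), as.utf8() or as.nativeEnc(), and it assumes a corpus encoded as "latin1" and a UTF-8 input string. The variable names are hypothetical.

```r
# A UTF-8 string with one non-ASCII character (u-umlaut), written as an
# escape so that this snippet itself stays plain ASCII.
x <- "M\u00fcnchen"
Encoding(x)                      # the string carries an explicit UTF-8 mark

# Convert to the (assumed) corpus encoding, as one might do before
# matching the string against a latin1-encoded corpus.
x_latin1 <- iconv(x, from = "UTF-8", to = "latin1")

# The two versions differ at the byte level: the umlaut takes two bytes
# in UTF-8 and one byte in latin1.
nchar(x, type = "bytes")
nchar(x_latin1, type = "bytes")

# Convert a result that comes back in the corpus encoding to UTF-8 again.
x_back <- iconv(x_latin1, from = "latin1", to = "UTF-8")

# Drop the explicit encoding mark. The bytes stay the same; only the
# declaration is removed (assumption: this resembles what the Details
# section means by removing the explicit statement of the encoding).
Encoding(x_back) <- "unknown"
Encoding(x_back)
```

Removing the mark leaves the bytes untouched, so the string only displays correctly afterwards if the session's native encoding matches the bytes, here UTF-8.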


PolMine/polmineR documentation built on Nov. 9, 2023, 8:07 a.m.