check.encoding: Check character encoding in corpus folder
In stylo: Stylometric Multivariate Analyses

check.encoding

R Documentation

Check character encoding in corpus folder

Description

Using non-ASCII characters is never trivial, but sometimes unavoidable: most of the world's languages use non-Latin alphabets or add diacritics to the standard Latin script. The default character encoding in stylo is UTF-8, and deviating from it can cause problems. This function allows users to check the character encoding of the files in a corpus. A summary is printed to the terminal, and a detailed list reporting the most probable encoding of each text file in the folder can be written to a csv file. The function is essentially a wrapper around the function guess_encoding() from the 'readr' package by Wickham et al. (2017). To change the encoding to UTF-8, try the change.encoding() function.
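To illustrate what wrapping guess_encoding() can look like, here is a minimal sketch. It is not the stylo source code: the helper name guess_corpus_encodings() and the layout of its result are assumptions made for this example. Only readr::guess_encoding(), which returns a table with the columns encoding and confidence, is an existing function.

```r
# Sketch only: NOT the implementation used by check.encoding().
library(readr)

# Hypothetical helper: report the most probable encoding of each file in a folder.
guess_corpus_encodings <- function(corpus.dir = "corpus/") {
  files <- list.files(corpus.dir, full.names = TRUE)
  files <- files[!dir.exists(files)]  # skip subdirectories

  guesses <- lapply(files, function(f) {
    # guess_encoding() returns candidate encodings with a confidence score;
    # errors (e.g. on unreadable files) are turned into a missing guess.
    g <- tryCatch(guess_encoding(f), error = function(e) NULL)
    if (is.null(g) || nrow(g) == 0) {
      return(data.frame(file = basename(f), encoding = NA_character_,
                        confidence = NA_real_, stringsAsFactors = FALSE))
    }
    best <- which.max(g$confidence)  # keep the most probable candidate
    data.frame(file = basename(f), encoding = g$encoding[best],
               confidence = g$confidence[best], stringsAsFactors = FALSE)
  })

  if (length(guesses) == 0) {
    # empty or missing folder: return an empty table with the same columns
    return(data.frame(file = character(0), encoding = character(0),
                      confidence = numeric(0), stringsAsFactors = FALSE))
  }
  do.call(rbind, guesses)
}

# Usage: tabulate the guessed encodings of the default corpus folder, if present.
if (dir.exists("corpus/")) {
  res <- guess_corpus_encodings("corpus/")
  print(table(res$encoding, useNA = "ifany"))
}
```

A table dominated by a single encoding other than UTF-8 (or ASCII, which is a subset of UTF-8) suggests that the corpus should be converted before analysis. Note that the guesses are probabilistic, especially for short files.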

Usage

check.encoding(corpus.dir = "corpus/", output.file = NULL)

Arguments

`corpus.dir`	path to the folder containing the corpus.
`output.file`	path to a csv file that reports the most probable encoding for each text file in the corpus.

Details

If no argument is passed, the function checks the text files in the default subdirectory corpus/ of the current working directory.

Value

The function returns a summary message and, when output.file is specified, writes detailed results to a csv file.

Author(s)

Steffen Pielström

References

Wickham, H., Hester, J., Francois, R., Jylanki, J., and Jørgensen, M. (2017). Package: 'readr'. <https://cran.r-project.org/web/packages/readr/readr.pdf>.

Examples

## Not run: 
# standard usage from stylo working directory with a 'corpus' subfolder:
check.encoding()

# specifying another folder:
check.encoding("~/corpora/example1/")

# specifying an output file:
check.encoding(output.file = "~/experiments/charencoding/example1.csv")


## End(Not run)

stylo documentation built on May 29, 2024, 1:37 a.m.
