encoding: Guess and repair faulty character encoding.


Description

These functions help you handle web pages that declare an incorrect encoding. Use guess_encoding to work out what the real encoding is (and then supply it to the encoding argument of read_html), or use repair_encoding to fix character vectors after the fact.
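To see why a wrong encoding declaration garbles text, here is a minimal base R sketch. It uses iconv rather than anything from rvest, and the byte values are an illustrative assumption, not taken from the package's test file. The same five bytes decode without error under two encodings, but only one decoding gives the intended word.

```r
# "crème" stored as ISO-8859-1 bytes: è is the single byte 0xE8.
latin1_bytes <- as.raw(c(0x63, 0x72, 0xE8, 0x6D, 0x65))
raw_text <- rawToChar(latin1_bytes)

# Decoding with the true source encoding recovers the intended text.
right <- iconv(raw_text, from = "ISO-8859-1", to = "UTF-8")

# ISO-8859-2 is also a valid single-byte encoding, so decoding succeeds,
# but byte 0xE8 maps to a different character there (č instead of è).
wrong <- iconv(raw_text, from = "ISO-8859-2", to = "UTF-8")

print(right)
print(wrong)
```

Because both conversions succeed, nothing signals the mistake; the wrong result only shows up when a person reads the text.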

Usage

guess_encoding(x)

repair_encoding(x, from = NULL)

Arguments

x

A character vector.

from

The encoding that the string is actually in. If NULL, guess_encoding will be used.

stringi

These functions are wrappers around tools from the fantastic stringi package, so you'll need to make sure you have it installed.
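As a rough illustration of what wrapping stringi can look like, the sketch below calls stri_enc_detect and stri_conv directly. The helper names guess_sketch and repair_sketch are hypothetical, the sample bytes are made up for the example, and none of this is presented as rvest's actual implementation.

```r
library(stringi)

# Hypothetical stand-in for a guessing helper: stri_enc_detect() returns
# one table of candidate encodings per input element; take the first.
guess_sketch <- function(x) {
  stri_enc_detect(x)[[1]]
}

# Hypothetical stand-in for a repair helper: reinterpret the input as
# being in `from` and convert it to UTF-8.
repair_sketch <- function(x, from) {
  stri_conv(x, from = from, to = "UTF-8")
}

# "crème" as ISO-8859-1 bytes (stringi accepts raw vectors as well as
# character vectors).
latin1_bytes <- as.raw(c(0x63, 0x72, 0xE8, 0x6D, 0x65))

# Detection is statistical; on very short input the ranking is unreliable.
candidates <- guess_sketch(latin1_bytes)
print(candidates)

# Here the source encoding is supplied explicitly instead of guessed.
fixed <- repair_sketch(latin1_bytes, from = "ISO-8859-1")
print(fixed)
```

Detection works better on longer text, so it is usually safer to run it on a whole page than on a single short string.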

Examples

library(rvest)

# A file with bad encoding included in the package
path <- system.file("html-ex", "bad-encoding.html", package = "rvest")
x <- read_html(path)
x %>% html_nodes("p") %>% html_text()

guess_encoding(x)
# Two valid encodings, only one of which is correct
read_html(path, encoding = "ISO-8859-1") %>% html_nodes("p") %>% html_text()
read_html(path, encoding = "ISO-8859-2") %>% html_nodes("p") %>% html_text()

rvest documentation built on May 19, 2017, 7:49 a.m.
