These functions help you respond to web pages that declare incorrect
encodings. You can use
guess_encoding to figure out what
the real encoding is (and then supply that to the
encoding argument of
read_html), or use
repair_encoding to fix character vectors after the fact.
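The sketch below makes "fixing a character vector after the fact" concrete. It does not call repair_encoding. Instead, it uses base R's iconv() as a stand-in, and it assumes the text's true encoding is ISO-8859-1 (Latin-1). The string is built from raw bytes so that the "é" is stored as the single Latin-1 byte 0xE9. This is what text looks like when a page declared the wrong encoding.

```r
# Hypothetical mis-decoded text: "café" stored as Latin-1 bytes (0xE9 is "é" in Latin-1)
bad <- rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xe9)))

# Reinterpret the bytes as ISO-8859-1 and convert them to UTF-8
fixed <- iconv(bad, from = "ISO-8859-1", to = "UTF-8")

# In UTF-8, "é" becomes the two bytes 0xC3 0xA9
print(charToRaw(fixed))
#> [1] 63 61 66 c3 a9
```

A wrong guess for `from` usually still converts without an error, but some bytes come out as different characters. That is why you should check the candidate encodings against the page content.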
A character vector.
The encoding that the string is actually in. If
These functions are wrappers around tools from the fantastic stringi package, so make sure you have stringi installed.
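The following is a minimal sketch of the kind of stringi calls such wrappers can delegate to. The helper names guess_encoding_sketch and repair_encoding_sketch are hypothetical and are not rvest's implementations. The sketch relies only on stringi's stri_enc_detect(), which returns one data frame per input with columns Encoding, Language and Confidence, and on stri_encode(). Both helpers pass raw bytes to stringi, so the guess and the conversion work on the bytes as stored rather than on any encoding mark R has attached.

```r
library(stringi)

# Hypothetical helper: rank likely encodings for a character vector, best guess first
guess_encoding_sketch <- function(x) {
  bytes <- charToRaw(paste(x, collapse = ""))
  stri_enc_detect(bytes)[[1]]  # data frame: Encoding, Language, Confidence
}

# Hypothetical helper: convert strings from their true encoding `from` to UTF-8
repair_encoding_sketch <- function(x, from) {
  # A list of raw vectors makes stri_encode() read the stored bytes as `from`
  stri_encode(lapply(x, charToRaw), from = from, to = "UTF-8")
}

# "café" stored as Latin-1 bytes
bad <- rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xe9)))

guesses <- guess_encoding_sketch(bad)
print(guesses)

fixed <- repair_encoding_sketch(bad, from = "ISO-8859-1")
print(charToRaw(fixed))
```

Detection on very short strings is unreliable, so treat the top row of `guesses` as a candidate to verify rather than as an answer.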
```r
library(rvest)

# A file with bad encoding included in the package
path <- system.file("html-ex", "bad-encoding.html", package = "rvest")
x <- read_html(path)
x %>% html_nodes("p") %>% html_text()

guess_encoding(x)

# Two valid encodings, only one of which is correct
read_html(path, encoding = "ISO-8859-1") %>% html_nodes("p") %>% html_text()
read_html(path, encoding = "ISO-8859-2") %>% html_nodes("p") %>% html_text()
```