read_html_live | R Documentation |
read_html() operates on the HTML source code downloaded from the server.
This works for most websites but can fail if the site uses JavaScript to
generate the HTML. read_html_live() provides an alternative interface
that runs a live web browser (Chrome) in the background. This allows you to
access elements of the HTML page that are generated dynamically by JavaScript
and to interact with the live page by clicking on buttons or typing in
forms.
Behind the scenes, this function uses the chromote package, which requires that you have a copy of Google Chrome installed on your machine.
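Because a local Chrome installation is a prerequisite, it can be useful to check for one before starting a session. The following is a minimal sketch, not part of rvest: has_chrome() is a hypothetical helper, and it assumes that chromote::find_chrome() returns NULL when no browser can be located.

```r
# Hypothetical helper (not part of rvest or chromote): report whether
# chromote can locate a Chrome or Chromium binary on this machine.
has_chrome <- function() {
  # Assumption: find_chrome() returns the browser path, or NULL if none is found.
  path <- suppressMessages(chromote::find_chrome())
  !is.null(path) && nzchar(path)
}

# Example use (requires the chromote package to be installed):
# if (!has_chrome()) stop("Install Google Chrome before calling read_html_live()")
```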
read_html_live(url)
url | Website URL to read from. |
read_html_live() returns an R6 LiveHTML object. You can interact
with this object using the usual rvest functions, or call its methods,
like $click(), $scroll_to(), and $type() to interact with the live
page like a human would.
## Not run:
# When we retrieve the raw HTML for this site, it doesn't contain the
# data we're interested in:
static <- read_html("https://www.forbes.com/top-colleges/")
static %>% html_elements(".TopColleges2023_tableRow__BYOSU")
# Instead, we need to run the site in a real web browser, causing it to
# download a JSON file and then dynamically generate the html:
sess <- read_html_live("https://www.forbes.com/top-colleges/")
# Open a live view of the page to inspect what the browser has rendered:
sess$view()
rows <- sess %>% html_elements(".TopColleges2023_tableRow__BYOSU")
rows %>% html_element(".TopColleges2023_organizationName__J1lEV") %>% html_text()
rows %>% html_element(".grant-aid") %>% html_text()
## End(Not run)
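The example above only reads from the live page. As a rough sketch of how the $type(), $click(), and $scroll_to() methods might be combined, the hypothetical helper below fills in a search box, submits it, and extracts the results. The function name, the CSS selectors, and the fixed one-second wait are all illustrative assumptions, not part of rvest; real pages will need their own selectors and may need a longer or smarter wait.

```r
# Hypothetical helper: drive a search form on a JavaScript-rendered page.
# All selectors are placeholders and must be adapted to the target site.
search_live_page <- function(url, query,
                             box_css = "input[type='search']",
                             button_css = "button[type='submit']",
                             result_css = ".result") {
  # Start a live Chrome session for the page (requires chromote and Chrome).
  sess <- rvest::read_html_live(url)

  # Type the query into the search box, then click the submit button.
  sess$type(box_css, query)
  sess$click(button_css)

  # Crude pause so the page's JavaScript can render the results (assumption).
  Sys.sleep(1)

  # Scroll down in case results load lazily; 1000 pixels is arbitrary.
  sess$scroll_to(top = 1000)

  # The usual rvest functions work on the live session.
  rvest::html_text2(rvest::html_elements(sess, result_css))
}

# Example use with a placeholder URL:
# search_live_page("https://example.com/search", "rvest")
```

Wrapping the steps in a function keeps the session object local to the call; for exploratory work it may be easier to keep the session in a variable and use $view() to watch each step.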