knitr::opts_chunk$set(
  collapse = TRUE,
  eval = FALSE,
  comment = "#>"
)
Sys.unsetenv("DEBUGME")

The crrri package provides a Chrome Remote Interface for R. It is inspired by the node.js module chrome-remote-interface.

This vignette aims to show several examples of usage for crrri.

All the examples come from the chrome-remote-interface or puppeteer documentations. This vignette shows how to reproduce those using crrri.

Setup

It is better to set up beforehand the HEADLESS_CHROME environment variable to a Chromium/Chrome binary on our system that crrri will use. If you do not, you can provide the path to a Chromium/Chrome binary in Chrome$new() or let the package guess using its find_chrome_binary().

The default behavior of crrri is equivalent to setting the environment variable like this

Sys.setenv(HEADLESS_CHROME = crrri::find_chrome_binary())

We need to load crrri and also promises to have the tools to deals with promises that crrri is based on.

library(crrri)
library(promises)

Example 1: Take a screenshot

This first example is inspired from this post that uses the chrome-remote-interface node.js package.

The first step is to launch Chromium/Chrome in headless mode:

chrome <- Chrome$new()

Then connect R to headless Chromium/Chrome with the connect() method. Since the connection process is not immediate, the connect() method returns a promise that is fulfilled when R is connected to Chrome. The value of this promise is the connection object.

client <- chrome$connect()

You need to write a function whose first parameter will receive the client connection object.

screenshot_file <- tempfile(fileext = ".png")

screenshot <- function(client) {
  # some constants
  targetUrl <- "https://cran.rstudio.com"
  viewport <- c(1440, 900)
  screenshotDelay <- 2 # seconds

  # extract the domain you need
  Page <- client$Page
  Emulation <- client$Emulation

  # enable events for the Page, DOM and Network domains 
  Page$enable() %...>% {
    # modify the viewport settings
    Emulation$setDeviceMetricsOverride(
      width = viewport[1],
      height = viewport[2],
      deviceScaleFactor = 0,
      mobile = FALSE,
      dontSetVisibleSize = FALSE
    )
  } %...>% {
    # go to url
    Page$navigate(targetUrl)
    # wait the page is loaded
    Page$loadEventFired()
  } %>% 
    # add a delay 
    wait(delay = screenshotDelay) %...>% {
    # capture screenshot
    Page$captureScreenshot(format = "png", fromSurface = TRUE)
  } %...>% {
    .$data %>% 
      jsonlite::base64_dec() %>% 
      writeBin(screenshot_file)
  } %>%
  # close headless chrome (client connections are safely closed)
  finally(
    ~ client$disconnect()
  ) %...!% {
    cat("Error:", .$message, "\n")
  }
}

Therefore, you can take a screenshot by executing this screenshot() function:

client %...>% screenshot()
# since screenshot returns an invisible promise, we have to force R to hold 
hold(client %...>% screenshot())

The screenshot is written to disk and looks like this:

knitr::include_graphics("example1-screenshot.png")

Example 2: Dump HTML after page loaded

This example is inspired from this JavaScript script from the chrome-remote-interface wiki that dumps the DOM.

html_file <- tempfile(fileext = ".html")

client <- chrome$connect()

dump_DOM <- function(client) {
  Network <- client$Network
  Page <- client$Page
  Runtime <- client$Runtime
  Network$enable() %...>%
  { Page$enable() } %...>%
  { Network$setCacheDisabled(cacheDisabled = TRUE) } %...>% 
  { Page$navigate(url = "https://github.com") } %...>%
  { Page$loadEventFired() } %...>% { 
    Runtime$evaluate(
      expression = 'document.documentElement.outerHTML'
    ) 
  } %...>% {
    writeLines(c(.$result$value, "\n"), con = html_file) 
  } %>%
  finally(
    ~ client$disconnect()
  ) %...!% {
    cat("Error:", .$message, "\n")
  }
}

Execute the task:

client %...>% dump_DOM()
# since dump_dom is an invisible promise, we have to force R to hold 
hold(client %...>% dump_DOM())
chrome$close()

Here is the first 20 lines of what we get in html_file:

cat(paste0(
  c("```html", readLines("dumpDOM.html", n = 20), "```"), 
  collapse = "\n"
))

This could be useful to parse HTML with rvest after a page is loaded.



RLesur/crrri documentation built on March 20, 2021, 8:47 a.m.