collect_screenshots_from_s3: Bulk-download (and compress) screenshots stored on S3

View source: R/collect_screenshots_from_s3.R

Description

Download screenshots that ScrapeBot instances previously stored on S3. The function collects those entries in the run data that refer to S3-stored screenshots and downloads them to a local output directory. While doing so, it can optionally resize the images to save local disk space.
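The collect-and-download step can be pictured as a simple loop over S3 keys. The sketch below is not ScrapeBotR's implementation. download_one() is a hypothetical stand-in for the actual S3 download, local_target_path() is a hypothetical helper for placing files in output_directory, and using the last component of the S3 key as the local filename is an assumption. The progress bar shown when verbose is TRUE uses base R's utils::txtProgressBar().

```r
# Sketch only, not ScrapeBotR's implementation.
# Hypothetical helper: build the local path for a file in output_directory,
# treating "" as the current working directory and tolerating a trailing slash.
local_target_path <- function(output_directory, filename) {
  if (identical(output_directory, "")) {
    return(filename)
  }
  file.path(sub("/+$", "", output_directory), filename)
}

# Hypothetical download loop. download_one(key, path) stands in for the
# actual S3 download; the local filename is assumed to be basename(key).
download_all <- function(keys, download_one, output_directory = "", verbose = TRUE) {
  if (!identical(output_directory, "")) {
    dir.create(output_directory, recursive = TRUE, showWarnings = FALSE)
  }
  paths <- vapply(basename(keys),
                  function(f) local_target_path(output_directory, f),
                  character(1), USE.NAMES = FALSE)
  # txtProgressBar() needs max > min, so skip it when there is nothing to do
  show_progress <- verbose && length(keys) > 0
  if (show_progress) {
    pb <- utils::txtProgressBar(min = 0, max = length(keys), style = 3)
    on.exit(close(pb))
  }
  for (i in seq_along(keys)) {
    download_one(keys[i], paths[i])
    if (show_progress) utils::setTxtProgressBar(pb, i)
  }
  paths
}
```

Treating "" specially matters because file.path("", "a.png") yields "/a.png", i.e., a path at the filesystem root rather than in the working directory.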

Usage

collect_screenshots_from_s3(
  scrapebot_connection,
  aws_connection,
  run_uid = NULL,
  instance_uid = NULL,
  recipe_uid = NULL,
  include_inactive = FALSE,
  resize = FALSE,
  resize_max_width = NULL,
  resize_max_height = NULL,
  output_directory = "",
  verbose = TRUE
)

Arguments

scrapebot_connection

A connection object, as retrieved from connect_scrapebot().

aws_connection

An AWS connection object, as retrieved from connect_aws(); it also specifies the AWS region.

run_uid

Optional numeric UID, or a vector of numeric UIDs, of the run(s) to collect data from. If NULL, either instance_uid or recipe_uid (or both) has to be provided. Defaults to NULL.

instance_uid

Optional numeric UID or a vector of numeric UIDs of the instance to filter data for. If NULL, either run_uid or recipe_uid (or both) has to be provided. Defaults to NULL.

recipe_uid

Optional numeric UID or a vector of numeric UIDs of the recipe to filter data for. If NULL, either instance_uid or run_uid (or both) has to be provided. Defaults to NULL.

include_inactive

If TRUE, inactive recipes are included alongside active ones. Defaults to FALSE.

resize

If TRUE, downloaded screenshots are resized, keeping their aspect ratio, to fit the maximum width and/or height given by resize_max_width and resize_max_height. Use this to reduce file size. Defaults to FALSE.

resize_max_width

Integer giving the maximum width (in pixels) to which images are resized if resize is TRUE. The aspect ratio is preserved in combination with resize_max_height. Defaults to NULL.

resize_max_height

Integer giving the maximum height (in pixels) to which images are resized if resize is TRUE. The aspect ratio is preserved in combination with resize_max_width. Defaults to NULL.

output_directory

Character string holding the (relative) path to the directory into which the screenshot files are downloaded. Defaults to "".

verbose

If TRUE, the function shows a progress bar indicating how far the download has progressed. Defaults to TRUE.
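Two of the argument rules above, that at least one of run_uid, instance_uid, or recipe_uid must be given, and that resizing keeps the aspect ratio within resize_max_width and/or resize_max_height, can be sketched as plain R helpers. Both check_uid_filters() and fit_dimensions() are hypothetical illustrations, not functions exported by ScrapeBotR. Treating a NULL maximum as "no limit" and never enlarging smaller images are assumptions.

```r
# Hypothetical helper: at least one UID filter must be given, and any
# filter that is given must be numeric (a single UID or a vector of UIDs).
check_uid_filters <- function(run_uid = NULL, instance_uid = NULL, recipe_uid = NULL) {
  if (is.null(run_uid) && is.null(instance_uid) && is.null(recipe_uid)) {
    stop("Provide at least one of run_uid, instance_uid, or recipe_uid.")
  }
  for (uid in list(run_uid, instance_uid, recipe_uid)) {
    if (!is.null(uid) && !is.numeric(uid)) {
      stop("UIDs must be numeric.")
    }
  }
  invisible(TRUE)
}

# Hypothetical helper: scale (width, height) down so that the image fits
# within max_width and/or max_height while keeping its aspect ratio.
# A NULL maximum imposes no limit; images are never enlarged (assumption).
fit_dimensions <- function(width, height, max_width = NULL, max_height = NULL) {
  scale <- 1
  if (!is.null(max_width))  scale <- min(scale, max_width / width)
  if (!is.null(max_height)) scale <- min(scale, max_height / height)
  c(width = round(width * scale), height = round(height * scale))
}
```

For example, a 1920 x 1080 screenshot with resize_max_width = 800 would be scaled by 800 / 1920 to 800 x 450 under this sketch.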

Value

A tibble listing all matching run-data entries for which screenshots are expected on S3. It therefore contains the same number of rows that get_run_data returns when filtering for the respective parameters and for S3 links in "screenshot"-containing recipe steps (i.e., calling get_recipes first, then get_recipe_steps and filtering for "screenshot", then get_run_data and filtering for S3 links). For each row, the tibble additionally lists the local filename, width, height, and file size (in bytes), along with their respective counterparts on S3. Without resizing, the local and S3 width, height, and file size should be practically identical.
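When resize is TRUE, the local and S3 file sizes in the returned tibble can be compared to see how much disk space resizing saved. A minimal sketch, assuming the two file-size columns are passed in as numeric vectors; compression_summary() is a hypothetical helper, not part of ScrapeBotR.

```r
# Hypothetical helper: summarise local vs. S3 file sizes (both in bytes).
compression_summary <- function(local_size, s3_size) {
  total_local <- sum(local_size, na.rm = TRUE)
  total_s3 <- sum(s3_size, na.rm = TRUE)
  c(
    local_bytes = total_local,
    s3_bytes = total_s3,
    saved_bytes = total_s3 - total_local,
    ratio = total_local / total_s3  # below 1 means the local copies are smaller
  )
}
```

Pass the local and S3 file-size columns of the result as the two arguments; check names() on the actual tibble for their exact column names.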

Examples

## Not run: 

scrapebot_connection <- connect_scrapebot('my_db on localhost')
# connect_aws() arguments (credentials, region) depend on your setup; see ?connect_aws
aws_connection <- connect_aws()
collect_screenshots_from_s3(
  scrapebot_connection, aws_connection,
  run_uid = 42
)
collect_screenshots_from_s3(
  scrapebot_connection, aws_connection,
  run_uid = 42,
  resize = TRUE, resize_max_width = 800
)
collect_screenshots_from_s3(
  scrapebot_connection, aws_connection,
  run_uid = 42,
  output_directory = 'download_dir/'
)
disconnect_scrapebot(scrapebot_connection)

## End(Not run)

MarHai/ScrapeBotR documentation built on March 10, 2021, 10:10 a.m.