aws_launch_instance: Launch a new ScrapeBot instance as AWS EC2 instance


View source: R/aws.R

Description

Note that launching an instance may incur costs, depending on your free tier and the chosen ec2_type. Also note that, as this function waits for the new instance to launch and reboot, it may take a couple of seconds to complete.

Usage

aws_launch_instance(
  aws_connection,
  instance_owner,
  scrapebot_credential_section = aws_connection$rds_credential_section,
  ec2_type = "t2.micro",
  ec2_image = NA_character_,
  ec2_image_username = "ubuntu",
  browser_useragent = "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:85.0) Gecko/20100101 Firefox/85.0",
  browser_language = "de-de",
  browser_width = 1920,
  browser_height = 1080
)

Arguments

aws_connection

AWS connection object, as retrieved from connect_aws(). This also specifies the region.

instance_owner

The email address of the ScrapeBot user who will own the new instance, as a character string. If this user does not exist, it will be created (in which case a message is raised).

scrapebot_credential_section

The section within your INI file holding the credentials to the ScrapeBot central database, as a character string. Default is the section stored in the aws_connection object by aws_launch_database().

ec2_type

AWS instance type. The default, t2.micro, qualifies for the free tier. Various t3 types have also proven useful but incur costs.

ec2_image

AWS Amazon Machine Image (AMI) to use. Default is NA, which translates to using the region's default image via aws_default_ec2_image().

ec2_image_username

The username to log into the respective ec2_image. For Ubuntu images on AWS, this is ubuntu.

browser_useragent

The emulated browser's user agent to send to requested websites. Default is a recent Firefox Desktop used under Ubuntu Linux. Will be deployed into ScrapeBot config file.

browser_language

Language to which the emulated browser is set. Default is German (Germany, de-de). Will be deployed into the ScrapeBot config file.

browser_width

Width of the emulated browser in pixels. Default is a recent desktop monitor size. Will be deployed into ScrapeBot config file.

browser_height

Height of the emulated browser in pixels. Default is a recent desktop monitor size. Will be deployed into ScrapeBot config file.
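
For instance, the browser-related defaults above can be overridden directly in the launch call. The following is a minimal sketch; the owner email address is a placeholder, and the remaining arguments keep their documented defaults:

```r
## Not run:
aws_connection <- connect_aws()

# Launch an instance emulating an English (US) browser at 1366x768
# ('owner@example.com' is a hypothetical instance owner)
aws_connection <- aws_launch_instance(
  aws_connection,
  instance_owner = 'owner@example.com',
  browser_language = 'en-us',
  browser_width = 1366,
  browser_height = 768
)
## End(Not run)
```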

Details

This function follows the suggested ScrapeBot behavior in doing six things:

  1. Launch (i.e., create and run) one new EC2 instance in the region specified in connect_aws()

    • this also means to create a new security group that allows SSH traffic from anywhere

    • afterwards, an SSH connection is established with the new instance

  2. Update the available package-manager repositories and install requirements (python3, firefox, xvfb, git)

  3. Get (i.e., clone) the latest version of ScrapeBot

  4. Provide Firefox's Geckodriver with execution rights

  5. Install Python requirements

  6. Setup the newly created ScrapeBot instance

    • register the EC2 instance as ScrapeBot instance in the central database

    • set up the ScrapeBot instance with the specified database and S3 settings

    • set up a cronjob to run the ScrapeBot every 2 minutes

As this function aims to minimize the effort of setting up a ScrapeBot instance, not all settings available within AWS can be modified here. Beyond the arguments listed above, the remaining settings are chosen automatically.

Value

The updated AWS connection object with the new EC2 instance attached to the respective tibble. Keep/store this object in order to later terminate the instance.
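
Because the returned connection object records the launched instance, a typical pattern is to store it and hand it back to aws_terminate_instance() once the work is done. A sketch, assuming aws_terminate_instance() accepts the connection object as documented in its own help page:

```r
## Not run:
aws_connection <- connect_aws()
aws_connection <- aws_launch_instance(aws_connection, 'owner@example.com')

# ... let the ScrapeBot instance run its recipes ...

# later, terminate the instance(s) recorded in the connection object
aws_terminate_instance(aws_connection)
## End(Not run)
```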

See Also

connect_aws(), aws_default_ec2_image(), get_or_create_user(), aws_terminate_instance()

Examples

## Not run: 

aws_connection <- connect_aws()
scrapebot_connection <- connect('my_db on localhost')
#t3.large:
#- not cost-free
#- 2 virtual CPUs
#- 8GiB RAM
#- up to 5GBit of down-/uplink speed
aws_launch_instance(aws_connection, 'mario@haim.it', 'my_db on localhost', ec2_type = 't3.large')

## End(Not run)

MarHai/ScrapeBotR documentation built on March 10, 2021, 10:10 a.m.