aws_launch_instance: Launch a new ScrapeBot instance as AWS EC2 instance


View source: R/aws.R

Description

Note that launching an instance may incur costs, depending on your free tier and the chosen ec2_type. Also note that, as this function waits for the new instance to launch and reboot, it may take a couple of seconds to complete.

Usage

aws_launch_instance(
  aws_connection,
  instance_owner,
  scrapebot_credential_section = aws_connection$rds_credential_section,
  ec2_type = "t2.micro",
  ec2_image = NA_character_,
  ec2_image_username = "ubuntu",
  browser_useragent = "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:85.0) Gecko/20100101 Firefox/85.0",
  browser_language = "de-de",
  browser_width = 1920,
  browser_height = 1080
)

Arguments

aws_connection

AWS connection object, as retrieved from connect_aws(). This also specifies the region.

instance_owner

The email address of the ScrapeBot user who will own the new instance, as a character string. If this user does not exist, it will be created (in which case a message is raised).

scrapebot_credential_section

The section within your INI file holding the credentials to the ScrapeBot central database, as a character string. Default is the section stored in the aws_connection object by aws_launch_database().

ec2_type

AWS instance type. The default, t2.micro, qualifies for the free tier. Various t3 types have also proven useful but incur costs.

ec2_image

AWS Amazon Machine Image (AMI) to use. Default is NA, which translates to using the region's default image via aws_default_ec2_image().

ec2_image_username

The username to log into the respective ec2_image. For Ubuntu images on AWS, this is ubuntu.

browser_useragent

The emulated browser's user agent to send to requested websites. Default is a recent Firefox Desktop used under Ubuntu Linux. Will be deployed into ScrapeBot config file.

browser_language

Language to which the emulated browser is set. Default is German (Germany, de-de). Will be deployed into the ScrapeBot config file.

browser_width

Width of the emulated browser in pixels. Default is a recent desktop monitor size. Will be deployed into ScrapeBot config file.

browser_height

Height of the emulated browser in pixels. Default is a recent desktop monitor size. Will be deployed into ScrapeBot config file.
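
For instance, the browser-related defaults above can be overridden directly in the launch call. The following is a minimal sketch; the owner email address is a placeholder, and the remaining arguments keep their documented defaults:

```r
## Not run:
aws_connection <- connect_aws()

# Launch an instance emulating an English (US) browser at 1366x768
# ('owner@example.com' is a hypothetical instance owner)
aws_connection <- aws_launch_instance(
  aws_connection,
  instance_owner = 'owner@example.com',
  browser_language = 'en-us',
  browser_width = 1366,
  browser_height = 768
)
## End(Not run)
```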

Details

This function follows the suggested ScrapeBot behavior in doing six things:

  1. Launch (i.e., create and run) one new EC2 instance in the region specified in connect_aws()

    • this also means to create a new security group that allows SSH traffic from anywhere

    • afterwards, an SSH connection is established with the new instance

  2. Update the available package-manager repositories and install requirements (python3, firefox, xvfb, git)

  3. Get (i.e., clone) the latest version of ScrapeBot

  4. Provide Firefox's Geckodriver with execution rights

  5. Install Python requirements

  6. Setup the newly created ScrapeBot instance

    • register the EC2 instance as ScrapeBot instance in the central database

    • set up the ScrapeBot instance with the specified database and S3 settings

    • set up a cronjob to run the ScrapeBot every 2 minutes

As this function aims to minimize the effort of setting up a ScrapeBot instance, not all settings available within AWS can be modified here. Beyond the arguments listed above, the remaining settings are chosen automatically.

Value

The updated AWS connection object with the new EC2 instance attached to the respective tibble. Keep/store this object in order to later terminate the instance.
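
Because the returned connection object records the launched instance, a typical pattern is to store it and hand it back to aws_terminate_instance() once the work is done. A sketch, assuming aws_terminate_instance() accepts the connection object as documented in its own help page:

```r
## Not run:
aws_connection <- connect_aws()
aws_connection <- aws_launch_instance(aws_connection, 'owner@example.com')

# ... let the ScrapeBot instance run its recipes ...

# later, terminate the instance(s) recorded in the connection object
aws_terminate_instance(aws_connection)
## End(Not run)
```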

See Also

connect_aws(), aws_default_ec2_image(), get_or_create_user(), aws_terminate_instance()

Examples

## Not run: 

aws_connection <- connect_aws()
scrapebot_connection <- connect('my_db on localhost')
#t3.large:
#- not cost-free
#- 2 virtual CPUs
#- 8GiB RAM
#- up to 5GBit of down-/uplink speed
aws_launch_instance(aws_connection, 'mario@haim.it', 'my_db on localhost', ec2_type = 't3.large')

## End(Not run)

MarHai/ScrapeBotR documentation built on March 10, 2021, 10:10 a.m.