Note that this may incur costs, depending on your free-tier status and the chosen ec2_type. Also note that, because this function waits for the new instance to launch and reboot, it might take a couple of seconds to return.
aws_launch_instance(
aws_connection,
instance_owner,
scrapebot_credential_section = aws_connection$rds_credential_section,
ec2_type = "t2.micro",
ec2_image = NA_character_,
ec2_image_username = "ubuntu",
browser_useragent = "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:85.0) Gecko/20100101 Firefox/85.0",
browser_language = "de-de",
browser_width = 1920,
browser_height = 1080
)
aws_connection |
AWS connection object, as retrieved from |
instance_owner |
The email address of the ScrapeBot user who will own the new instance, as a character string. If this user does not exist, it will be created (in which case a message is shown). |
scrapebot_credential_section |
The section within your INI file holding the credentials to the ScrapeBot central database as character string. Default is to use the one set by |
ec2_type |
AWS instance type. The default, |
ec2_image |
AWS Amazon Machine Image (AMI) to use. Default is |
ec2_image_username |
The username to log into the respective |
browser_useragent |
The emulated browser's user agent to send to requested websites. Default is a recent Firefox Desktop used under Ubuntu Linux. Will be deployed into ScrapeBot config file. |
browser_language |
Language to which the emulated browser is set. Default is German as used in Germany ("de-de"). Will be deployed into ScrapeBot config file. |
browser_width |
Width of the emulated browser in pixels. Default is a recent desktop monitor size. Will be deployed into ScrapeBot config file. |
browser_height |
Height of the emulated browser in pixels. Default is a recent desktop monitor size. Will be deployed into ScrapeBot config file. |
This function follows the suggested ScrapeBot behavior in doing six things:
1. Launch (i.e., create and run) one new EC2 instance in the region specified in connect_aws()
   - this also means creating a new security group that allows SSH traffic from anywhere
   - afterwards, an SSH connection is established with the new instance
2. Update the available package-manager repositories and install requirements (python3, firefox, xvfb, git)
3. Get (i.e., clone) the latest version of ScrapeBot
4. Provide Firefox's Geckodriver with execution rights
5. Install Python requirements
6. Set up the newly created ScrapeBot instance
   - register the EC2 instance as ScrapeBot instance in the central database
   - set up the ScrapeBot instance with the specified database and S3 settings
   - set up a cronjob to run the ScrapeBot every 2 minutes
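The "every 2 minutes" schedule in the last step can be illustrated with a small sketch. The `*/2 * * * *` pattern is standard cron syntax; the helper name `cron_every_n_minutes()` and the command path are hypothetical placeholders and are not taken from ScrapeBot or from this function's source.

```r
# Hypothetical helper: build a crontab line that fires every `minutes` minutes.
# The command passed in below is a placeholder, not the actual ScrapeBot call.
cron_every_n_minutes <- function(minutes, command) {
  stopifnot(is.numeric(minutes), minutes >= 1, minutes <= 59)
  # Five schedule fields: minute, hour, day of month, month, day of week.
  sprintf("*/%d * * * * %s", as.integer(minutes), command)
}

cron_line <- cron_every_n_minutes(2, "/path/to/run-scrapebot.sh")
print(cron_line)
```

A step value in the minute field keeps the schedule aligned to the clock (minutes 0, 2, 4, ...) rather than to the moment the instance was set up.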
As this function aims to minimize the effort of setting up a ScrapeBot instance, not every setting available within AWS can be modified here. You can, however, specify:
- The type of server machine you want to run. t2.micro, the default, qualifies for AWS' cost-free option. Requirements vary with what you intend to do with your ScrapeBot instance.
- The Amazon Machine Image (AMI) or operating system you want to run. The default is to ask the aws_default_ec2_image() function.
- The user agent string, an identifying header sent with a web request to help the website identify what system it is interacting with
- The browser language, which nudges various websites to change their layout/language
- The browser width and height, which (together with the user agent string) also help to emulate mobile devices
- The email address of the ScrapeBot user (for the web frontend) to be associated with the new instance
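As a sketch of how these options combine, the following prepares arguments for an instance that emulates a mobile device. The argument names come from the usage signature; the user agent string, viewport size, language, and email address are illustrative assumptions rather than package defaults. The actual launch is guarded, because it would create a real and possibly billed instance.

```r
# Argument names copied from the usage signature of aws_launch_instance().
usage_args <- c("aws_connection", "instance_owner", "scrapebot_credential_section",
                "ec2_type", "ec2_image", "ec2_image_username", "browser_useragent",
                "browser_language", "browser_width", "browser_height")

# Illustrative values for a phone-like browser (assumptions, not defaults).
mobile_args <- list(
  instance_owner    = "user@example.com",  # placeholder email address
  ec2_type          = "t2.micro",
  browser_useragent = "Mozilla/5.0 (Android 11; Mobile; rv:85.0) Gecko/85.0 Firefox/85.0",
  browser_language  = "en-us",
  browser_width     = 412,
  browser_height    = 915
)

# Catch misspelled argument names before anything is launched.
unknown <- setdiff(names(mobile_args), usage_args)
stopifnot(length(unknown) == 0)

# Set to TRUE only when AWS credentials are configured and costs are acceptable.
launch_for_real <- FALSE
if (launch_for_real) {
  aws_connection <- connect_aws()
  aws_connection <- do.call(aws_launch_instance,
                            c(list(aws_connection = aws_connection), mobile_args))
}
```

Checking the names against the signature first is a cheap safeguard: a misspelled argument is caught locally instead of surfacing only once the launch is attempted.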
This function then automatically chooses the following settings:
- A security group is either created or re-used, allowing incoming SSH traffic (TCP via port 22) from anywhere.
- Firefox is used as the browser and runs on the virtual display provided by xvfb (as suggested by ScrapeBot).
The updated AWS connection object, which gets an ec2_instance attached in the respective tibble (keep/store this object so that you can later terminate the instance).
connect_aws(), aws_default_ec2_image(), get_or_create_user(), aws_terminate_instance()
## Not run:
aws_connection <- connect_aws()
scrapebot_connection <- connect('my_db on localhost')
#t3.large:
#- not cost-free
#- 2 virtual CPUs
#- 8GiB RAM
#- up to 5GBit of down-/uplink speed
# as in the usage: the second argument is the owner's email address, the type is set via ec2_type
aws_connection <- aws_launch_instance(aws_connection, 'mario@haim.it', ec2_type = 't3.large')
## End(Not run)