
snowball

Project Status: Abandoned – Initial development has started, but there has not yet been a stable, usable release; the project has been abandoned and the author(s) do not intend on continuing development.

An R package to do parallel processing on Amazon (more) easily. Born in 2016 at the Brisbane rOpenSci Unconference. This is a work in progress and is currently in development.

Authors:

- Dan Pagendam
- Jonathan Carroll
- Daniel Thomas
- Zoé van Havre
- Cameron Roach
- Felix Leung
- Suren Rathnayake

Automatically sets up and starts a cluster of AWS workers, does parallel processing, and saves the output to an S3 bucket.

# Install
devtools::install_github("ropenscilabs/snowball")

WARNING: Check yourself, before you wreck yourself! You are the ruler of your own Amazon costs. (No responsibility taken for your AWS bill...)

snowball takes the location of data, a user-defined function, and some basic instructions to set up and run virtual machines in parallel on Amazon, and save results in an S3 bucket.

Requirements

Overview / workflow:

  1. Put the job list and data in an S3 bucket (the job list is like a job roster: a data table with the names of workers and functions).
  2. Spin up all workers; they start monitoring S3.
  3. snowball(function, bucketName, ...)
  4. snowball calls snowpack.
  5. This writes the snowpack function that will be run on each worker.
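
To make the "job roster" idea in step 1 concrete, here is a minimal sketch of what such a table could look like. The column names (`worker`, `fn`) and the file name are illustrative assumptions, not snowball's actual schema, and the upload to S3 is left out.

```r
# Hypothetical job roster: one row per worker, naming the function it should run.
# Column names are assumptions for illustration only.
job_roster <- data.frame(
  worker = c("worker-1", "worker-2", "worker-3"),
  fn = c("fit_model", "fit_model", "summarise"),
  stringsAsFactors = FALSE
)

# Serialise the roster locally; copying it into the S3 bucket is not shown.
roster_file <- file.path(tempdir(), "job_roster.rds")
saveRDS(job_roster, roster_file)

# A worker could then look up its own row by name.
my_job <- subset(readRDS(roster_file), worker == "worker-2")
```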

How to

1. Setup snowball

Save a .snowball file into your current working directory with the following configuration:

AWS_ACCESS_KEY_ID: <YOURACCESSKEYID>

AWS_SECRET_ACCESS_KEY: <YOURSECRETACCESSKEY>

AWS_DEFAULT_REGION: <YOURDEFAULTREGION>

Next, run snowball_setup to set global variables.

snowball_setup(config_file, echo)
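
As a rough illustration of what this setup step involves, the sketch below reads a `KEY: value` file and exports each entry as an environment variable using base R only. `read_snowball_config` is a hypothetical helper written for this example; it is not part of snowball, and `snowball_setup` may work differently. The credential values are placeholders.

```r
# Hypothetical helper: parse "KEY: value" lines into a named list.
read_snowball_config <- function(path = ".snowball") {
  lines <- trimws(readLines(path, warn = FALSE))
  lines <- lines[grepl(":", lines, fixed = TRUE)]  # skip blank or malformed lines
  keys <- trimws(sub(":.*$", "", lines))           # text before the first colon
  values <- trimws(sub("^[^:]*:", "", lines))      # text after the first colon
  as.list(stats::setNames(values, keys))
}

# Write an example config to a temporary location (placeholder values).
config_file <- file.path(tempdir(), ".snowball")
writeLines(c(
  "AWS_ACCESS_KEY_ID: EXAMPLEKEYID",
  "AWS_SECRET_ACCESS_KEY: EXAMPLESECRET",
  "AWS_DEFAULT_REGION: ap-southeast-2"
), config_file)

# Export every entry as an environment variable for the current R session.
config <- read_snowball_config(config_file)
do.call(Sys.setenv, config)
Sys.getenv("AWS_DEFAULT_REGION")
```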

2. Pack the snowball.

Start an AWS instance with buckets, while setting up the data/feature split

snowpack(fn, listItem, bucketNameString, rdsInputObjectString, rdsOutputString)
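
To give a feel for the kind of work each worker ends up doing, here is a local stand-in that uses files on disk instead of S3 objects. `run_worker_locally` is a hypothetical function for illustration only; the assumption that `fn` is called as `fn(data, listItem)` is ours and is not confirmed by the package.

```r
# Local stand-in for the per-worker step: read input, apply fn, save output.
# File paths replace the S3 object names here (an assumption for illustration).
run_worker_locally <- function(fn, listItem, rdsInputObjectString, rdsOutputString) {
  input <- readRDS(rdsInputObjectString)
  result <- fn(input, listItem)  # assumed calling convention
  saveRDS(result, rdsOutputString)
  invisible(result)
}

input_file <- file.path(tempdir(), "input.rds")
output_file <- file.path(tempdir(), "output_1.rds")
saveRDS(data.frame(x = 1:10), input_file)

# Example user function: scale the mean of column x by the list item.
scale_mean <- function(data, multiplier) mean(data$x) * multiplier

run_worker_locally(scale_mean, 2, input_file, output_file)
worker_result <- readRDS(output_file)
```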

3. Throw the snowball.

Give data location and user function

throwSnowball(...)

4. Avalanche the outputs.

Combine all results into one file.

avalanche(...)
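
The combining step can be pictured with a local sketch like the one below, which gathers per-worker `.rds` files from a directory into a single file. `combine_results_locally` and the `result_*.rds` naming are assumptions made for this example; `avalanche` itself works against S3 and may behave differently.

```r
# Hypothetical local analogue of the combine step.
combine_results_locally <- function(result_dir, combined_file) {
  files <- sort(list.files(result_dir, pattern = "^result_.*\\.rds$", full.names = TRUE))
  results <- lapply(files, readRDS)
  names(results) <- basename(files)
  saveRDS(results, combined_file)  # one file holding every worker's result
  invisible(results)
}

# Simulate three workers that each wrote one result file.
result_dir <- file.path(tempdir(), "snowball_results")
dir.create(result_dir, showWarnings = FALSE)
for (i in 1:3) saveRDS(i^2, file.path(result_dir, sprintf("result_%d.rds", i)))

combined_file <- file.path(tempdir(), "combined.rds")
combined <- combine_results_locally(result_dir, combined_file)
```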

More help?

Snow what?

Check out the documentation for the snow and snowfall packages.

What is an S3 Bucket..??

We assume you have a (very) basic understanding of what an S3 bucket is (it's like Dropbox, for data). See Amazon's S3 documentation for more info. It is very easy to create a bucket: you just click "Create bucket".

Setting up the 'bucket policy allowing an IAM user full access' is harder:

- In the top left of an AWS window click on Services, then IAM, then click on the user you want to give access to (you, most likely).
- Copy the User ARN into your clipboard.
- Go to the newly created bucket and click on Properties.
- Click on add policy, which opens a window called "AWS Policy Generator".
- Select policy type: S3 Bucket Policy.
- AWS Services should be Amazon S3.
- Actions: tick All Actions.
- Paste your ARN into principal (I know... logical.)
- Paste this (with YOUR bucket name) into the ARN box: arn:aws:s3:::bucketName
- Click Add Statement, then copy the contents to the clipboard.
- Go back to the bucket page, click "Edit bucket policy" and paste the clipboard contents into it.
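
For orientation, the policy you end up pasting should look roughly like the JSON built below. This is a sketch under assumptions: `make_bucket_policy` is a hypothetical helper, the ARN and account number are placeholders, and the real Policy Generator output may contain extra fields such as `Id` and `Sid`.

```r
# Hypothetical helper: fill a user ARN and bucket name into a policy template.
make_bucket_policy <- function(user_arn, bucket_name) {
  sprintf(
'{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {"AWS": "%s"},
      "Action": "s3:*",
      "Resource": "arn:aws:s3:::%s"
    }
  ]
}', user_arn, bucket_name)
}

# Placeholder ARN and bucket name; substitute your own.
policy <- make_bucket_policy("arn:aws:iam::123456789012:user/example", "bucketName")
cat(policy, "\n")
```

The steps above list only the bucket ARN as the resource. Actions on objects inside the bucket generally also need `arn:aws:s3:::bucketName/*`, so you may have to add that as a second resource if reads or writes are denied.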



ropenscilabs/snowball documentation built on May 18, 2022, 8:32 p.m.