Big Data with R

rstudio::conf 2020

Interested? See registration information here: RStudio Conference 2020

:spiral_calendar: January 27 and 28, 2020

:alarm_clock: 09:00 - 17:00

:hotel: [ADD ROOM]

:writing_hand: RStudio Conference 2020

Overview

This 2-day workshop covers how to analyze large amounts of data in R. We will focus on scaling up our analyses using the same dplyr verbs that we use in our everyday work. We will use dplyr with data.table, databases, and Spark. We will also cover best practices for visualizing, modeling, and sharing results when working against these data sources. Where applicable, we will review recommended connection settings, security best practices, and deployment options.
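To illustrate what "the same dplyr verbs" means in practice, here is a minimal sketch. It is not taken from the workshop materials. It assumes the dplyr, dbplyr, DBI, and RSQLite packages are installed, uses an in-memory SQLite database as a stand-in for a real database server, and defines a hypothetical helper, `summarise_mpg()`, purely for illustration.

```r
library(dplyr)
library(DBI)

# Assumption: an in-memory SQLite database stands in for a real database.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "mtcars", mtcars)

# Hypothetical helper: the same dplyr pipeline, applied to whatever table it is given.
summarise_mpg <- function(tbl) {
  tbl %>%
    group_by(cyl) %>%
    summarise(avg_mpg = mean(mpg, na.rm = TRUE)) %>%
    arrange(cyl)
}

# Runs in memory on a local data frame.
local_result <- summarise_mpg(mtcars)

# Runs inside the database: dbplyr translates the verbs to SQL,
# and collect() brings only the summarised rows back into R.
remote_result <- summarise_mpg(tbl(con, "mtcars")) %>% collect()

dbDisconnect(con)
```

Both calls should return one row per `cyl` value with matching averages; the difference is where the computation happens. The same pattern is expected to carry over to other dplyr backends, such as data.table (via dtplyr) or Spark (via sparklyr), though those need their own connection setup.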

Learning objectives

In this 2-day workshop, attendees will learn how to connect to and analyze large-scale data.

Is this course for me?

You should take this workshop if you want to learn how to work with big data in R. This data can be in-memory, in databases (like SQL Server), or in a cluster (like Spark).

Prework

Helpful reading

Some attendees have asked for material to review before the class. The following is a compilation of subjects that would be helpful to know by the time the class begins, but you are not required to study or review them.

For database background, please review the articles in the following links:

For Spark background, please review the following:

Equipment

We plan to provide a personal server to each student for use during the class. The server will contain all of the applications and materials needed, including R and RStudio. All you will need is a laptop with a web browser. If you need to use a work-provided laptop for the class, please make sure its web browser is not blocked from reaching Amazon AWS, which is where the servers will be set up.

Schedule

| Time          | Activity     |
| :------------ | :----------- |
| 09:00 - 10:30 | Session 1    |
| 10:30 - 11:00 | Coffee break |
| 11:00 - 12:30 | Session 2    |
| 12:30 - 13:30 | Lunch break  |
| 13:30 - 15:00 | Session 3    |
| 15:00 - 15:30 | Coffee break |
| 15:30 - 17:00 | Session 4    |

Instructors

Edgar Ruiz

Solutions Engineer @ RStudio

Twitter: theotheredgar

LinkedIn: edgararuiz

James Blair

Solutions Engineer @ RStudio

Twitter: Blair09M

LinkedIn: blairjm

Class Outline

The following is a tentative outline of the subjects that will be covered during the class. The content and order are subject to change.

This work is licensed under a Creative Commons Attribution 4.0 International License.



rstudio-conf-2020/big-data documentation built on Feb. 4, 2020, 5:24 p.m.