Steph Locke | Who I am

R

R | A brief history

R | Why use it?

R | Why not Python?

Firstly, Python is a valid way to go. There are a number of really good libraries out there for number crunching etc., and it is a well-written language with few "quirks".

Secondly, why R is my preference:

But... "production" code can be faster in Python

R fundamentals

Basics

# Define a variable
a<-25

# Call a variable
a

# Do something to it
a+1

Data types | p1

# Numeric
25

# Character
"25"

# Logical
TRUE

Data types | p2

# Dates
as.Date("2015-08-05")
as.POSIXct("2015-08-01")
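
Dates get their own type largely because they support arithmetic. A brief sketch of that, with illustrative variable names that are not part of the original slides:

```r
start <- as.Date("2015-08-01")
end   <- as.Date("2015-08-05")

# Subtracting two dates gives a difftime measured in days
gap <- end - start
as.numeric(gap)

# Adding a number moves a Date forward by that many days
later <- start + 30
later
```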

Data types | p3

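# Factor (stored as integer codes with character labels)
as.numeric(factor("25"))   # returns the code (1), not 25
as.character(factor("25")) # returns the label, "25"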
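
A factor stores integer codes alongside its labels, so `as.numeric()` on a factor returns the codes rather than the numbers the labels spell out. A minimal sketch of the usual workaround, converting through character first (the names `f`, `codes` and `values` are only for illustration):

```r
# A factor keeps integer codes plus a set of labels (levels)
f <- factor(c("25", "30", "25"))

# as.numeric() returns the underlying codes, not the labels
codes <- as.numeric(f)

# Converting via character recovers the numbers the labels spell out
values <- as.numeric(as.character(f))

codes
values
```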

Constructs | p1

# Vector
a<-c(25, 30)

# Matrix
matrix(a)

Constructs | p2

# Data frame
data.frame(a,b=a/5,c=LETTERS[1:2])

# List
list(vector=a, matrix=matrix(a))

Subsetting | Vectors

a <- sample(1:20, size = 5, replace = TRUE) # setup
a # visual check
a[1:2] # positions
a[a<=10] # value filters

Subsetting | Data.frames p1

df <- data.frame(a=1:10, b = LETTERS[1:5]) # setup
df[1:2,] # row numbers
df[df$a<2,] # value filters

Subsetting | Data.frames p2

df[df$a<3,1] # column filter
df[df$a<3,1, drop=FALSE] # column filter (keep data.frame)
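
To see why `drop=FALSE` matters, it helps to check what each form returns. This sketch rebuilds the same `df` as the setup slide; `as_vector` and `as_frame` are illustrative names:

```r
df <- data.frame(a = 1:10, b = LETTERS[1:5])  # same setup as before

as_vector <- df[df$a < 3, 1]                # single column collapses to a vector
as_frame  <- df[df$a < 3, 1, drop = FALSE]  # stays a one-column data.frame

class(as_vector)
class(as_frame)
```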

Functions

# Define a function
showAsPercent<-function(x) {
  paste0(round(x*100 ,0) ,"%")
}

# Call a function
showAsPercent(0.1)

Extending R

# Get a package
install.packages("caret")

# Activate a package
library(caret)

What does R look like? | OO

# Orig OO (s3): cyclismo.org/tutorial/R/s3Classes.html
library(R6)
Loan<-R6Class("Loan", 
              public=list(term=NA
                         ,initialize=function(term){
                           if(!missing(term)){ 
                              self$term<-term 
                              }} 
                         ,extendBy=function(ext){ 
                            self$term<-self$term+ext
                            }))

What does R look like? | OO

acc<-Loan$new(36)
acc$extendBy(6)
acc$term
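
For comparison, the original S3 style could look roughly like this. It is a sketch rather than part of the deck, and `new_loan` and `extend_by` are hypothetical names. Unlike R6 objects, S3 objects are copied on modification, so the result has to be assigned back.

```r
# Constructor: a plain list tagged with a class attribute
new_loan <- function(term = NA) {
  structure(list(term = term), class = "Loan")
}

# Generic function that dispatches on the class of its first argument
extend_by <- function(x, ext) UseMethod("extend_by")

# Method for objects of class "Loan"
extend_by.Loan <- function(x, ext) {
  x$term <- x$term + ext
  x  # return the modified copy
}

acc <- new_loan(36)
acc <- extend_by(acc, 6)  # reassign, because S3 objects have copy semantics
acc$term
```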

Building up an R script

Commands | magrittr

magrittr allows you to pass one thing into another instead of writing lots of brackets

library(magrittr)
# Typical
pairs(iris)
pairs(tail(iris))
pairs(tail(iris,nrow(iris)/5))

# Pipe
iris %>% pairs
iris %>% tail %>% pairs
iris %>% {tail(.,nrow(.)/5)} %>% pairs
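
A rough check that the nested and piped forms compute the same thing. It uses `dim` in place of `pairs` so the results can be compared directly, which is an illustrative swap and not from the deck:

```r
library(magrittr)

# Nested call: read from the inside out
nested <- dim(tail(iris, nrow(iris) / 5))

# Piped call: read left to right; `.` stands for the value piped in
piped <- iris %>% {tail(., nrow(.) / 5)} %>% dim

identical(nested, piped)
```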

Commands | dplyr

Use dplyr to transform your datasets

library(dplyr)
library(magrittr)
iris %>% 
  filter(Petal.Width<2) %>%
  group_by(Species) %>%
  summarise(across(everything(), mean)) # mean of every non-grouping column; replaces the deprecated summarise_each(funs(mean))

Read data | CSV & Excel

library(readr)
OrderData<-read_csv("Order.csv")

library(readxl)
OrderData<-read_excel("Order.xlsx","Orders")

Read data | Databases

library(RODBC)
azure <- odbcDriverConnect(
  "Driver={SQL Server Native Client 11.0};
  Server=mhknbn2kdz.database.windows.net;
  Database=AdventureWorks2012;
  Uid=sqlfamily;
  Pwd=sqlf@m1ly;")

Order    <- sqlQuery( azure, 
            "SELECT * FROM [Sales].[SalesOrderHeader]")

Write data | CSV

This is the easiest and most portable option

write.csv(iris,"iris.csv", row.names = FALSE)
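
A quick round trip can confirm what was written. This sketch writes to a temporary file (via `tempfile()`) rather than the working directory, and `iris_back` is an illustrative name:

```r
path <- tempfile(fileext = ".csv")  # throwaway location for the demo

write.csv(iris, path, row.names = FALSE)

# stringsAsFactors = TRUE restores Species as a factor
# (the default for this argument differs across R versions)
iris_back <- read.csv(path, stringsAsFactors = TRUE)

dim(iris_back)
names(iris_back)
```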

Script best practices

"Best" practices

Charts

library(ggplot2)
ggplot(data=iris, 
       aes(x=Sepal.Width, y=Sepal.Length, colour=Species)) + 
  geom_point() 

Documentation

Document as you go!
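
One common way to do this is roxygen2-style comments directly above a function. A sketch applied to the `showAsPercent` function from earlier, where the wording of the comments is illustrative:

```r
#' Format a proportion as a percentage string
#'
#' @param x Numeric vector of proportions, e.g. 0.1 for ten percent.
#' @return Character vector with values rounded to whole percentages.
#' @examples
#' showAsPercent(0.1)
showAsPercent <- function(x) {
  paste0(round(x * 100, 0), "%")
}

showAsPercent(c(0.1, 0.25))
```

The `#'` lines are ordinary comments as far as R is concerned, so the script runs with or without roxygen2 installed.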

Interactive reports

Consider building a Shiny application that lets readers explore the data and findings

Workflow best practices

Next steps

Find out more

Online

In-person

Get this presentation

This presentation is available on github.com/stephlocke/Rtraining. All the code is available for you to take a copy and play with to help you learn on the go.

If you have any questions, contact me!

itsalocke.com | github.com/StephLocke | @SteffLocke



stephlocke/Rtraining documentation built on May 30, 2019, 3:36 p.m.