Firstly, Python is a valid way to go. There are a number of really good libraries out there for number crunching etc., and it is a well-written language with few "quirks".
Secondly, why R is my preference:
But... "production" code can be faster in Python
    # Define a variable
    a <- 25
    # Call a variable
    a
    # Do something to it
    a + 1
    # Numeric
    25
    # Character
    "25"
    # Logical
    TRUE
    # Dates
    as.Date("2015-08-05")
    as.POSIXct("2015-08-01")
    # Factor
    as.numeric(factor("25"))
    as.character(factor("25"))
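The two factor conversions above give different answers, which is easy to miss when everything is on one slide. A minimal sketch of why, using a small hypothetical factor `f` (not from the original slides): `as.numeric()` on a factor returns the internal level codes, so going via `as.character()` is one way to recover the original values.

```r
# Hypothetical example factor built from number-like strings
f <- factor(c("25", "30", "25"))

# as.numeric() returns the internal level codes, not the labels
codes <- as.numeric(f)

# Converting to character first recovers the original values
values <- as.numeric(as.character(f))

codes   # 1 2 1
values  # 25 30 25
```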
    # Vector
    a <- c(25, 30)
    # Matrix
    matrix(a)
    # Data frame
    data.frame(a, b = a / 5, c = LETTERS[1:2])
    # List
    list(vector = a, matrix = matrix(a))
    a <- sample(1:20, size = 5, replace = TRUE) # setup
    a # visual check
    a[1:2] # row numbers
    a[a <= 10] # value filters

    df <- data.frame(a = 1:10, b = LETTERS[1:5]) # setup
    df[1:2, ] # row numbers
    df[df$a < 2, ] # value filters
    df[df$a < 3, 1] # column filter
    df[df$a < 3, 1, drop = FALSE] # column filter (keep data.frame)
    # Define a function
    showAsPercent <- function(x) {
      paste0(round(x * 100, 0), "%")
    }
    # Call a function
    showAsPercent(0.1)
    # Get a package
    install.packages("caret")
    # Activate a package
    library(caret)
    # Orig OO (s3): cyclismo.org/tutorial/R/s3Classes.html
    library(R6)
    Loan <- R6Class("Loan",
      public = list(
        term = NA,
        initialize = function(term) {
          if (!missing(term)) {
            self$term <- term
          }
        },
        extendBy = function(ext) {
          self$term <- self$term + ext
        }
      )
    )

    acc <- Loan$new(36)
    acc$extendBy(6)
    acc$term
magrittr allows you to pass one thing into another instead of writing lots of brackets
    library(magrittr)
    # Typical
    pairs(iris)
    pairs(tail(iris))
    pairs(tail(iris, nrow(iris) / 5))
    # Pipe
    iris %>% pairs
    iris %>% tail %>% pairs
    iris %>% {tail(., nrow(.) / 5)} %>% pairs
Use dplyr to transform your datasets
    library(dplyr)
    library(magrittr)
    iris %>%
      filter(Petal.Width < 2) %>%
      group_by(Species) %>%
      # across() replaces the deprecated summarise_each(funs(mean)); needs dplyr >= 1.0.0
      summarise(across(everything(), mean))
    library(readr)
    OrderData <- read_csv("Order.csv")

    library(readxl)
    # readxl reads a worksheet with read_excel(path, sheet)
    OrderData <- read_excel("Order.xlsx", "Orders")
    library(RODBC)
    # Build the connection string piece by piece so it stays free of stray whitespace
    azure <- odbcDriverConnect(paste0(
      "Driver={SQL Server Native Client 11.0};",
      "Server=mhknbn2kdz.database.windows.net;",
      "Database=AdventureWorks2012;",
      "Uid=sqlfamily;",
      "Pwd=sqlf@m1ly;"
    ))
    Order <- sqlQuery(azure, "SELECT * FROM [Sales].[SalesOrderHeader]")
This is the easiest and most portable option
write.csv(iris,"iris.csv", row.names = FALSE)
library() at the top of the script
# ---- SectionName ---- to allow you to pick up the code into a LaTeX or markdown doc later
data.table over dplyr
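The three points above can be sketched in one short script. This is an illustration under assumptions, not the layout of the original training material: the section names `LoadData` and `SummariseData` and the object names are hypothetical, and the data.table call is one possible equivalent of the earlier dplyr pipeline on iris.

```r
# Load every package up front so dependencies are visible at a glance
library(data.table)

# ---- LoadData ----
# Convert the built-in iris data set to a data.table
irisDT <- as.data.table(iris)

# ---- SummariseData ----
# Filter rows, then take the mean of every remaining column per species
irisMeans <- irisDT[Petal.Width < 2, lapply(.SD, mean), by = Species]
irisMeans
```

If the script were saved as, say, `analysis.R` (a hypothetical file name), `knitr::read_chunk("analysis.R")` would register each `# ---- SectionName ----` block as a named chunk that a LaTeX or markdown document can pull in by label.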
    library(ggplot2)
    ggplot(data = iris, aes(x = Sepal.Width, y = Sepal.Length, colour = Species)) +
      geom_point()
Document as you go!
# ---- SectionName ---- to save repetition
Consider doing a shiny application that explores the data and findings
(assertive, assertthat, testthat)
(ggplot2)
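As a rough illustration of how two of the testing packages named above fit together, the sketch below reuses the `showAsPercent` helper from earlier. The `assert_that()` input check and the specific test cases are additions for illustration, not part of the original function.

```r
library(assertthat)
library(testthat)

# Same helper as earlier, plus a hypothetical input check
showAsPercent <- function(x) {
  assert_that(is.numeric(x)) # fail early on non-numeric input
  paste0(round(x * 100, 0), "%")
}

# Unit tests describing the behaviour we expect
test_that("showAsPercent formats proportions as percentages", {
  expect_equal(showAsPercent(0.1), "10%")
  expect_equal(showAsPercent(c(0.25, 1)), c("25%", "100%"))
  expect_error(showAsPercent("ten"))
})
```

assertthat guards a function at run time, while testthat checks behaviour during development, so the two are complementary rather than alternatives.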
This presentation is available on github.com/stephlocke/Rtraining. All the code is available for you to take a copy and play with to help you learn on the go.
If you have any questions, contact me!
itsalocke.com | github.com/StephLocke | @SteffLocke