Super R

data.table

  1. Super-fast slicing & dicing
  2. Low memory footprint vs data.frames
  3. Fast joins
  4. Auto-indexing
  5. Many saved characters!
  6. Active dev

data.table

Task | How
------------- | -------------
Read CSV | `irisDT <- fread("iris.csv")`
Return everything | `irisDT` or `irisDT[ ]`
Select columns | `irisDT[ , .(Sepal.Length, Sepal.Width) ]`
Restrict rows | `irisDT[ Sepal.Length >= 5 , ]`
Aggregate | `irisDT[ , mean(Sepal.Length) ]`
Aggregate by group | `irisDT[ , mean(Sepal.Length) , Species ]`
Count | `irisDT[ , .N ]`
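
The tasks in the table can be tried end to end with a short script. This is a minimal sketch, assuming the data.table package is installed; it converts R's built-in `iris` data set rather than reading an `iris.csv` file, and the `speciesCodes` lookup table is a made-up example used only to show a join.

```r
library(data.table)

# Assumption: start from the built-in iris data instead of fread("iris.csv")
irisDT <- as.data.table(iris)
irisDT[, Species := as.character(Species)]  # plain character key for the join below

# Select columns
sepals <- irisDT[, .(Sepal.Length, Sepal.Width)]

# Restrict rows
longSepals <- irisDT[Sepal.Length >= 5, ]

# Aggregate over the whole table
overallMean <- irisDT[, mean(Sepal.Length)]

# Aggregate by group, giving the result column a name
meanBySpecies <- irisDT[, .(MeanSepalLength = mean(Sepal.Length)), by = Species]

# Count rows, overall and per group
nRows <- irisDT[, .N]
nBySpecies <- irisDT[, .N, by = Species]

# Join against a hypothetical lookup table (illustrative data only)
speciesCodes <- data.table(
  Species = c("setosa", "versicolor", "virginica"),
  Code    = c("SE", "VE", "VI")
)
withCodes <- irisDT[speciesCodes, on = "Species"]
```

Everything happens inside `[ ]`: the first argument filters rows, the second computes on columns, and `by` groups, which is where the saved characters come from.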

The "Hadleyverse"

Hadley Wickham, an insanely prolific developer of R packages, has produced a great ecosystem:

dplyr

  1. Relatively clear verbs
  2. Quite easy to get started with
  3. Verbose
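
To make the verbs concrete, here is a minimal sketch of the same kinds of tasks shown in the data.table table, assuming the dplyr package is installed and using the built-in `iris` data. The result names (`selected`, `filtered`, `bySpecies`) are illustrative only.

```r
library(dplyr)

# Select columns
selected <- iris %>%
  select(Sepal.Length, Sepal.Width)

# Restrict rows
filtered <- iris %>%
  filter(Sepal.Length >= 5)

# Aggregate by group and count, one verb per step
bySpecies <- iris %>%
  group_by(Species) %>%
  summarise(
    MeanSepalLength = mean(Sepal.Length),
    N = n()
  )
```

Each step is a named verb, which is easy to read but takes more typing than the equivalent data.table call.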

ggplot2

  1. Clean conceptual implementation
  2. Highly customisable
  3. Simple to start with
  4. Big ecosystem

ggplot2

Term | Explanation | Example(s)
------------- | ------------- | -------------
plot | A plot using the grammar of graphics | `ggplot()`
aesthetics | attributes of the chart | colour, x, y
mapping | relating a column in your data to an aesthetic |
statistical transformation | a translation of the raw data into a refined summary | `stat_density()`
geometry | the display of aesthetics | `geom_line()`, `geom_bar()`
scale | the range of values | axes, legends
coordinate system | how geometries get laid out | `coord_flip()`
facet | a means of subsetting the chart | `facet_grid()`
theme | display properties | `theme_minimal()`
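
The terms in the table map onto layers that are added together with `+`. The following is a minimal sketch, assuming ggplot2 is installed and using the built-in `iris` data; the choice of columns and the axis label are illustrative.

```r
library(ggplot2)

p <- ggplot(iris, aes(x = Sepal.Length, colour = Species)) +    # plot, with aesthetics mapped to columns
  stat_density(geom = "line", position = "identity") +           # statistical transformation drawn with a line geometry
  scale_x_continuous(name = "Sepal length (cm)") +               # scale: controls the axis for x
  coord_flip() +                                                 # coordinate system: swap x and y
  facet_grid(Species ~ .) +                                      # facet: one panel per species
  theme_minimal()                                                # theme: display properties

print(p)
```

Dropping or swapping any single line changes one aspect of the chart without touching the rest, which is the main appeal of the layered grammar.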

caret

A single point of contact for myriad statistical & machine learning algorithms
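
As a sketch of what "single point of contact" means in practice, the example below fits two different models through the same `train()` call, assuming the caret and rpart packages are installed. The choice of models (`"rpart"` and `"knn"`), the 5-fold cross-validation, and the seed are assumptions made for illustration.

```r
library(caret)

set.seed(42)  # make the cross-validation folds reproducible

# Shared resampling setup: 5-fold cross-validation
ctrl <- trainControl(method = "cv", number = 5)

# A decision tree (uses the rpart package behind the scenes)
treeFit <- train(Species ~ ., data = iris, method = "rpart", trControl = ctrl)

# Switching algorithm only means changing `method`
knnFit <- train(Species ~ ., data = iris, method = "knn", trControl = ctrl)

# Predict with the same interface regardless of the underlying model
predictions <- predict(treeFit, newdata = iris)
trainingAccuracy <- mean(predictions == iris$Species)
```

Note that `trainingAccuracy` is measured on the training data, so it is optimistic; the cross-validated estimates stored in `treeFit$results` are the better guide.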

AzureML

Interact with AzureML in R

  1. No GUI
  2. All the development environments of R
  3. Consume AzureML webservice

rmarkdown

Write markdown, interweave code

  1. Data provenance
  2. Ease of documenting & developing solutions
  3. Extendible & customisable
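
To show how prose and code are interwoven, the sketch below writes a tiny R Markdown document to a temporary file and renders it to HTML. It assumes the rmarkdown package and pandoc are available (pandoc ships with RStudio); the document title and contents are made up for illustration.

```r
library(rmarkdown)

# Three backticks, built programmatically so the chunk markers are easy to spot
fence <- strrep("`", 3)

# A minimal R Markdown document: YAML header, prose, and one R code chunk
rmdLines <- c(
  "---",
  "title: \"Iris summary\"",
  "output: html_document",
  "---",
  "",
  "Mean sepal length by species:",
  "",
  paste0(fence, "{r}"),
  "aggregate(Sepal.Length ~ Species, data = iris, FUN = mean)",
  fence
)

rmdFile <- tempfile(fileext = ".Rmd")
writeLines(rmdLines, rmdFile)

# render() runs the chunk and returns the path of the generated HTML file
htmlFile <- render(rmdFile, quiet = TRUE)
```

Because the output is regenerated from the code every time, the numbers in the report can always be traced back to the data and the steps that produced them.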

miniCRAN

Make your own CRAN

  1. R interface
  2. Control over packages
  3. Internal package deployment
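
A minimal sketch of building a private repository, assuming the miniCRAN package is installed and the machine has internet access. The mirror URL, the choice of data.table as the package to host, and the temporary `repoPath` are all illustrative assumptions.

```r
library(miniCRAN)

# Assumption: use the cloud CRAN mirror as the upstream source
cranMirror <- c(CRAN = "https://cloud.r-project.org")

# Work out the package plus everything it depends on
pkgs <- pkgDep("data.table", repos = cranMirror, type = "source", suggests = FALSE)

# Download those packages into a local, CRAN-shaped folder
repoPath <- file.path(tempdir(), "miniCRAN")
dir.create(repoPath, showWarnings = FALSE)
makeRepo(pkgs, path = repoPath, repos = cranMirror, type = "source")

# Colleagues can then install from the local repository, for example:
# install.packages("data.table",
#                  repos = paste0("file:///", normalizePath(repoPath, winslash = "/")),
#                  type = "source")
```

Only the packages listed in `pkgs` end up in the repository, which is what gives you control over what can be installed internally.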

Next steps

  1. Play with the popular packages
  2. Look at CRAN task views
  3. Ask for recommendations!


stephlocke/Rtraining documentation built on May 30, 2019, 3:36 p.m.