knitr::opts_chunk$set(echo = TRUE)
library(learnr)
tutorial_options(exercise.timelimit = 10, exercise.blanks = "___+")
Reproducible data analysis:
Document and share exactly how you analyzed your data
Do more with your analysis, more efficiently:
More control and flexibility
Use community-created analysis tools
Leverage cancer data resources (often large datasets)
Create awesome visualizations
Get everyone past 'valley of despair' in R learning curve
Convince you that R is an accessible and useful tool for you in your research
Prepare you to tackle BootCamp projects next week
Get you excited to keep developing these coding skills!
::: {style="font-size: 80%; text-align: left;"}
Upsides
Free
Great for data analysis and visualization
LOTS of bioinformatics/stats tools available
Downsides?
It's a hodge-podge
Not the best for engineering software
:::
[LIVE DEMO]
What this course is
Coding basics
Heavy emphasis on practical skills (data wrangling, visualization)
Flagging areas with technical depth but giving the 'need-to-know'
What this course is not
Intro to computer science
Intro to stats
Presentations: lecture style, presenting key concepts
Practice workbooks: hands-on practice in small groups
Weekend homework assignments: one each weekend
Resources
TAs: (James, Ashir) plus: Jason Kwon, Aviad Tsherniak, Phil Montgomery, William Colgan
Course website: broad.io/cp_r_bootcamp
Slack #cancer-bootcamp_r
Chester Ismay's DataCamp slides
HBC Intro to R course
Sam Meier's 2018 R lectures
Hadley Wickham's R for Data Science
Where the action happens!
Provide inputs to R
See outputs of commands you give it (each on separate line)
Web-based R consoles (RStudio coming soon!)
List of key math operations
`*`: multiplication
`/`: division
`+`: addition
`-`: subtraction
`^`: 'raise to the power'
`==`: check equals
`!=`: check not equals
`>`: greater than, `<`: less than
`>=`: greater than or equal to, etc.
2^2 == 3
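As a quick illustration of the operators listed above, here is a minimal sketch you could type into the console. The variable names (`is_equal`, `not_equal`, `at_least`) are made up for this example.

```r
# Arithmetic operators: each line prints its result in the console
3 * 4      # multiplication
10 / 4     # division
2 + 5      # addition
7 - 9      # subtraction
2^3        # raise to the power

# Comparison operators return TRUE or FALSE
is_equal  <- 2^2 == 3   # FALSE, because 2^2 is 4
not_equal <- 2^2 != 3   # TRUE
at_least  <- 5 >= 5     # TRUE

is_equal
not_equal
at_least
```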
Variables:
Store information/data (the "nouns")
Come in a number of flavors
Functions:
Set of instructions to perform some task (the "verbs")
<-
x <- 3 * 4
x
Numbers
x <- 1
x <- 1.592E-39
Strings (text)
x <- 'abc'
x <- "abc"
Logical (true/false)
x <- TRUE
x <- FALSE
x <- T
Factors (categorical variables)
e.g. ('bad', 'OK', 'good', 'great')
We will mostly try to avoid these, but be aware of them.
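For reference, a factor built from the example categories above might look like the sketch below. The `ratings` values are invented for illustration, and the ordering of `levels` is an assumption about how you would want the categories sorted.

```r
# A character vector of made-up ratings
ratings <- c('good', 'bad', 'great', 'OK', 'good')

# Convert to a factor, stating the allowed categories and their order
rating_factor <- factor(ratings, levels = c('bad', 'OK', 'good', 'great'))

levels(rating_factor)      # the categories, in the order we specified
table(rating_factor)       # counts per category
as.integer(rating_factor)  # factors are stored as integer codes under the hood
```

The integer codes are one reason factors can surprise you: what looks like text is stored as numbers.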
Names can contain letters, numbers, `_`, and `.`. Not allowed: `4th`, `my var`, `weird?`, etc.
Object naming is important for writing good, readable code
Make variable names descriptive
e.g. `avgClicks` or `calculate_avg_clicks`
NOT: `var1` or `a`
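One way to check whether a name is allowed is sketched below. It relies on base R's `make.names()`, which rewrites invalid names into valid ones, so a name that comes back unchanged is valid. The helper variables `candidate_names` and `is_valid_name` are made up for this example.

```r
# Names taken from the examples above
candidate_names <- c('avgClicks', 'calculate_avg_clicks', '4th', 'my var', 'weird?')

# make.names() returns a syntactically valid version of each name;
# if nothing changed, the original name was already valid
is_valid_name <- make.names(candidate_names) == candidate_names
names(is_valid_name) <- candidate_names

is_valid_name
```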
Vectors
Lists
Matrix
Dataframes
Ordered collection of values. Like a sequence of 'buckets'
Can hold numeric, logical, or string data, but all elements must be the same type
Use 'combine' function c()
num_vec <- c(1, 2, 3, 4)
log_vec <- c(TRUE, TRUE, FALSE, F)
str_vec <- c('this', 'is', 'a', 'vector', 'of', 'strings')
print(num_vec)
Shorthand to create a sequence of integers
1:4
Quick notes on missing values in R (will be important)
`NA` ('not available') is a special value for missing data that can be included in any type of vector
c(1, 2, NA)
c('a', 'b', NA)
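To see why missing values matter, here is a small sketch using base R's `is.na()` and the `na.rm` argument of `mean()`. The variable names are made up for this example.

```r
values <- c(1, 2, NA)

# is.na() flags which elements are missing
missing_flags <- is.na(values)
missing_flags

# Most calculations return NA if any input is missing...
mean_with_na <- mean(values)
mean_with_na

# ...unless you ask R to drop missing values first
mean_without_na <- mean(values, na.rm = TRUE)
mean_without_na
```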
`c()` can also be used to add new elements to a vector
string_vec <- c("TP53", "PLEC", "DSPP", "PIK3CA")
string_vec2 <- c(string_vec, "BRAF")
string_vec2
Combining two vectors
string_vec <- c("TP53", "PLEC", "DSPP", "PIK3CA")
string_vec2 <- c("BRAF", "EGFR", "DUSP4")
c(string_vec, string_vec2)
Lists are basically like relaxed vectors, where elements don't have to be the same type
z <- list('a', 1, TRUE)
z
You can combine lists with the `c()` function as with vectors
z <- list('a', 1, TRUE)
c(z, 'c')
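The difference between vectors and lists shows up when you mix types. In the sketch below (variable names are made up), `c()` silently converts everything to text, while `list()` keeps each element's original type.

```r
# A vector forces all elements to one type: here everything becomes text
mixed_vec <- c('a', 1, TRUE)
mixed_vec
class(mixed_vec)

# A list keeps each element's own type
mixed_list <- list('a', 1, TRUE)
element_types <- sapply(mixed_list, class)
element_types

# Combining a list with a new element gives a longer list
longer_list <- c(mixed_list, 'c')
length(longer_list)
```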
Most common way of interacting with data
Each column is a vector, and they can hold different kinds of data
Like an Excel table.
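A minimal sketch of building a dataframe from vectors is shown below. The column names and all values (`n_samples`, `is_selected`) are hypothetical and chosen only to show that each column can hold a different type.

```r
# Each argument becomes a column; all columns must have the same length
gene_df <- data.frame(
  gene        = c('TP53', 'PLEC', 'BRAF'),   # text column
  n_samples   = c(12, 3, 7),                 # numeric column (made-up values)
  is_selected = c(TRUE, FALSE, TRUE),        # logical column (made-up values)
  stringsAsFactors = FALSE                   # keep text as text, not factors
)

gene_df
dim(gene_df)                              # number of rows and columns
column_types <- sapply(gene_df, class)    # type of each column
column_types
gene_df$n_samples                         # pull out one column as a vector
```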
Create variables with `<-`; naming them is important
Variables can be numbers, text (strings) or TRUE/FALSE (boolean)
Data organized as vectors, lists, matrices, and dataframes
Create/add to vectors (or lists) with c()
Make lists with list()