library(learnr)
library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, message = FALSE, error = FALSE)
tutorial_options(exercise.timelimit = 60, exercise.blanks = "___+")

Setting up an RStudio project

Creating a project folder

For the first part of this tutorial we're going to walk through setting up a new RStudio project on your computer.

To start, create a project folder.

Setting up the RStudio project

Now that our project folder is set up, let's create an RStudio project.

Set up a new R notebook

Now that we've set up our project folder and created an RStudio project, let's try creating an R Markdown notebook. Recall that there are two main ways people write R code. One is an "R script" (.R file), which is basically just a text file with R code. The other is "R Markdown" (.Rmd file), which is more like a 'lab notebook' for coding. We highly recommend you get comfortable using R Markdown notebooks to write R code first, as it encourages better practices. It's also generally a great way to write one-off analysis scripts that you can share with colleagues (including future you).

Let's create a new R Markdown file. You can read more about R Markdown here

Practice editing the Markdown doc

More on how to make nicely formatted text using markdown here

Practice loading and saving data

For the following practice, you'll want to follow along in your newly created R Markdown doc (rather than by running the code directly on this website), as it's better to practice data loading and saving as it happens 'in the field'.

You can leave the default template content in your doc, or start fresh (but keep everything above the "## R Markdown" heading).

For each piece, create a new code chunk. It's best to test running each code chunk as you go, using the green arrow in the top-right corner of the code chunk. Only worry about 'knitting' the doc once you think you're done editing it.

Loading packages

It's best to load packages at the beginning of your Markdown doc (make it your first chunk).

First let's load the tidyverse package, along with a few additional packages. You may need to install here and useful before you can load them, e.g. by running install.packages('here'). data.table should be pre-installed.

library(tidyverse)
library(useful)
library(here)
library(data.table)

Now try loading the following datasets. Recall that you can use the here function to specify the location of a file in your project's data folder like this: here('data', 'my_file')
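
As a minimal sketch of how here assembles a path (assuming the here package is installed; my_file is only the placeholder name used above, not a real file):

```r
library(here)

# here() joins its arguments onto the project root that the package
# detected when it was loaded.
data_path <- here('data', 'my_file')  # 'my_file' is a placeholder name
print(data_path)

# TRUE only if such a file really exists in the project's data folder.
file.exists(data_path)
```

Checking file.exists before loading is a quick way to catch a misspelled file name or a project that was opened from the wrong folder.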

mouse_exp_design.csv

This is a table with metadata from an experiment. Load it and assign it to a variable named sample_info. Then try using tools like View, head, and glimpse to inspect the table.

mouse_exp_design2.csv

Now try loading the file mouse_exp_design2.csv (another metadata table), and inspect it. What's different? What's wrong? (Hint: look at the data type of each column.) No need to fix this yet; it just illustrates the little 'gotchas' that can arise when loading data.
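
To illustrate how a column's data type can change on loading, here is a small sketch with two made-up tables written to temporary files. The data are hypothetical and are not the contents of mouse_exp_design2.csv; the variable names are assumptions chosen for this example.

```r
library(readr)

# Two tiny made-up tables written to temporary files (hypothetical data).
clean_file <- tempfile(fileext = '.csv')
messy_file <- tempfile(fileext = '.csv')
writeLines(c('sample,replicate', 's1,1', 's2,2'), clean_file)
writeLines(c('sample,replicate', 's1,1', 's2,two'), messy_file)

clean_tbl <- read_csv(clean_file)
messy_tbl <- read_csv(messy_file)

# A single non-numeric entry is enough for the whole column to be read as text.
class(clean_tbl$replicate)
class(messy_tbl$replicate)
```

Comparing the class of each column, or the column types shown by glimpse, is usually the fastest way to spot this kind of problem.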

mouse_exp_design.xlsx

Now load mouse_exp_design.xlsx (the same table as above, but stored as an Excel file) using the function read_excel. How does the result differ from when you loaded the same data in .csv format? Note: there's only one sheet in this .xlsx, so you don't have to worry about specifying which sheet to load here.
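
One practical detail: read_excel lives in the readxl package, which is installed along with the tidyverse but is not attached by library(tidyverse). A minimal sketch, assuming the file sits in the project's data folder (the variable name sample_info_xlsx is an assumption for this example):

```r
library(readxl)  # installed with the tidyverse, but must be attached separately
library(here)

xlsx_path <- here('data', 'mouse_exp_design.xlsx')

# Only attempt the load when the file is present in the data folder.
if (file.exists(xlsx_path)) {
  sample_info_xlsx <- read_excel(xlsx_path)  # reads the first sheet by default
  str(sample_info_xlsx)
}
```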

normalized_counts2.txt

Now let's load a matrix of data: normalized_counts2.txt (normalized read counts from an RNA-seq experiment, with genes as rows and samples as columns). Use the approach described in the lesson to load this file and convert it into a matrix with gene names as the rownames.

Recall these steps:
- You'll want to use the fread function from the data.table package to load the file.
- Convert it to a tibble with as_tibble.
- Make a column into the rownames using column_to_rownames.
- Make it into a matrix using as.matrix.

Once you're done, take a look at a piece of it using the corner function from the useful package.

You should see something like this:

counts_mat <- fread(here('data', 'normalized_counts2.txt')) %>% 
  as_tibble() %>% 
  column_to_rownames('Gene') %>% 
  as.matrix()
corner(counts_mat)

Saving data

Generating the report

Now try knitting your document and check out the result!

Vector manipulation

Let us explore some vector manipulation. For the rest of this practice we're going to use the website rather than working in Rstudio locally.

alphabet

alphabet <- c('C', 'D', 'X', 'L', 'F')

Use the associated positional indices along with [ ] to extract the following elements from the vector alphabet:


alphabet[1]
alphabet[c(1,2)]
alphabet[c(1,2,5)]

alphabet[c(5,2,1)]

alphabet[-3]
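
For comparison, here is the same set of indexing patterns on a different vector. This is only a sketch; letters_vec is a hypothetical example vector.

```r
letters_vec <- c('a', 'b', 'c', 'd')  # hypothetical example vector

letters_vec[2]        # a single element by position
letters_vec[c(4, 1)]  # several elements, returned in the order requested
letters_vec[-1]       # everything except the first element
letters_vec[2:3]      # a contiguous range of positions
```

Note that the order of the indices controls the order of the result, and that a negative index drops an element rather than counting from the end.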

nums

nums <- 5:21

Use positional indices, and logical indices, to extract the following from the vector nums


nums[1:5]

nums[(1:length(nums)) %% 2 == 0]
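
A short sketch of logical indexing on a different vector (x is a hypothetical example). A logical index has one TRUE/FALSE per element, and only the TRUE positions are kept.

```r
x <- c(3, 8, 12, 7, 20)  # hypothetical example vector

is_big <- x > 7  # logical vector, one TRUE/FALSE per element
x[is_big]        # keeps only the elements where is_big is TRUE

# seq_along(x) gives the positions 1, 2, ..., length(x);
# taking %% 2 == 0 is TRUE at the even positions.
even_pos <- seq_along(x) %% 2 == 0
x[even_pos]
```

seq_along(x) is used here instead of 1:length(x) because it also behaves sensibly when the vector is empty.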

ages

ages <- c(Greg = 30, Alice = 15, Bob = 22, Fran = 18, Dan = 52, Charlie = 45)

Use the named vector ages to do the following:


ages[c('Bob', 'Fran')]
ages_ordered <- ___
ages_ordered <- ages[sort(names(ages))]
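
The following sketch shows a few related operations on a named vector. The vector scores and its names are hypothetical.

```r
scores <- c(zed = 3, amy = 9, kim = 5)  # hypothetical named vector

scores['amy']                 # look up one element by name
scores[order(names(scores))]  # reorder alphabetically by name
sort(scores)                  # sort() orders by value and keeps the names
names(scores)[scores > 4]     # names of the elements that pass a condition
```

Sorting the names and sorting the values are different operations, so it is worth checking which one a task is asking for.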

Matrix indexing

Let's load an RNA-Seq expression matrix (genes as rows and samples as columns, with each value representing the normalized expression level). We'll also load a table of sample information.

counts_mat <- fread(here::here('data','counts_rpkm.csv'))
counts_mat <- as_tibble(counts_mat)
counts_mat <- as.matrix(column_to_rownames(counts_mat, var = 'Gene'))
metadata <- read_csv(here::here('data','mouse_exp_design.csv'))

Inspect counts_mat using corner() and dim(), and check out metadata using glimpse().


Use the following list of important genes to extract the counts data for these genes (subset the matrix counts_mat to only the rows corresponding to these genes)

important_genes <- c("ENSMUSG00000083700", "ENSMUSG00000080990", "ENSMUSG00000065619", "ENSMUSG00000047945", "ENSMUSG00000081010", "ENSMUSG00000030970")
counts_mat[important_genes,]
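
Row subsetting by name can be sketched on a small made-up matrix. The gene and sample names in toy_mat are hypothetical.

```r
# Hypothetical 3 x 2 matrix with made-up gene and sample names.
toy_mat <- matrix(1:6, nrow = 3,
                  dimnames = list(c('geneA', 'geneB', 'geneC'), c('s1', 's2')))

toy_mat[c('geneC', 'geneA'), ]    # rows by name, in the order requested
toy_mat['geneB', ]                # a single row is simplified to a named vector
toy_mat['geneB', , drop = FALSE]  # drop = FALSE keeps it as a 1-row matrix
```

The drop = FALSE form is useful when later code expects a matrix even if only one gene was selected.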

Use the %in% operator, along with all(), to verify that all samples in the counts_mat dataset (you can get the list of samples with colnames(counts_mat)) have entries in the metadata table.


all(colnames(counts_mat) %in% metadata$sample)
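
The direction of %in% matters: the left-hand side is the set being checked, and the right-hand side is the set it is checked against. A small sketch with hypothetical sample names:

```r
in_matrix   <- c('s1', 's2', 's3')        # hypothetical matrix column names
in_metadata <- c('s1', 's2', 's3', 's4')  # hypothetical metadata samples

in_matrix %in% in_metadata       # one TRUE/FALSE per element of in_matrix
all(in_matrix %in% in_metadata)  # is every matrix sample listed in the metadata?
all(in_metadata %in% in_matrix)  # is every metadata sample present in the matrix?
setdiff(in_metadata, in_matrix)  # which metadata samples are missing from the matrix
```

When the two checks disagree, setdiff shows which entries are responsible.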

Reorder the columns of counts_mat to be in the same order as samples appear in the metadata table


counts_mat[,metadata$sample]
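
As a sketch of why the reordering matters, the toy example below aligns a matrix with a metadata table and then confirms the alignment. toy_counts and toy_meta are hypothetical stand-ins for counts_mat and metadata.

```r
# Hypothetical matrix whose columns are not in metadata order.
toy_counts <- matrix(1:6, nrow = 2,
                     dimnames = list(c('geneA', 'geneB'), c('s3', 's1', 's2')))
toy_meta <- data.frame(sample = c('s1', 's2', 's3'),
                       celltype = c('typeA', 'typeB', 'typeA'),
                       stringsAsFactors = FALSE)

# Index the columns by name so they follow the metadata order.
toy_counts <- toy_counts[, toy_meta$sample]

# Confirm the alignment before using metadata columns as logical indices.
identical(colnames(toy_counts), toy_meta$sample)
toy_counts[, toy_meta$celltype == 'typeA']
```

A logical index built from the metadata only selects the right columns if both objects list the samples in the same order, which is what the identical check confirms.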

BONUS: Now use the t.test function to test whether the mean expression of gene 'ENSMUSG00000081010' is different in typeA samples compared to typeB samples


#If you reorder the samples in counts_mat to align with metadata, you can access data for the typeA samples like this
counts_mat[, metadata$celltype == 'typeA']
counts_mat <- counts_mat[,metadata$sample]
typeA_exp <- counts_mat['ENSMUSG00000081010', metadata$celltype=='typeA']
typeB_exp <- counts_mat['ENSMUSG00000081010', metadata$celltype=='typeB']

t.test(typeA_exp, typeB_exp)
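
The object returned by t.test can be stored and inspected piece by piece. A minimal sketch with made-up numbers (group_a and group_b are hypothetical, not values from counts_mat):

```r
group_a <- c(1, 2, 3, 4, 5)       # hypothetical expression values
group_b <- c(11, 12, 13, 14, 15)

res <- t.test(group_a, group_b)   # Welch's t-test by default (var.equal = FALSE)

res$p.value    # the p-value as a plain number
res$estimate   # the two group means
res$conf.int   # confidence interval for the difference in means
```

Storing the result makes it easy to reuse the p-value later, for example when the same test is run across many genes.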

List manipulation

Use the list people to do the following:

people <- list(
  Alice = list(age = 20, height = 50, school = 'MIT'),
  Bob = list(age = 10, height = 30, school = 'Harvard'),
  Charlie = list(age = 40, height = 60, school = 'BU'),
  Frank = list(age = 10, height = 2)
  )

Extract the data corresponding to Charlie


people$Charlie

Pull out Charlie's school and assign it to a variable charlie_school


charlie_school <- people$Charlie$school

What happens if you run people$frank? Why?


## Returns NULL rather than an error: R is case sensitive, so the list has no element named 'frank'
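
The behaviour of list lookups with names that do and do not exist can be sketched on a small made-up list (pets is hypothetical):

```r
pets <- list(
  Rex = list(species = 'dog', age = 4),
  Tom = list(species = 'cat', age = 7)
)  # hypothetical nested list

names(pets)                 # the names available at the top level
pets$Rex$age                # chained $ reaches into the nested list
pets[['Tom']][['species']]  # [[ ]] with a string does the same job

is.null(pets$rex)           # TRUE: names are case sensitive, so 'rex' is not found
'rex' %in% names(pets)      # FALSE: an explicit way to test for a name
```

Because a missing name gives NULL silently, checking names() is a useful habit when a lookup returns something unexpected.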

BONUS: Create a named list that specifies for each of your academic degrees, the year and place you got it.
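
One possible shape for such a list is sketched below. The degree names, years, and places are placeholders, to be replaced with your own.

```r
# Placeholder values only; substitute your own degrees, years, and places.
degrees <- list(
  BSc = list(year = 2010, place = 'Example University'),
  PhD = list(year = 2016, place = 'Example Institute')
)

names(degrees)     # one entry per degree
degrees$PhD$year   # year of a specific degree
```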


Dataframe basics

Create a tibble that describes the last few places you've lived in (3 is enough). You could include the city, state, and first year there, for example
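
As a rough sketch of the syntax, a tibble is built column by column, with each column given as a vector of the same length. The cities, states, and years below are made-up placeholders.

```r
library(tibble)

# Made-up placeholder values; each argument becomes one column.
places <- tibble(
  city       = c('Springfield', 'Riverton', 'Lakeside'),
  state      = c('AA', 'BB', 'CC'),
  first_year = c(2012, 2016, 2020)
)

places
dim(places)  # number of rows, then number of columns
```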


Here's a table of information about different cars

data("mtcars")
head(mtcars)

Subset the table to only the cyl, hp, and wt columns

mtcars_sub <- mtcars[___]
head(mtcars_sub)
mtcars_sub <- mtcars[,c('cyl', 'hp', 'wt')]
head(mtcars_sub)
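
A brief sketch of two equivalent ways to select columns by name from a data frame, plus the difference that appears when only one column is selected. The variable names are assumptions for this example.

```r
data("mtcars")
cols <- c('cyl', 'hp', 'wt')

by_row_col <- mtcars[, cols]  # matrix-style: all rows, named columns
by_column  <- mtcars[cols]    # list-style: named columns only

all.equal(by_row_col, by_column)  # both give the same data frame

class(mtcars[, 'cyl'])  # matrix-style with one column simplifies to a vector
class(mtcars['cyl'])    # list-style with one column stays a data frame
```

The one-column case is where the two styles diverge, so it helps to pick one style deliberately when later code expects a data frame.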


AshirBorah/cp_bootcamp_r_tutorials documentation built on May 16, 2024, 3:24 p.m.