In brouwern/dayoff: Bioinformatics in R

knitr::opts_chunk$set(echo = TRUE)

SAVE THIS FILE TO A SAFE PLACE WHERE YOU CAN RELOCATE IT

In this exercise we will rebuild a data table and figure in a paper by Kimura et al. "Growth temperatures of archaeal communities can be estimated from the guanine‐plus‐cytosine contents of 16 S rRNA gene fragments". The paper can be downloaded from https://doi.org/10.1111/1758-2229.12035 . Not that the article is paywalled so that if you are not on campus when you access it you will have to first log in to my.pitt.edu and then access the paper via the library.

Preliminaries

This data is found in the dayhoff package. The only other package you need it ggpubr for plotting.

library(ggpubr)

Step one: Make each column

Read the data off of the original Table 1 and assign them to the objects put in place below. Delete the NAs; these as placeholders for you to put your code. However, do leave the NAs for "accession" and "reference"; this will create a column of NAs that will servve as a placeholder (in the future I'll probably type these up myself but haven't yet).

# strain initials (eg, "Methanocalculus pumilus" = "MP")
strain    <- NA

# Growth temperatures
T.min     <- NA
T.opt     <- NA
T.max     <- NA

# Accession number; leave this code as is
## You don't have to put anything here
## this will help the table have the same
## number of columns as the original
accession <- NA

# Length of 16sRNA gene in BP
length.bp <- NA

# Percent GC
P.GC      <- NA

# Melting temperature
T.m       <- NA

# References - leave this as is
reference <- NA
## You don't have to put anything here
## (same situation as accession)

Quality control

In the chunk below, type a command to check the size of one of the objects you just made.

In the chunk below, type a command to check the type of data structure of the object you just made.

Basic Data exploration

What is the mean T.opt? Replace the NA and type the appropriate code below .

mean.T.opt <- NA

What is the maximum P.GC value?

max.P.GC <- NA

Part 2: Build the dataframe

Now, re-make Table 1 of Kimura et al (2013).

Type in the code necessary to build that dataframe; assign it to an object called "kimura"

# Delete the "NA" and add the appropirate code
kimura  <- NA

Using the dataframe as the source of the data, type a command which tells you the mean of the T.min values

Part 3: Make figure

Re-make Figure 2 of Kimura et al (2013). Include the following elements

Data points from Table 1
Regression line (line of best fit)
Correlation coefficient (will show up as "R"; Plot 2 in the paper actually has R^2; don't wory about the diffrence in values)
P-value (created by same arguement as the correlation coefficient)

The original Figure 2 has the sample size listed and some small error bars around the data points. Don't worry about this.

NOTE: the plot you can make with ggpubr will have slightly differnet p-values and correlation coefficient values. No worries. The goal is to get the general theme of the plot.

In addition to previous exercises you may with to consult this site: http://www.sthda.com/english/articles/24-ggpubr-publication-ready-plots/78-perfect-scatter-plots-with-correlation-and-marginal-histograms/

#Note: making plots doesn't involve the assignment operator

Now add teh arguement size = "length.bp" so that the size of the points is scaled to that column.

#Note: making plots doesn't involve the assignment operator

Graph of length.bp vs T.m

Create a graph that shows the relationship between T.m and length.bp

Create new variable: T.m/length.bp

Divide T.m by by length.bp; assign this to a column in the table called T.m.per.bp

Short answer question: Why am I having you do this? (2-4 sentences)

What does dividing T.m by length.bp accomplish? (consider what the plot of length.bp vs T.m tells you about the inherent relationship between length and T.m and how that might confuse the relationship between T.m and P.GC )

Assign the mean length.bp to an object called mean.length.bp

mean.length.bp <- NA

Multiply the column T.m.per.bp by the value mean.length.bp. Assign it to a column in the dataframe called T.m.per.mean.bp

Short answer question: Why am I having you do this? (2-4 sentences)

What does multiplying T.m.per.bp by the mean length accomplish?

New plot: T.m801 versus P.GC

Create a graph like Figure 2 from the original paper but using T.m.per.mean.bp.

Short answer question: What has changed with this new plot? Why? (3-5 sentences)

Explain what is different about this plot of T.m801 versus P.GC from the origional one of T.m. versus P.GC

SAVE THIS FILE TO A SAFE PLACE WHERE YOU CAN RELOCATE IT

brouwern/dayoff documentation built on Nov. 4, 2019, 8:15 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

brouwern/dayoff
Bioinformatics in R

In brouwern/dayoff: Bioinformatics in R

Preliminaries

Step one: Make each column

Quality control

Basic Data exploration

Part 2: Build the dataframe

Part 3: Make figure

Graph of length.bp vs T.m

Create new variable: T.m/length.bp

Short answer question: Why am I having you do this? (2-4 sentences)

Short answer question: Why am I having you do this? (2-4 sentences)

New plot: T.m801 versus P.GC

Short answer question: What has changed with this new plot? Why? (3-5 sentences)

R Package Documentation

Browse R Packages

We want your feedback!

brouwern/dayoff Bioinformatics in R

In brouwern/dayoff: Bioinformatics in R

Preliminaries

Step one: Make each column

Quality control

Basic Data exploration

Part 2: Build the dataframe

Part 3: Make figure

Graph of length.bp vs T.m

Create new variable: T.m/length.bp

Short answer question: Why am I having you do this? (2-4 sentences)

Short answer question: Why am I having you do this? (2-4 sentences)

New plot: T.m801 versus P.GC

Short answer question: What has changed with this new plot? Why? (3-5 sentences)

R Package Documentation

Browse R Packages

We want your feedback!

brouwern/dayoff
Bioinformatics in R