knitr::opts_chunk$set(echo = TRUE)
SAVE THIS FILE TO A SAFE PLACE WHERE YOU CAN RELOCATE IT
In this exercise we will rebuild a data table and figure in a paper by Kimura et al. "Growth temperatures of archaeal communities can be estimated from the guanine‐plus‐cytosine contents of 16 S rRNA gene fragments". The paper can be downloaded from https://doi.org/10.1111/1758-2229.12035 . Not that the article is paywalled so that if you are not on campus when you access it you will have to first log in to my.pitt.edu and then access the paper via the library.
This data is found in the dayhoff package. The only other package you need it ggpubr for plotting.
library(ggpubr)
Read the data off of the original Table 1 and assign them to the objects put in place below. Delete the NAs; these as placeholders for you to put your code. However, do leave the NAs for "accession" and "reference"; this will create a column of NAs that will servve as a placeholder (in the future I'll probably type these up myself but haven't yet).
# strain initials (eg, "Methanocalculus pumilus" = "MP") strain <- NA # Growth temperatures T.min <- NA T.opt <- NA T.max <- NA # Accession number; leave this code as is ## You don't have to put anything here ## this will help the table have the same ## number of columns as the original accession <- NA # Length of 16sRNA gene in BP length.bp <- NA # Percent GC P.GC <- NA # Melting temperature T.m <- NA # References - leave this as is reference <- NA ## You don't have to put anything here ## (same situation as accession)
In the chunk below, type a command to check the size of one of the objects you just made.
In the chunk below, type a command to check the type of data structure of the object you just made.
What is the mean T.opt? Replace the NA and type the appropriate code below .
mean.T.opt <- NA
What is the maximum P.GC value?
max.P.GC <- NA
Now, re-make Table 1 of Kimura et al (2013).
Type in the code necessary to build that dataframe; assign it to an object called "kimura"
# Delete the "NA" and add the appropirate code kimura <- NA
Using the dataframe as the source of the data, type a command which tells you the mean of the T.min values
Re-make Figure 2 of Kimura et al (2013). Include the following elements
The original Figure 2 has the sample size listed and some small error bars around the data points. Don't worry about this.
NOTE: the plot you can make with ggpubr will have slightly differnet p-values and correlation coefficient values. No worries. The goal is to get the general theme of the plot.
In addition to previous exercises you may with to consult this site: http://www.sthda.com/english/articles/24-ggpubr-publication-ready-plots/78-perfect-scatter-plots-with-correlation-and-marginal-histograms/
#Note: making plots doesn't involve the assignment operator
Now add teh arguement size = "length.bp" so that the size of the points is scaled to that column.
#Note: making plots doesn't involve the assignment operator
Create a graph that shows the relationship between T.m and length.bp
Divide T.m by by length.bp; assign this to a column in the table called T.m.per.bp
What does dividing T.m by length.bp accomplish? (consider what the plot of length.bp vs T.m tells you about the inherent relationship between length and T.m and how that might confuse the relationship between T.m and P.GC )
Assign the mean length.bp to an object called mean.length.bp
mean.length.bp <- NA
Multiply the column T.m.per.bp by the value mean.length.bp. Assign it to a column in the dataframe called T.m.per.mean.bp
What does multiplying T.m.per.bp by the mean length accomplish?
Create a graph like Figure 2 from the original paper but using T.m.per.mean.bp.
Explain what is different about this plot of T.m801 versus P.GC from the origional one of T.m. versus P.GC
SAVE THIS FILE TO A SAFE PLACE WHERE YOU CAN RELOCATE IT
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.