knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(krspfecals) library(RMySQL) library(krsp)
To connect to the fecal database instance, you need to first create a connection object which defines the location and credentiuals for the connection. The base engine for doing this is the DBI
package in R. Other packages like RMySQL
are built on top of this package and are therefore the base architectures we need. For a while, the dbplyr
package offered some source functions like dbplyr::src_dbi
to make the database connection objects easier to work with in dplyr
. This is what the original krsp
package is built with. However, these functions are either deprecated or superceded and will likely be phased out in future CRAN releases. As a result, krspfecals
is built with a slightly more fundemental architecture. I outline this in a bit more detail later on.
All functions for working with the fecal database begin with fecal
and are named using lowerCamel notation.
We can use the function fecalConnect
to create a connection object which will then be passed as a parameter into other helper functions.
con = fecalConnect(MySQL(), group = "krspfecals-aws")
I am repeating what is already laid out in the krsp
Vignettes...The best way to connect to the database is to create secure credentials on your local machine. There are a few ways to do this that are secure and repeatable. Given that most people will (should) have setup a group in their my.cnf
file, I will outline that one first.
A .cnf
file is a configuration file that MySQL reads through. They live in a few different locations depending on whether they are global configurations or user specific configurations. You can read more here (https://dev.mysql.com/doc/refman/8.0/en/option-files.html). This allows you to specify MySQL credentials permanently without embeddign them in your R code.
On Unix machines, the defualt location for this configuration file is ~/.my.cnf
and on Windows it is C:/my.cnf
(Note this is where programs will LOOK for the file. You can put it anywhere but you'll have to change the default file parameter in other programs).
To open this file (or create it if it doesn't exist) open your Terminal and type the following command
sudo nano ~/.my.cnf
This will open (or create) the file. Enter the following information
[krspfecals-aws] user = "USER" password = "PASSW0RD" host = "hormonedb.cfvasuoihvfm.ca-central-1.rds.amazonaws.com" port = 3306 database = "krspfecals"
This same structure can be used to create a security group for the krsp
database as well (see the krsp
vignettes).
When you are finished, press CTRL+X, then press Y to save. It's a good idea to then restart MySQL on your machine.
sudo systemctl restart mysql
As a note, remember that this will only work on your local machine. If you have a desktop and a laptop you'll need to configure your keys seprately on each machine.
If you do not set your credentials up in the way described above then you will have to login directly to the database. This is fine for one-off queries that will never be shared, but this is potentially very insecure. If you login directly, you will need to provide a few parameters:
NOTE: if you do not specify a host, it will automatically default to the AWS host
I want to lay out some of the underlyiung nuts and bolts of the engine so you can understand how it works and why it works a little different from the krsp
database.
The database is built with MySQL, a simple relational database management system. R has a lot of ways of connecting to and interacting with SQL databases. The fundemental engine behind most of these methods is the DBI
package. the RMySQL
package optimizes DBI
functions for working with MySQL in R. The dbplyr
package acts as a sort of middle-man package between dplyr
and DBI
. For a while, DBI
objects were not directly compatible with dplyr
and so dbplyr
implemented a src_dbi
function to force compatibility.
The original krsp
package was built on top of these src_
functions but these have since been superceded. Now, DBI
objects are easily passed to dplyr
so there is no need for these functions any more. They still exist but are no longer maintained and will likely be phased out in future CRAN releases.
Without dipping too heavily into the nitty-gritty here, the main difference is that src_dbi
converted an S4 DBI
object to an S3 object.
conn = krsp_connect(RMySQL::MySQL(), group = "krsp-aws") typeof(conn) class(conn) inherits(conn, "DBIConnection")
We see here that our krsp connection is an S3 list type and it inherits methods from that src_
family I talked about earlier.
Compare this to the result of our fecal connection
con = fecalConnect(RMySQL::MySQL(), group = "krspfecals-aws") typeof(con) class(con) inherits(con, "DBIConnection") inherits(con, "MySQLConnection")
So we see the fecal connection is of the "Super Class" DBI Connection which is inherited by the MySQLConnection type. The result is that this opens us up to use more methods on our object. While, yes, its nice to stick to methods that are clearly defined by the package, for those used to working with databases this will avoid headaches caused by inheritance errors. Here's an example.
#We can use the built in functions to check which tables are in our krsp database krsp_tables(conn) #But if we wanted to use one of the standard functions to do the same thing... DBI::dbListTables(conn) RMySQL::dbListTables(conn) dplyr::db_list_tables(conn) #only one that works dplyr::src_tbls(conn) #This method is ONLY defined for the src_ class which is no longer maintained ?dbplyr::src_dbi #You can run this and see it is superceded
So, in order to ensure fitire compatibility with the base engines, krspfecals
is built as an extension of the existing packages, rather than as a wrapper around them. This has a few side effects:
DBI
and RMySQL
maintain their base classes we should be okay.krsp
functions will NOT work on fecal connection objects and vice-versa. Yes, fewer people will use the fecal database package than the krsp package so the cross-compatibility is a minor issue, but it would be nice to have cross-compatibility. This will have MINOR effects for day to day use, but if you get sloppy and start mixing methods you'll run into unexpected troubles. More features will be added to the package to include handy functions for cleaning and summarizing data. I also plan to add some handy helper functions that I use a lot with the krsp
package (like a function to automatically calculate local densities and functions to automatically generate summary data).
For now, the data are structures such that there is one main pipeline that can be modified ad libitum to suit your needs.
We can easily see which data are currently housed in the database. I'll also demonstrate the benefit of the low-level architecture here.
#Use a built in method fecalTables(con) #Use any other method DBI::dbListTables(con) RMySQL::dbListTables(con) dplyr::db_list_tables(con)
If you're curious why this third one doesn't work on S3 or S4, this thread points out that Hadley Wickham suggested it would be deprecated soon (in 2017) and encourages using dbListTables
.
I am currently in the process of documenting all the tables with their data types and explnations. You can see what documentation is curently available using the fecalTableDoc()
function:
This pulls straight from a JSON file so the formatting might be weird...But you'll figure it out.
fecalTableDoc(con) #If you forget a table fecalTableDoc(con, "poop_link")
In general, you should start with your necessary trapping records, link them through the poop_link table, then join to extracts, then to the necessary assays.
Here is the pipeline that captures all the data.
trapping = tbl(conn, "trapping") %>% #filter() %>% select(trapping_id = id, everything()) %>%#change from everything if you need less columns collect() # We must collect it into memory since the krsp and fecal databases are at different endpoints poop_link = tbl(con, "poop_link") %>% collect() # Assume we are only interested in individuals we have cort data for extracts = tbl(con, "extracts") cort = tbl(con, "cort") assays = inner_join(cort, extracts) %>% collect() joined = trapping %>% inner_join(poop_link) %>% inner_join(assays)
This table is then a collection of the trapping data and hormone data for every individual for whom we have hormone data. I use inner joins for simplicity, But you can use a left_join
to keep individuals who don't have hormone data.
Let's say we want to look at cort levels over pregnancy for females on Kloo and Sulpher from 2010 onwards. I've chosen an example that also uses the "litters" table for sake of demonstration. Well start with our primary tables. Some simple data wrangling.
library(lubridate) litters = tbl(conn, "litter") %>% filter(yr >= 2010, grid %in% c("KL", "SU"), ln == 1) %>% select(squirrel_id, fieldBDate) trapping = tbl(conn, "trapping") %>% filter(gr %in% c("KL", "SU"), sex == "F", year(date) >= 2010) %>% select(trapping_id = id, squirrel_id, date) %>% left_join(litters) %>% collect() %>% mutate(across(contains("date"), yday))
We'll then calculate relative parturition and ust keep females who are pregnant
trapping = trapping %>% mutate(rel_part = date-fieldBDate) %>% #pregnant are negative, lacatating are positive select(-c(contains("date"))) %>% #pregnancy is from -35 -- 0 filter(rel_part >= -35 & rel_part<= 0)
#Note this section will rarely ever change. Copy and poasted from above poop_link = tbl(con, "poop_link") %>% collect() # Assume we are only interested in individuals we have cort data for extracts = tbl(con, "extracts") cort = tbl(con, "cort") assays = inner_join(cort, extracts) %>% collect() joined = trapping %>% inner_join(poop_link) %>% inner_join(assays)
The variable final_concentration
refers to the steroid concentration in the well of the assay plate. We need to convert this to a concentration of steroid present in the feces.
$$ steroid(ng)/feces(g) = \frac{\frac{pg}{well} * ExtractVolume(\mu L) * DilutionFactor}{FecalWeight(g) * SampleVolume(\mu L) * 1000} $$ Where
$\frac{pg}{well}$ is the output concentration from the plate reader (cort$final_concentration
)
$ExtractVolume$ is the volume of 80% MeOH used for the extraction in \mu L + fecal weight in mg (500\mu L + extracts$mass_extracted_g * 1000)
$DilutionFactor$ is the dilution of pre-EIA (e.g. 10 if the extracts were diluted 1:10) (cort$dilution)
$FecalWeight$ is the weight of feces extracted (expressed in g) (extracts$mass_extracted_g)
$SampleVolume$ is the volume transferred to EIA (e.g 10 \mu L) (cort$volume_used)
Extracts are always extracted in 500 \mu L of 80% MeOH
I have added a helper funciton to help with this, but be aware it is clunky and will only work on a joined table of extracts and hormone that is unaltered. This is mildly intentional.
So, to calculate your hormone concetration in ng/g
joined$cort_ng_g = fecalHormoneCalc(joined)
Now we can plot this.
library(ggplot2) ggplot(joined, aes(x = rel_part, y = cort_ng_g)) + geom_point(size = 0.5, alpha = 0.5) + scale_y_log10() + geom_smooth(aes(group = squirrel_id), method = "glm", se = F, size = 0.3, color = "darkgrey") + geom_smooth(method = "glm", se = F) + theme_minimal()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.