knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "plots/README-"
)

The ramboseli package provides a set of small functions and utilities for R that can be shared among members of the Amboseli Baboon Research Project. There are instructions for using the package below.



Installation

To install this package, you first need to install the devtools package:

    install.packages("devtools")

Then, you can install the latest development version of ramboseli from github:

  devtools::install_github("amboseli/ramboseli")

After you have installed the package once, you can simply load it in the future using:

  library(ramboseli)




Current Functionality




Instructions for creating an SSH tunnel

Some (but not all!) functions in this package query the babase database directly. This allows you to pull "live" data from babase directly into R without going through the intermediate steps of writing queries in SQL, saving the results, and reading that data into R from static CSV files.

To use these functions, you must create an SSH tunnel to the database server. The instructions below explain how to do this.


Getting a login

First, you must have a username and password for the babase database. If you're here and you're affiliated with the ABRP, you probably have this already. To create the tunnel, you must ALSO request a login to SSH into the server, papio.biology.duke.edu. Most people do NOT have this—it must be requested from Jake.


Creating a Tunnel on Mac/Linux

The simplest way to create the tunnel on a Mac/Linux is to open a terminal window and type the following, substituting your actual username for "YourUsername":

ssh -f YourUsername@papio.biology.duke.edu -L 2222:localhost:5432 -N

Your Terminal window should then prompt you for your password for papio.biology.duke.edu. After that's entered, a message appears indicating that the connection has been made:

###############################################################################
# You are about to access a Duke University computer network that is intended #
# for authorized users only. You should have no expectation of privacy in     #
# your use of this network. Use of this network constitutes consent to        #
# monitoring, retrieval, and disclosure of any information stored within the  #
# network for any purpose including criminal prosecution.                     #
###############################################################################

Once this is done, the tunnel has been created. Keep the Terminal window open and return to R for your analysis.


Creating a Tunnel on Windows

This is a little more complicated. I don't have a Windows machine to test it on, but I think this should work. Please let me know ( camposfa@gmail.com ) if you try and can't get it to work properly.

"C:\Program Files\PuTTY\plink.exe" ssh -f YourUsername@papio.biology.duke.edu -L 2222:localhost:5432 -N -pw YourPassword

Alternatively, if you don't want to save your password in a plain text file, use the following text that will prompt you for your password each time you run the batch file.

"C:\Program Files\PuTTY\plink.exe" ssh -f YourUsername@papio.biology.duke.edu -L 2222:localhost:5432 -N


Additional Notes

There are shortcuts that can make this process more streamlined, including:

Fernando can help with this if you're interested.

Every so often, when the connection is left idle, the tunnel seems to get corrupted. Often, you can simply reconnect using the same steps as above. Sometimes reconnecting doesn't work. In those cases, you can "kill" the corrupted tunnel by typing the following into Terminal:

ps aux | grep ssh

A list of processes should appear in the window, similar to this

fac13            23128   0.0  0.0  2445836    260   ??  S    25Jun18   0:00.02 /usr/bin/ssh-agent -l
fac13            14419   0.0  0.0  2432804    776 s000  S+   12:30PM   0:00.00 grep ssh
fac13            14205   0.0  0.0  2462536    904   ??  Ss   12:27PM   0:00.00 ssh -f fac13@papio.biology.duke.edu -L 2222:localhost:5432 -N

The last process is the corrupted tunnel. Kill it by typing kill followed by the process ID number:

kill 14205




Reading data from babase

With the tunnel created, we can now connect to babase.

  library(ramboseli)
  library(tidyverse)
  library(dbplyr)

  # You will need to change user to your personal babase login AND get your password
  babase <- DBI::dbConnect(RPostgreSQL::PostgreSQL(),
                           host = "localhost",
                           port = 2222,
                           user = "fac13",
                           dbname = "babase")
  library(ramboseli)
  library(tidyverse)
  library(dbplyr)

  # You will need to change user to your personal babase login AND get your password
  # One approach to doing that is through Rstudio
  # You could also just type it in your script, but that's not recommended for security.
  babase <- DBI::dbConnect(RPostgreSQL::PostgreSQL(),
                           host = "localhost",
                           port = 2222,
                           user = "fac13",
                           dbname = "babase",
                           password = rstudioapi::askForPassword("Database password"))

Now we can create a connection to any table or view in babase. Here's an example for the biograph table.

biograph <- tbl(babase, "biograph")

You now have a live connection to the biograph table, stored as an object in R. It's important to note that this hasn't pulled in all the data, only the first few rows. Let's see what this objects looks like:

biograph

We can perform basic queries and joins on the the table using the dplyr and dbplyr packages. There's a lot more information here.

If we want to pull in all the data into our current R environment as a normal data frame, we can use the function collect.

biograph_l <- collect(biograph)
biograph_l

Need a table that in a different schema? Use the in_schema function:

all_ranks <- tbl(babase, in_schema("babase_pending", "ALL_RANKS"))
collect(all_ranks)




Best practices for using ramboseli functions

  1. Supply a database connection and no tables to obtain data directly through queries

    • Uses most up-to-date data
    • Use when carrying out exploratory analysis
    • Not reproducible
  2. Supply tables from the R environment and no database connection

    • Uses static input files (a "snapshot" of specific babase tables)
    • Reqires that user queries database first to produce static files and loads them into R environment
    • Use for final analysis if you need reproducibility


amboseli/ramboseli documentation built on March 18, 2021, 12:03 a.m.