In nickmckay/lipdR: LiPD utilities for R

options(width = 500)
knitr::opts_chunk$set(
  error=TRUE,
  collapse = TRUE,
  comment = "#>"
)

The queryLipdverse() function in lipdR allows users to download LiPD files based on various filter parameters

The filtering relies on a query table, which holds metadata for all time series available on LiPDverse

This query table is built in to, and loaded with lipdR

library(lipdR)

Let's get a sense for what the table holds

dim(queryTable)
names(queryTable)

That's a lot of rows! There's a row for every time series in the LiPDverse database

If you have used the LiPD format, you will recognize some of these names

Let's try a simple query

We'll set skip.update = FALSE for this demonstration, but it's a good idea to check for updates each time you start a new session

If you tell queryLipdverse to skip.update once, it will not prompt you again in a given R session

qt <- queryLipdverse(variable.name = c("δ18O"),
                     skip.update = TRUE)

huh, there must be more Oxygen-18 time series

Let's have a look at the unique paleoData variable.name

We'll just look at the first 250, there are quite a few

unique(queryTable$paleoData_variableName)[1:250]

There's quite a few notation styles, but they all have "18o" in common, so let's try that

qt <- queryLipdverse(variable.name = c("18o"))

Now that is a lot more data

Some query parameters will require considering what all the possible results have in common

In other cases, we can simply input a vector with several possible options

For the variable.name parameter, multiple entries are combined with "OR" logic, so more entries will generally pull more datasets

qt <- queryLipdverse(variable.name = c("δ18O", "d18O"))

This gets us almost the same result, but we see the simplified filter is still pulling one more dataset

Also note that capitalization makes no difference here

The archive.type filter works similarly, let's look at our options

unique(queryTable$archiveType)

Note that we have distinct options: "Lake", "LakeSediment", "LakeDeposits", "LakeDeposit", and "Lake Sediment"

We can grab the archive.type names and search for all of them like this

qt <- queryLipdverse(archive.type = unique(queryTable$archiveType)[c(2,3,5,6,39)])

Or we can use the simplification strategy again

qt <- queryLipdverse(archive.type = c("lake"))

As we saw last time, the results are similar

We can also filter based on publications: Author, DOI, Title, etc. using pub.info

qt <- queryLipdverse(pub.info = c("10.1016/j.quascirev.2008.09.005"))

This query can be a little slow if you don't narrow the results with another parameter first

Let's narrow our region of interest

There are four different parameters used for this: coord, country, continent, and ocean

Let's grab all the North American datasets

qt <- queryLipdverse(continent = "North America")

Let's grab just those from Mexico

qt <- queryLipdverse(country = c("Mexico"))

Note that the country and contient filters are not filtered based on LiPD content. A function in R uses the coordinates associated with the datasets to associate them with countries and the results can be unreliable near country borders

Again, we can see all of the options for country and continent with unique()

We can also use latitude and longitude directly with a bounding box

qt <- queryLipdverse(coord = c(0,90,-180,-110))

We can limit this to only marine data by setting ocean to TRUE

qt <- queryLipdverse(coord = c(0,90,-180,-110),
         ocean = TRUE)

The ocean parameter works on the same basis as the country and continent parameters and can be unreliable in coastal areas

We can also grab all the data from a compilation, such as the Western North America (WNAm)

qt <- queryLipdverse(compilation = c("wNAm"))

or multiple compilations

qt <- queryLipdverse(compilation = c("wnam", "temp12k"))

the compilation filter uses "OR" logic

Let's try pulling datasets based on their seasonality

We can pull summer, commonly defined as June, July, and August, for the Northern Hemisphere

seasonality input is taken as a list

qt <- queryLipdverse(continent = "North America",
               seasonality = list("June", "July", "August"))

Okay, so this must not be a good choice of format, let's see how LiPD authors define their seasons

unique(queryTable$interpretation1_seasonality)

It looks like numeric months are most common. Season names, "warm"/"cold", and series of first-letter abbreviations (ie. JJA) are all common.

Knowing this, let's try again

Items within a single list are treated as linked by "AND", so an input of list("June", "July", "August") would filter for season data with ALL of these months

Multiple lists are treated as linked by "OR", such that list(list("June"), list("July"), list("August)) would filter for season data with ANY of these months

Let's try to get all of the summer datasets by entering a few different notations

qt <- queryLipdverse(continent = "North America",
               seasonality = list(list("6", "7", "8"), list("summer"), list("JJA")))

This returns quite a few datasets

From our look at all the unique seasonality entries, we can see that this probably includes a lot of annual data too

Let's exclude the annual and winter datasets by using season.not

The input for for season.not works the same as seasonality

qt <- queryLipdverse(continent = "North America",
               seasonality = list(list("6", "7", "8"), list("summer"), list("JJA")),
               season.not = list(list("annual"), list("December"), list("12","1","2"), list("winter"), list("cold")))

Now we've narrowed it down to summer-specific datasets

Let's look at interpretation variables now. These are the climate variables that may serve as a target for the proxy time series available

These variables are expressed in two different formats: interpretation variable and interpretation detail

Each of these variables has four possible interpretation slots

unique(queryTable$interp_Vars)

unique(queryTable$interp_Details)

Let's look at some marine interp.vars in the northeast Pacific

qt <- queryLipdverse(coord = c(0,90,-180,-110),
               ocean = TRUE,
               interp.vars =  c("SST", "upwelling", "SSS"))

That gives us just a few datasets

Perhaps we'll have more luck with interp.details, which is more standardized

qt <- queryLipdverse(coord = c(0,90,-180,-110),
               ocean = TRUE,
               interp.details = c("sea@surface", "elNino"))

let's see if we grab more using both

These inputs combine with "OR" logic, so we may gather more datasets by using both parameters

qt <- queryLipdverse(coord = c(0,90,-180,-110),
               ocean = TRUE,
               interp.details = c("sea@surface", "elNino"),
               interp.vars =  c("SST", "upwelling", "SSS"))

looks like we get a few extra datasets with this approach

Now that we know how to use our filters, let's go for a strict filter

We'll look for marine archives in the northeast Pacific, with interpretations related to marine climate variables in the summer months only

qt <- queryLipdverse(coord = c(0,90,-180,-110),
               archive.type = c("marine", "ocean"),
               ocean = TRUE,
               interp.details = c("sea@surface", "elNino"),
               interp.vars =  c("SST", "upwelling", "SSS"),
               seasonality = list(list("6", "7", "8"), list("summer"), list("JJA")),
               season.not = list(list("annual"), list("December"), list("12","1","2"), list("winter"), list("cold")))

To fine-tune your query, set verbose to TRUE to see which parameters have what effect on filtering

We'll narrow the results further by adding author names to find within the publication info

qt <- queryLipdverse(coord = c(0,90,-180,-110),
               archive.type = c("marine", "ocean"),
               ocean = TRUE,
               interp.details = c("sea@surface", "elNino"),
               interp.vars =  c("SST", "upwelling", "SSS"),
               seasonality = list(list("6", "7", "8"), list("summer"), list("JJA")),
               season.not = list(list("annual"), list("December"), list("12","1","2"), list("winter"), list("cold")),
               pub.info = c("mix", "caissie"),
               verbose = TRUE)