Converting Occurrences Data to a timeList Data Object

Description

This function converts occurrence data, given as a list where each element is a different taxon's occurrence table (containing minimum and maximum ages for each occurrence), to the 'timeList' format, consisting of a list composed of a matrix of lower and upper age bounds for intervals, and a second matrix recording the interval in which taxa first and last occur in the given dataset.

Usage

occData2timeList(occList, intervalType = "dateRange")

Arguments

occList

A list where every element is a table of occurrence data for a different taxon, such as that returned by taxonSortPBDBocc. Each table can be either a two-column matrix composed of the lower and upper age bounds on each taxon occurrence, or a table with two named variables that match any of the field names given by the PBDB API under either the 'pbdb' vocab or 'com' (compact) vocab for early and late age bounds.

intervalType

Must be either "dateRange" (the default), "occRange" or "zoneOverlap". Please see details below.

Details

This function translates taxon-sorted occurrence data, which can be Paleobiology Database datasets sorted by taxonSortPBDBocc or any data object in which the occurrence data (i.e. age bounds for each occurrence) for different taxa are separated into different elements of a named list.
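As a rough illustration of the input shape described above, the following base R sketch builds a small named list of two-column matrices. The taxon names, column names and ages are invented for illustration, and placing the older bound in the first column is an assumption of this sketch rather than something stated on this page.

```r
# Toy taxon-sorted occurrence data: one two-column matrix per taxon.
# Ages are in Ma, so larger numbers are older.
# ASSUMPTION: column 1 holds the older bound, column 2 the younger bound.
occList <- list(
  taxonA = cbind(maxAge = c(450, 445, 440), minAge = c(440, 443, 430)),
  taxonB = cbind(maxAge = 460, minAge = 455)
)

# Basic sanity checks on the structure before conversion
isTwoCol <- sapply(occList, function(x) is.matrix(x) && ncol(x) == 2)
boundsOrdered <- sapply(occList, function(x) all(x[, 1] >= x[, 2]))
isTwoCol
boundsOrdered

# With paleotree loaded, a list like this would be passed on as:
# toyTimeList <- occData2timeList(occList = occList)
```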

The argument intervalType

The argument intervalType controls the algorithm used to obtain the first and last interval bounds for each taxon. There are several options to select from:

"dateRange"

The default option. The bounds on the first appearances are the span between the oldest upper and lower bounds of the occurrences, and the bounds on the last appearances are the span between the youngest upper and lower bounds across all occurrences. This is guaranteed to provide the smallest bounds on the first and last appearances, and was originally suggested to the author by J. Marcot.

"occRange"

This option returns the smallest bounds among (a) the oldest occurrences for the first appearance (i.e. all occurrences with their lowest bound at the oldest lower age bound), and (b) the youngest occurrences for the last appearance (i.e. all occurrences with their uppermost bound at the youngest upper age bound).

"zoneOverlap"

This option is an attempt to mimic the stratigraphic range algorithm used by PBDB Classic which "finds the oldest base that is older than at least part of all the intervals and the youngest that is younger than at least part of all the intervals" (pers. comm., J. Alroy). This is a somewhat more complex case as we are trying to obtain a timeList object. So, for calculating the bounds of the first interval a taxon occurs in, the zoneOverlap algorithm looks for all occurrences that overlap with the age range of the earliest-most occurrence and (1) obtains their earliest boundary ages and returns the latest-most earliest age boundary among these overlapping occurrences and (2) obtains their latest boundary ages and returns the earliest-most latest age boundary among these overlapping occurrences. Similarly, for calculating the bounds of the last interval a taxon occurs in, the zoneOverlap algorithm looks for all occurrences that overlap with the age range of the latest-most occurrence and (1) obtains their earliest boundary ages and returns the latest-most earliest age boundary among these overlapping occurrences and (2) obtains their latest boundary ages and returns the earliest-most latest age boundary among these overlapping occurrences.

On theoretical grounds, one could probably describe the zone-of-overlap algorithm as minimizing taxonomic age ranges by assuming that all overlapping occurrences at the start and end of a taxon's range probably describe a very similar first and last appearance (FADs and LADs), and thus it picks the occurrence with bounds that extend the taxonomic range the least. However, this comes with the downside that if these occurrences are not essentially repeated attempts to capture the same FAD or LAD, then the zone-of-overlap algorithm isn't an accurate depiction of the uncertainty in the ages. The true biological range of a taxon might be well outside the bounds obtained using the zone-of-overlap algorithm. A more conservative approach is the "dateRange" algorithm, which finds the smallest possible bounds on the endpoints of a taxon's range without ignoring uncertainty from any particular set of occurrences.
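To make the difference between "dateRange" and "occRange" concrete, here is a minimal base R sketch based on one reading of the descriptions above. The helpers dateRangeSketch, occRangeSketch and ageUncert are hypothetical names written for this illustration; they are not paleotree functions and may differ from the package's actual implementation. The toy ages are invented, and the sketch assumes the older bound sits in the first column.

```r
# Toy occurrence table for a single taxon. Ages are in Ma (larger = older).
# ASSUMPTION: column 1 is the older bound, column 2 the younger bound.
occ <- cbind(maxAge = c(450, 450, 445, 440),
             minAge = c(440, 442, 443, 430))

# "dateRange" as read from the text: the first-appearance interval spans the
# oldest older-bound to the oldest younger-bound across all occurrences, and
# the last-appearance interval spans the youngest of each.
dateRangeSketch <- function(occ) {
  c(firstMax = max(occ[, 1]), firstMin = max(occ[, 2]),
    lastMax  = min(occ[, 1]), lastMin  = min(occ[, 2]))
}

# "occRange" as read from the text: use only the occurrences sharing the
# oldest older-bound (for the FAD) or the youngest younger-bound (for the
# LAD), and take the tightest bounds among those.
occRangeSketch <- function(occ) {
  oldest   <- occ[occ[, 1] == max(occ[, 1]), , drop = FALSE]
  youngest <- occ[occ[, 2] == min(occ[, 2]), , drop = FALSE]
  c(firstMax = max(oldest[, 1]), firstMin = max(oldest[, 2]),
    lastMax  = min(youngest[, 1]), lastMin  = min(youngest[, 2]))
}

# Summed width of the first- and last-appearance intervals
ageUncert <- function(b) unname((b[1] - b[2]) + (b[3] - b[4]))

drBounds <- dateRangeSketch(occ)
orBounds <- occRangeSketch(occ)
rbind(dateRange = drBounds, occRange = orBounds)
ageUncert(drBounds)  # 17
ageUncert(orBounds)  # 18
```

In this toy case the outermost bounds (firstMax and lastMin) are the same under both options, and only the inner bound on the first appearance differs.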

Value

Returns a standard timeList data object, as used by many other paleotree functions, such as bin_timePaleoPhy, bin_cal3TimePaleoPhy and taxicDivDisc.
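The two-matrix layout described under Description can be sketched by hand in base R. Everything below is a toy illustration under stated assumptions: the ages, taxon names, the element names intTimes and taxonTimes, the column names, and the oldest-first ordering of intervals are all choices made for this sketch, not a statement of how the function builds or labels its output.

```r
# Hypothetical first/last interval bounds per taxon (ages in Ma):
# columns are first-interval start, first-interval end,
# last-interval start, last-interval end.
fourDate <- rbind(
  taxonA = c(450, 443, 440, 430),
  taxonB = c(450, 443, 450, 443),
  taxonC = c(440, 430, 425, 420)
)

# Helper that turns each interval (row) into a text key for matching
key <- function(m) paste(m[, 1], m[, 2])

# Matrix 1: the unique intervals, ordered oldest first (an assumption here)
intTimes <- unique(rbind(fourDate[, 1:2], fourDate[, 3:4]))
intTimes <- intTimes[order(-intTimes[, 1], -intTimes[, 2]), , drop = FALSE]
dimnames(intTimes) <- list(NULL, c("startTime", "endTime"))

# Matrix 2: for each taxon, the row index of its first and last interval
taxonTimes <- cbind(
  firstInt = match(key(fourDate[, 1:2]), key(intTimes)),
  lastInt  = match(key(fourDate[, 3:4]), key(intTimes))
)
rownames(taxonTimes) <- rownames(fourDate)

toyTimeList <- list(intTimes = intTimes, taxonTimes = taxonTimes)
toyTimeList
```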

Author(s)

David W. Bapst, with the 'dateRange' algorithm suggested by Jon Marcot.

See Also

taxonSortPBDBocc, plotOccData and the example graptolite dataset at graptPBDB

Examples

data(graptPBDB)

graptOccSpecies<-taxonSortPBDBocc(graptOccPBDB,rank="species",onlyFormal=FALSE)
graptTimeSpecies<-occData2timeList(occList=graptOccSpecies)

head(graptTimeSpecies[[1]])
head(graptTimeSpecies[[2]])

graptOccGenus<-taxonSortPBDBocc(graptOccPBDB,rank="genus",onlyFormal=FALSE)
graptTimeGenus<-occData2timeList(occList=graptOccGenus)

layout(1:2)
taxicDivDisc(graptTimeSpecies)
taxicDivDisc(graptTimeGenus)

# the default interval calculation is "dateRange"
# let's compare to the other option, "occRange"
	# for species

graptOccRange<-occData2timeList(occList=graptOccSpecies, intervalType="occRange")

#we would expect no change in the diversity curve
	#because there are only changes in the
		#earliest bound for the FAD
		#latest bound for the LAD
#so if we are depicting ranges within maximal bounds
	#dateRanges has no effect
layout(1:2)
taxicDivDisc(graptTimeSpecies)
taxicDivDisc(graptOccRange)
#yep, identical

#so how much uncertainty was removed by using dateRange?

# write a simple function for getting uncertainty in first and last
		# appearance dates from a timeList object
sumAgeUncert<-function(timeList){
	fourDate<-timeList2fourDate(timeList)
	perOcc<-(fourDate[,1]-fourDate[,2])+(fourDate[,3]-fourDate[,4])
	sum(perOcc)
	}

#total amount of uncertainty in occRange dataset
sumAgeUncert(graptOccRange)
#total amount of uncertainty in dateRange dataset
sumAgeUncert(graptTimeSpecies)
#the difference
sumAgeUncert(graptOccRange)-sumAgeUncert(graptTimeSpecies)
#as a proportion
1-(sumAgeUncert(graptTimeSpecies)/sumAgeUncert(graptOccRange))

#a different way of doing it
dateChange<-timeList2fourDate(graptTimeSpecies)-timeList2fourDate(graptOccRange)
apply(dateChange,2,sum)
#total amount of uncertainty removed by dateRange algorithm
sum(abs(dateChange))

layout(1)
