calc_post_prob_presence: Calculate posterior probability of presence, given count data...

Description Usage Arguments Details Value Note Author(s) References See Also Examples

View source: R/BioGeoBEARS_detection_v1.R

Description

This function calculates P(presence|count data,parameters), i.e. the posterior probability of presence in an area, given data on detection counts and taphonomic control counts, and a detection model with the parameters mean_frequency, dp, a detection probability (and, optionally, a false detection probability, fdp).

Usage

1
2
3
4
  calc_post_prob_presence(prior_prob_presence = 0.01,
    obs_target_species, obs_all_species,
    mean_frequency = 0.1, dp = 1, fdp = 0,
    print_progress = "")

Arguments

prior_prob_presence

The prior probability of presence, i.e. when no detection or taphonomic control data whatsoever is available. Default is set to 0.01 which expresses my totally uninformed bias that in whatever your data is, your species of interest probably doesn't live in the typical area you are looking at.

obs_target_species

A count of detections of your OTU of interest, e.g. as produced from a cell of the matrix output from read_detections.

obs_all_species

A count of detections of your taphonomic controls, e.g. as produced from a cell of the output from read_controls.

mean_frequency

This is the proportion of samples from the taphonomic control group that will truly be from this OTU, GIVEN that the OTU is present. This could be estimated, but a decent first guess is (total # samples of OTU of interest / total # of samples in the taphonomic control group where the OTU is known to be present). All that is really needed is some reasonable value, such that more sampling without detection lowers the likelihood of the data on the hypothesis of true presence, and vice versa. This value can only be 1 when the number of detections = the number of taphonomic control detections, for every OTU and area. This is the implicit assumption in e.g. standard historical biogeography analyses in LAGRANGE or BioGeoBEARS.

dp

The detection probability. This is the per-sample probability that you will correctly detect the OTU in question, when you are looking at it. Default is 1, which is the implicit assumption in standard analyses.

fdp

The false detection probability. This is probability of falsely concluding a detection of the OTU of interest occurred, when in fact the specimen was of something else. The default is 0, which assumes zero error rate, i.e. the assumption being made in all historical biogeography analyses that do not take into account detection probability. These options are being included for completeness, but it may not be wise to try to infer mean_frequency, dp and fdp all at once due to identifiability issues (and estimation of fdp may take a very large amount of data). However, fixing some of these parameters to reasonable values can allow the user to effectively include beliefs about the uncertainty of the input data into the analysis, if desired.

print_progress

If not the default (""), print whatever is in print_progress, followed by a space (for error checking/surveying results).

Details

Essentially, this function combines a prior probability, with the likelihood function (coded in calc_obs_like) to produce a posterior probability of presence given Bayes' Theorem (Bayes & Price, 1763).

The idea of taphonomic controls dates back at least to work of Bottjer & Jablonski (1988). The basic idea is that if you have taxa of roughly similar detectability, then detections of other taxa give some idea of overall detection effort. Obviously this is a very simple model that can be criticized in any number of ways (different alpha diversity in each region, different detectability of individual taxa, etc.), but it is a useful starting point as there has been no implementation of any detection model in historical/phylogenetic biogeography to date.

One could imagine (a) every OTU and area has a different count of detections and taphonomic control detections, or (b) the taphonomic control detections are specified by area, and shared across all OTUs. Situation (b) is likely more common, but this function assumes (a) as this is the more thorough case. Behavior (b) could be reproduced by summing each column, and/or copying this sum to all cells for a particular area.

Value

post_prob The posterior probability of presence, given the prior probability, the model parameters, and the data.

Note

Go BEARS!

Author(s)

Nicholas J. Matzke matzke@berkeley.edu

References

http://phylo.wikidot.com/matzke-2013-international-biogeography-society-poster http://en.wikipedia.org/wiki/Bayes'_theorem

Matzke_2012_IBS

Bottjer_Jablonski_1988

Bayes_1763

See Also

calc_obs_like, mapply_calc_post_prob_presence, mapply_calc_obs_like

Examples

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
# Calculate posterior probability of presence in an area,
# given a dp (detection probability) and detection model.

# With zero error rate
obs_target_species = 10
obs_all_species = 100
mean_frequency=0.1
dp=1
fdp=0
prior_prob_presence = 0.01
post_prob = calc_post_prob_presence(prior_prob_presence, obs_target_species,
obs_all_species, mean_frequency, dp, fdp)
post_prob
# i.e., with perfect detection, the prob. of presence is 1 under the
# hypothesis of presence, and 0 under the hypothesis of
# (This is because the likelihood of the data under
# presence and absence are ln(prob) = 0 & -Inf, respectively.)


# Note that with very high error rates, your conclusion could reverse
obs_target_species = 10
obs_all_species = 100
mean_frequency=0.1
dp=0.5
fdp=0.3
prior_prob_presence = 0.01
post_prob = calc_post_prob_presence(prior_prob_presence, obs_target_species,
obs_all_species, mean_frequency, dp, fdp)
post_prob


# With 0 error rate, even 1 observation makes P(presence) = 1
obs_target_species = 1
obs_all_species = 100
mean_frequency=0.1
dp=1
fdp=0
prior_prob_presence = 0.01
post_prob = calc_post_prob_presence(prior_prob_presence, obs_target_species,
obs_all_species, mean_frequency, dp, fdp)
post_prob


# With a small error rate, there is some small but positive probability of
# falsely getting 10 detections; but it may be effectively 0
obs_target_species = 10
obs_all_species = 100
mean_frequency=0.1
dp=0.99
fdp=0.001
prior_prob_presence = 0.01
post_prob = calc_post_prob_presence(prior_prob_presence, obs_target_species,
obs_all_species, mean_frequency, dp, fdp)
post_prob


# If you have only 1 detection, and you have 100 taphonomic controls and
# a mean_frequency of sampling the OTU of interest of 0.1, then there is
# still a very low probability of presence (since, under your model,
# you should expect to see about 10 detections, not 1)
obs_all_species = 100
mean_frequency=0.1
dp=0.99
fdp=0.001
prior_prob_presence = 0.01

obs_target_species = 0
calc_post_prob_presence(prior_prob_presence, obs_target_species,
obs_all_species, mean_frequency, dp, fdp)

obs_target_species = 1
calc_post_prob_presence(prior_prob_presence, obs_target_species,
obs_all_species, mean_frequency, dp, fdp)

obs_target_species = 2
calc_post_prob_presence(prior_prob_presence, obs_target_species,
obs_all_species, mean_frequency, dp, fdp)

obs_target_species = 3
calc_post_prob_presence(prior_prob_presence, obs_target_species,
obs_all_species, mean_frequency, dp, fdp)



# Note how quickly this chances if you drop the mean_frequency from 0.1
# to 0.01. This means that if you want single detections to count for
# a lot, you need either a low mean_frequency which matches the observed
# frequency, or an extremely high/perfect detection probability (dp).
obs_all_species = 100
mean_frequency=0.01
dp=0.99
fdp=0.001
prior_prob_presence = 0.01

obs_target_species = 0
calc_post_prob_presence(prior_prob_presence, obs_target_species,
obs_all_species, mean_frequency, dp, fdp)

obs_target_species = 1
calc_post_prob_presence(prior_prob_presence, obs_target_species,
obs_all_species, mean_frequency, dp, fdp)

obs_target_species = 2
calc_post_prob_presence(prior_prob_presence, obs_target_species,
obs_all_species, mean_frequency, dp, fdp)

obs_target_species = 3
calc_post_prob_presence(prior_prob_presence, obs_target_species,
obs_all_species, mean_frequency, dp, fdp)


# Changing mean_frequency from 0.01 to 0.001 actually LOWERS the posterior
# probability of presence based on 1 detection, as we have a somewhat
# significant false detection rate:
obs_target_species = 1
obs_all_species = 100
mean_frequency=0.001
dp=0.99
fdp=0.001
prior_prob_presence = 0.01

obs_target_species = 0
calc_post_prob_presence(prior_prob_presence, obs_target_species,
obs_all_species, mean_frequency, dp, fdp)

obs_target_species = 1
calc_post_prob_presence(prior_prob_presence, obs_target_species,
obs_all_species, mean_frequency, dp, fdp)

obs_target_species = 2
calc_post_prob_presence(prior_prob_presence, obs_target_species,
obs_all_species, mean_frequency, dp, fdp)

obs_target_species = 3
calc_post_prob_presence(prior_prob_presence, obs_target_species,
obs_all_species, mean_frequency, dp, fdp)


# Change false detection probability to a much lower value
obs_all_species = 100
mean_frequency=0.001
dp=0.99
fdp=0.00001
prior_prob_presence = 0.01

obs_target_species = 0
calc_post_prob_presence(prior_prob_presence, obs_target_species,
obs_all_species, mean_frequency, dp, fdp)

obs_target_species = 1
calc_post_prob_presence(prior_prob_presence, obs_target_species,
obs_all_species, mean_frequency, dp, fdp)

obs_target_species = 2
calc_post_prob_presence(prior_prob_presence, obs_target_species,
obs_all_species, mean_frequency, dp, fdp)

obs_target_species = 3
calc_post_prob_presence(prior_prob_presence, obs_target_species,
obs_all_species, mean_frequency, dp, fdp)




# Change false detection probability to 0
obs_all_species = 100
mean_frequency=0.001
dp=0.99
fdp=0.0
prior_prob_presence = 0.01


obs_target_species = 0
calc_post_prob_presence(prior_prob_presence, obs_target_species,
obs_all_species, mean_frequency, dp, fdp)

obs_target_species = 1
calc_post_prob_presence(prior_prob_presence, obs_target_species,
obs_all_species, mean_frequency, dp, fdp)

obs_target_species = 2
calc_post_prob_presence(prior_prob_presence, obs_target_species,
obs_all_species, mean_frequency, dp, fdp)

obs_target_species = 3
calc_post_prob_presence(prior_prob_presence, obs_target_species,
obs_all_species, mean_frequency, dp, fdp)









# Change mean_frequency to 0.001
obs_all_species = 100
mean_frequency=0.001
dp=0.99
fdp=0.0
prior_prob_presence = 0.01

obs_target_species = 0
calc_post_prob_presence(prior_prob_presence, obs_target_species,
obs_all_species, mean_frequency, dp, fdp)

obs_target_species = 1
calc_post_prob_presence(prior_prob_presence, obs_target_species,
obs_all_species, mean_frequency, dp, fdp)

obs_target_species = 2
calc_post_prob_presence(prior_prob_presence, obs_target_species,
obs_all_species, mean_frequency, dp, fdp)

obs_target_species = 3
calc_post_prob_presence(prior_prob_presence, obs_target_species,
obs_all_species, mean_frequency, dp, fdp)







# Example #2 -- what if you have ZERO detections, but lots of detections
# of your taphonomic control?
obs_target_species = 0
obs_all_species = 100
mean_frequency=0.1
dp=1
fdp=0
prior_prob_presence = 0.01
post_prob = calc_post_prob_presence(prior_prob_presence, obs_target_species,
obs_all_species, mean_frequency, dp, fdp)
post_prob

# With a slight error rate
obs_target_species = 0
obs_all_species = 100
mean_frequency=0.1
dp=0.99
fdp=0.001
prior_prob_presence = 0.01
post_prob = calc_post_prob_presence(prior_prob_presence, obs_target_species,
obs_all_species, mean_frequency, dp, fdp)
post_prob


obs_target_species = 0
obs_all_species = 2
mean_frequency=0.1
dp=1
fdp=0
prior_prob_presence = 0.01
post_prob = calc_post_prob_presence(prior_prob_presence, obs_target_species,
obs_all_species, mean_frequency, dp, fdp)
post_prob

# With a slight error rate
obs_target_species = 0
obs_all_species = 2
mean_frequency=0.1
dp=0.99
fdp=0.001
prior_prob_presence = 0.01
post_prob = calc_post_prob_presence(prior_prob_presence, obs_target_species,
obs_all_species, mean_frequency, dp, fdp)
post_prob





# Example #3 -- what if you have ZERO detections, but only a few
# detections of your taphonomic control?
obs_target_species = 0
obs_all_species = 1
mean_frequency=0.1
dp=1
fdp=0
prior_prob_presence = 0.01
post_prob = calc_post_prob_presence(prior_prob_presence, obs_target_species,
obs_all_species, mean_frequency, dp, fdp)
post_prob

# With a slight error rate
obs_target_species = 0
obs_all_species = 1
mean_frequency=0.1
dp=0.99
fdp=0.001
prior_prob_presence = 0.01
post_prob = calc_post_prob_presence(prior_prob_presence, obs_target_species,
obs_all_species, mean_frequency, dp, fdp)
post_prob



# Special cases -- e.g., no data
# Prob(data)=1, ln(prob)=0
obs_target_species = 0
obs_all_species = 0
mean_frequency=0.1
dp=0.99
fdp=0.001
prior_prob_presence = 0.01
post_prob = calc_post_prob_presence(prior_prob_presence, obs_target_species,
obs_all_species, mean_frequency, dp, fdp)
post_prob

obs_target_species = 0
obs_all_species = 0
mean_frequency=0.1
dp=1
fdp=0
prior_prob_presence = 0.01
post_prob = calc_post_prob_presence(prior_prob_presence, obs_target_species,
obs_all_species, mean_frequency, dp, fdp)
post_prob


# What if, for some reason, you put in identical detections and taphonomic control
# counts? (e.g., you load in a standard tipranges file)
obs_target_species = 1
obs_all_species = 1
mean_frequency=1
dp=1
fdp=0
prior_prob_presence = 0.01
post_prob = calc_post_prob_presence(prior_prob_presence, obs_target_species,
obs_all_species, mean_frequency, dp, fdp)
post_prob

# What if, for some reason, you put in identical detections and taphonomic control
# counts? (e.g., you load in a standard tipranges file)
obs_target_species = 1
obs_all_species = 1
mean_frequency=1
dp=0.99
fdp=0.001
prior_prob_presence = 0.01
post_prob = calc_post_prob_presence(prior_prob_presence, obs_target_species,
obs_all_species, mean_frequency, dp, fdp)
post_prob

Example output

Loading required package: rexpokit
Loading required package: cladoRcpp
Loading required package: ape
Loading required package: phylobase

Attaching package: 'phylobase'

The following object is masked from 'package:ape':

    edges

[1] 1
[1] 0.001415844
[1] 1
[1] 1
[1] 2.998417e-07
[1] 3.324436e-05
[1] 0.003672609
[1] 0.2901295
[1] 0.003720971
[1] 0.03945846
[1] 0.311213
[1] 0.8324846
[1] 0.009065541
[1] 0.01788852
[1] 0.03499517
[1] 0.06733911
[1] 0.009065532
[1] 0.4780096
[1] 0.9892084
[1] 0.999891
[1] 0.009065532
[1] 1
[1] 1
[1] 1
[1] 0.009065532
[1] 1
[1] 1
[1] 1
[1] 2.682969e-07
[1] 2.998417e-07
[1] 0.008115419
[1] 0.008133335
[1] 0.009009009
[1] 0.009018939
[1] 0.01
[1] 0.01
[1] 1
[1] 0.9090909

BioGeoBEARS documentation built on May 29, 2017, 8:36 p.m.