knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
set.seed(22)
library(birdsize)
library(dplyr)
library(ggplot2)
theme_set(theme_bw())

Overview

birdsize simulates mass measurements (in g) for birds. Working from the assumption that individuals of a species have body masses normally distributed around a species-wide mean and standard deviation, birdsize simulates individual-level body mass measurements by drawing from species-specific normal distributions. There are parameters built-in for 450 bird species, or a user may supply mean and optionally standard deviation values for additional species.

The core functions in birdsize apply at 3 levels of organization: species, population, and community:

Species-level parameters

In order to generate body mass estimates for a species, birdsize needs an abundance value (how many individuals to generate estimates for) and at least one of:

The species_define function uses the information provided to look up, or calculate, the mean and standard deviation of body mass to use to generate individuals of the desired species. This function usually operates under the hood, but may be of interest to advanced users or those interested in complex simulations involving changes to the species-wide parameters.

Population (one species) simulations

Using species identity

For the birds known to birdsize you can use the species' code (AOU) to simulate a population directly. For the hummingbird Selasphorus calliope:

a_hundred_hummingbirds <- pop_generate(abundance = 100, AOU = 4360)

head(a_hundred_hummingbirds)

ggplot(a_hundred_hummingbirds, aes(individual_mass)) +
  geom_histogram(bins = 25) +
  xlab("Mass (g)") +
  ylab("Count") +
  ggtitle("A population of hummingbirds")

Using a known mean and standard deviation

Alternatively, you can simulate body masses for a population by supplying the body size parameters yourself. This may be useful if you would like to work with a species not included in birdsize, test sensitivities to different parameter ranges, or generate values for simulation/null models (or, other applications!).

Note that, if both mean mass and a species code are provided, the species code will be used and the mean mass provided will be ignored!

a_hundred_hypotheticals <- pop_generate(abundance = 100, mean_size = 25, sd_size = 3)

head(a_hundred_hypotheticals)

ggplot(a_hundred_hypotheticals, aes(individual_mass)) +
  geom_histogram(bins = 25) +
  xlab("Mass (g)") +
  ylab("Count") +
  ggtitle("A population of hypothetical birds", subtitle = "Mean mass = 25 g\nStandard deviation = 3")

Using a known mean, but no standard deviation

If the mean mass is not known or not provided, pop_generate will estimate the standard deviation based on scaling between the mean and standard deviation of body mass:

another_hundred_hypotheticals <- pop_generate(abundance = 100, mean_size = 25)

head(another_hundred_hypotheticals)

ggplot(another_hundred_hypotheticals, aes(individual_mass)) +
  geom_histogram(bins = 25) +
  xlab("Mass (g)") +
  ylab("Count") +
  ggtitle("A population of hypothetical birds", subtitle = "Mean mass = 25 g\nStandard deviation = 1.74")

Community (multiple populations) simulations

community_generate takes a dataframe with species-level information (AOU, scientific name, or mean and/or standard deviation body mass) and population sizes, and returns a dataframe of individual-level mass and BMR measurements for all the entries in the input data frame.

Simulations using AOU

Here, we use a synthetic dataset with records of AOU and abundance for 5 species, to make up a community:

data("toy_aou_community")

head(toy_aou_community)

community_generate can take this table and generate simulated individual measurements with no additional tweaks. It uses the AOU and abundance columns from toy_aou_community to look up species' mean and standard deviation body masses based on their AOU and then draw individual size measurements from a normal distribution with those parameters.

toy_aou_sims <- community_generate(community_data_table = toy_aou_community, abundance_column_name = "abundance")

head(toy_aou_sims)

The resulting table has one row per individual, with a unique individual_size and individual_bmr estimate for that individual.

We can then plot the individual size distribution for the community, colored by species ID:

toy_aou_sims$`Scientific name` = toy_aou_sims$scientific_name

ggplot(toy_aou_sims, aes(individual_mass, fill = `Scientific name`)) +
  geom_histogram(position = "stack") +
  scale_fill_viridis_d() +
  scale_x_log10() +
  ggtitle("Community simulated via AOU") +
  xlab("Body mass (g)") +
  ylab("Number of individuals") +
  theme(legend.position = "bottom", legend.direction = "vertical")

Simulation given species' names

If the AOU is not known or not provided, community_generate will attempt to look up species' size parameters based on their scientific name.

data("toy_species_name_community")

head(toy_species_name_community)

community_generate still runs, but note that the sd_method here is listed as Scientific name lookup rather than AOU lookup (above).

toy_species_name_sims <- community_generate(toy_species_name_community, abundance_column_name = "abundance")

head(toy_species_name_sims)
toy_species_name_sims$`Scientific name` = toy_species_name_sims$scientific_name

ggplot(toy_species_name_sims, aes(individual_mass, fill = scientific_name)) +
  geom_histogram(position = "stack") +
  scale_fill_viridis_d() +
  scale_x_log10() +
  ggtitle("Community simulated via scientific name") +
  xlab("Body mass (g)") +
  ylab("Number of individuals") +
  theme(legend.position = "bottom", legend.direction = "vertical")

Simulation given mean size measurements

If species name or AOU are not known, or are not included in this dataset (see known_species for the full set of included species), estimates can still be generated by providing the mean and, if available, standard deviation of body mass directly. If standard deviation is provided, it will be used; if not, it will be estimated based on the scaling relationship between mean and standard deviation of body mass for birds (see the scaling vignette).

This functionality may be especially useful for users interested in using their own parameter values, rather than relying on the ones provided in birdsize. For example, birdsize does not take into account temporal or geographic variation in body size, which may be significant. A user could manually provide values to explore these scenarios.

Note that 1) the mean body size data must be provided in a column named mean_size and 2) the sim_species_id column acts as a species identifier in the absence of other taxonomic information:

data(toy_size_community)

head(toy_size_community)
toy_size_sims <- community_generate(toy_size_community, abundance_column_name = "abundance")

head(toy_size_sims)
toy_size_sims$`Sim species ID` <- as.character(toy_size_sims$sim_species_id)

ggplot(toy_size_sims, aes(individual_mass, fill = `Sim species ID`)) +
  geom_histogram(position = "stack") +
  scale_fill_viridis_d() +
  scale_x_log10() +
  ggtitle("Community simulated via mean body size") +
  xlab("Body mass (g)") +
  ylab("Number of individuals") +
  theme(legend.position = "bottom")

Community-wide summary statistics

The body size measurements generated by birdsize are estimates and are generally best-suited for large-scale, aggregate analyses. For these bigger-picture analyses, use general aggregation and summary functions to generate community-wide summaries.

For example, using dplyr:

Biomass total by species

toy_species_name_sims_summary <-
  toy_species_name_sims |>
  group_by(scientific_name) |>
  summarize(
    total_mass = sum(individual_mass),
    total_n_individuals = n()
  )

toy_species_name_sims_summary

Community total biomass

toy_species_name_sims_totals <-
  toy_species_name_sims |>
  summarize(
    total_mass = sum(individual_mass),
    total_n_individuals = n(),
    total_species_richness = length(unique(scientific_name))
  )

toy_species_name_sims_totals


diazrenata/birdsize documentation built on Jan. 29, 2024, 12:25 p.m.