Two main data sources with almost ready-to-use labels were gathered: Xeno-Canto and Wikiaves. These websites are the biggest repositories of Brazilian bird species sounds. Table \@ref(tab:mp3-counts) shows the counts of MP3 files downloaded for each species selected for the study.
```{r mp3-counts}
library(magrittr)

tibble::tibble(
  Species = list.files("../../data-raw/wav_16khz/") %>%
    stringr::str_remove_all("[.0-9]|(wav)") %>%
    stringr::str_remove_all("-$")
) %>%
  dplyr::count(Species, name = "#mp3", sort = TRUE) %>%
  janitor::adorn_totals() %>%
  knitr::kable(caption = "MP3 files downloaded.") %>%
  kableExtra::kable_styling(font_size = 16) %>%
  kableExtra::row_spec(6, bold = TRUE)
```
No deduplication was performed in case the same audio happened to appear in both sources.
The {warbleR} [@warbleR2017] and {wikiaves} [@R-wikiaves] packages were used to automatically retrieve all the available MP3 files^[A tutorial reproducing the extraction can be found at https://athospd.github.io/mestrado/articles/data-gathering.html].
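For illustration, below is a minimal sketch of the Xeno-Canto retrieval with {warbleR}; the species name is one of the five studied, and the download path is an assumption. The function was named `querxc()` in the package version cited (newer releases rename it to `query_xc()`). The Wikiaves retrieval with {wikiaves} is analogous and is covered in the tutorial linked above.

```r
library(warbleR)

# Query Xeno-Canto for one of the studied species and download the
# matching recordings as MP3 files into a local folder (must exist).
recordings <- querxc(
  qword    = "Glaucidium minutissimum",
  download = TRUE,
  path     = "data-raw/mp3"  # hypothetical destination folder
)

# `recordings` holds one row of metadata per recording found.
head(recordings)
```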
All the audio downloaded from Xeno-Canto and Wikiaves comes as MP3 files. These files can be stereo or mono, recorded at potentially any sample rate, and unnormalized. The first step of the data preparation standardizes these properties.
To match the target landscape's setup, the following steps were applied to every MP3 file (a sketch follows the list):
1. convert MP3 to WAV;
2. convert stereo to mono;
3. downsample to 16 kHz;
4. normalize the amplitude by rescaling to 16 bits (i.e. integers in [-32767, 32767]).
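A minimal sketch of these four steps using {tuneR}, assuming the input and output paths shown; this is one plausible implementation, not necessarily the exact pipeline used:

```r
library(tuneR)

standardize_audio <- function(mp3_path, wav_path) {
  wave <- readMP3(mp3_path)                # 1) decode MP3 into a Wave object
  wave <- mono(wave, which = "both")       # 2) average both channels to mono
  if (wave@samp.rate > 16000) {
    wave <- downsample(wave, samp.rate = 16000)  # 3) resample to 16 kHz
  }
  wave <- normalize(wave, unit = "16")     # 4) rescale to the 16-bit range
  writeWave(wave, wav_path)
}

standardize_audio(
  "data-raw/mp3/Glaucidium-minutissimum-1066225.mp3",
  "data-raw/wav_16khz/Glaucidium-minutissimum-1066225.wav"
)
```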
The observations of the training set are slices of each wave file with a given time length. For instance, if an original wave file is 55 seconds long, slicing it into 1-second intervals with no overlap results in 55 disjoint 1-second-long slices.
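The slicing itself can be sketched as follows with {tuneR}; the helper `slice_wave()` is hypothetical:

```r
library(tuneR)

# Cut a standardized wave file into disjoint slices of `slice_len` seconds,
# dropping any leftover tail shorter than `slice_len`.
slice_wave <- function(wav_path, slice_len = 1) {
  wave   <- readWave(wav_path)
  dur    <- length(wave@left) / wave@samp.rate  # duration in seconds
  starts <- seq(0, floor(dur) - slice_len, by = slice_len)
  lapply(starts, function(s) {
    extractWave(wave, from = s, to = s + slice_len,
                xunit = "time", interact = FALSE)
  })
}

slices <- slice_wave("data-raw/wav_16khz/Glaucidium-minutissimum-1066225.wav")
length(slices)  # a 55-second file yields 55 one-second slices
```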
Table \@ref(tab:slice-example) shows the layout of the sliced dataset: the original audio file name as `id`, the start point of the slice, and the end point of the slice. The final slice dataset is built by stacking the slices of all the files.
```{r slice-example}
library(dplyr)

tibble::tibble(
  file_name = list.files("../../data-raw/wav_16khz_1000ms/")
) %>%
  tidyr::separate(file_name, c("id", "start", "end"), sep = "@", extra = "drop") %>%
  filter(as.numeric(start) < 9) %>%
  head(3) %>%
  knitr::kable(
    caption = "Example of a dataset of slices from `Glaucidium-minutissimum-1066225.wav` audio file.",
    align = "lcc"
  ) %>%
  kableExtra::kable_styling(font_size = 16)
```
The experiment tested two slice lengths, 1 second and 2 seconds, with no overlap. Table \@ref(tab:slice-counts) shows the total counts of slices generated by the slicing process. There is a trade-off behind the window length choice: the smaller the interval, the more samples are created; on the other hand, less information fits in each frame, which can cause issues for the learner. There is thus an optimal interval length to be discovered, and this issue is addressed by testing both the 1-second and the 2-second frame datasets.
```{r slice-counts}
library(magrittr)
library(dplyr)

one_sec_slices_metadata <- tibble::tibble(
  file_name = list.files("../../data-raw/wav_16khz_1000ms/")
) %>%
  tidyr::separate(file_name, c("id", "start", "end"), sep = "@", extra = "drop") %>%
  mutate(
    Species = id %>%
      stringr::str_remove_all("[.0-9]|(wav)") %>%
      stringr::str_remove_all("-$")
  ) %>%
  dplyr::group_by(Species) %>%
  summarise(
    `#mp3` = n_distinct(id),
    `#1sec slices` = n()
  ) %>%
  arrange(desc(`#mp3`))

two_sec_slices_metadata <- tibble::tibble(
  file_name = list.files("../../data-raw/wav_16khz_2000ms/")
) %>%
  tidyr::separate(file_name, c("id", "start", "end"), sep = "@", extra = "drop") %>%
  mutate(
    Species = id %>%
      stringr::str_remove_all("[.0-9]|(wav)") %>%
      stringr::str_remove_all("-$")
  ) %>%
  dplyr::group_by(Species) %>%
  summarise(
    `#mp3` = n_distinct(id),
    `#2sec slices` = n()
  ) %>%
  arrange(desc(`#mp3`))

left_join(one_sec_slices_metadata, two_sec_slices_metadata) %>%
  janitor::adorn_totals() %>%
  knitr::kable(caption = "Counts of slices generated for each species.", format = "html") %>%
  kableExtra::kable_styling(font_size = 16) %>%
  kableExtra::row_spec(6, bold = TRUE)
```
Each slice may or may not contain parts of a bird song. In this section, the annotation is assumed to be already done; the annotation task itself is discussed in the Data Annotation section.
Table \@ref(tab:annotation-rds) shows an example of the tables that were generated to store where the bird songs happen in each wave file, along with the respective bird species that produced them. At the end of the labelling process the dataset had six distinct labels: five for the bird species and one for events other than bird songs. This latter label was called "unknown", and these unknown events could be silence, noise, any other animal, or even birds other than the five previously selected.
readRDS("../../data_/anotacoes/Glaucidium-minutissimum-1066225.rds") %>% head(5) %>% mutate( audio_id = "Glauc...225.wav", region_id = gsub("wavesurfer_", "", region_id), start = round(start, 2), end = round(end, 2), label = "Glauc...mum" ) %>% knitr::kable(caption = "The first four rows of the table storing the moments where the bird songs happen inside the `Glaucidium-minutissimum-1066225.wav` audio file along with the respective bird species label.") %>% kableExtra::kable_styling(font_size = 16, full_width = TRUE)
After building one such table for every wave file, two final datasets with all the slices labeled were made: one for the 1-second-long slices and another for the 2-second-long ones. Those tables have `r sum(one_sec_slices_metadata[[3]])` and `r sum(two_sec_slices_metadata[[3]])` rows respectively, as shown in Table \@ref(tab:slice-counts). Each dataset was made by joining Table \@ref(tab:slice-example) with Table \@ref(tab:annotation-rds) by interval. The slice ids were conveniently named as
`{bird-species}@{start}@{end}@.wav`
so one can recover where each slice lies in the original wave file by splitting the name on `@`. For instance, Table \@ref(tab:db-labeled) shows that a Glaucidium minutissimum sang between seconds 0 and 1, and also between seconds 0 and 2, of the `Glaucidium-minutissimum-1066225.wav` audio file. Between seconds 10 and 12 there were no bird song events of interest, so that slice was labeled "unknown".
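The interval join can be sketched with dplyr's non-equi `join_by(overlaps())` helper (available since dplyr 1.1.0); the miniature `slices` and `annotations` tables below are hypothetical stand-ins for Tables \@ref(tab:slice-example) and \@ref(tab:annotation-rds):

```r
library(dplyr)

# Hypothetical miniature inputs mirroring the two tables above.
slices <- tibble::tibble(
  id    = "Glaucidium-minutissimum-1066225",
  start = c(0, 1, 10),
  end   = c(1, 2, 11)
)
annotations <- tibble::tibble(
  start = 0.2,
  end   = 1.4,
  label = "Glaucidium-minutissimum"
)

# A slice receives a species label when its [start, end] interval overlaps
# an annotated song region; slices with no overlap are labeled "unknown".
slices %>%
  left_join(annotations, by = join_by(overlaps(start, end, start, end))) %>%
  mutate(label = coalesce(label, "unknown"))
```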
readRDS("../../data_/slices_1000ms_labels_by_humans.rds") %>% head(5) %>% knitr::kable(caption = "The first four rows of the table of all 1 second slices along with the bird species label.") %>% kableExtra::kable_styling(font_size = 16, full_width = TRUE)
The software used to annotate the audio was the {wavesurfer} R package [@R-wavesurfer]^[The {wavesurfer} annotator app is open-source and was developed by the authors of this monograph.]. The parts containing bird songs were annotated manually.
```{r}
library(magick)

image_read("../img/wavesurfer_example.png") %>%
  magick::image_modulate(saturation = 0)
```
Each wave file was annotated manually, both visually and by ear. The software stored the annotation information in a dataset like the one shown in Table \@ref(tab:annotation-rds). Wave files that were damaged or contained only silence were discarded from the analysis.
Two untouched landscapes were recorded to be scanned: "INB-Resende" and "Reserva Morro Grande". They will be used to test the detection algorithms developed in this work.