options(htmltools.dir.version = FALSE)
knitr::opts_chunk$set(echo = TRUE)
library(patchwork) #devtools::install_github("thomasp85/patchwork")
library(tidyverse)
theme_set(theme_minimal(12))

About me

.pull-left[

Athos Damiani, 32, statistician trained at IME-USP, Brazil.

Master's student at Poli-USP, Brazil.


Project topic:

Automated Bird Species Recognition Based on Their Songs

.small[

] ]

.pull-right[


]


About LACMAM

knitr::include_graphics("img/lacman.png")

Objectives

1) Literature review of methodologies used in bird species recognition tasks.

2) Build an automatic detector (classifier) for five Brazilian bird species.

3) Scan a 5 TB soundscape archive to try to spot those five bird species in it.

Agenda for this presentation

1) Findings and impressions from my literature review

2) Some early experiments and partial results


Conclusions from literature review

1) Deep learning methods have proven to be the best approach for bird species recognition tasks.

2) An annotated dataset is needed.


Methodologies

.pull-left[

knitr::include_graphics("img/avianbiology.png")

Setups:

]

.pull-right[

Data Representations

Methods

]


Evidence of deep learning success

knitr::include_graphics("img/kaggle1.png")

All of the top solutions used PyTorch for fitting the models.


Evidence of deep learning success

Other audio competitions:

(ongoing)

knitr::include_graphics("img/kaggle2.png")

.footnote[ Lasseck, Mario. 2018. Acoustic Bird Detection with Deep Convolutional Neural Networks. Berlin. ]


Partial Results...

.pull-left[

MFCC + LightGBM

##           Truth
## Prediction    1    2    3
##          1 4057    4  172
##          2  187  358    1
##          3  361    1  322

Accuracy: 87%

MelSp+ResNet18 (Keydana, 2020)

##           Truth
## Prediction    1    2    3
##          1 4049   20  164
##          2   79  443    0
##          3  222    1  461

Accuracy: 91%

]

.pull-right[

Raw Audio + 1D CNN (Abdoli, 2019)

##           Truth
## Prediction    1    2    3
##          1 4010   24  199
##          2  102  443    1
##          3  248    3  433

Accuracy: 89%

Ideas from other papers

1) Duration of audio slices and FFT size

2) Ensembling two or more algorithms (stacking)

3) Rectangular Kernels

]
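The reported accuracies are just the trace of each confusion matrix divided by its total. A quick check in R for the MFCC + LightGBM matrix:

```r
# confusion matrix from the MFCC + LightGBM run (rows = prediction, cols = truth)
cm <- matrix(c(4057,   4, 172,
                187, 358,   1,
                361,   1, 322),
             nrow = 3, byrow = TRUE)
accuracy <- sum(diag(cm)) / sum(cm)
round(accuracy, 2)  # 0.87, the reported 87%
```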


Discussion/Opinion

Pros of Deep Learning

Cons of Deep Learning


Hand crafted vs Automatic Feature Engineering: An Illustration

knitr::include_graphics("img/kernels.png")

Annotated dataset needed

The absence of labelled datasets is the "bottleneck".

"It is important that benchmark datasets are available, so that different researchers can compare their methods on the same datasets, and using the same metrics."


“There is the need for shared datasets with annotations of a wide variety of calls for a large number of species if methods that are suitable for conservation work are to be developed.”


— Automated birdsong recognition in complex acoustic environments: a review (Nirosha Priyadarshani, Stephen Marsland and Isabel Castro, 2017)


Datasets "Culture" in Machine Learning

.pull-left[

knitr::include_graphics("img/datasets.png")

Source: Pytorch.org

]

.pull-right[

knitr::include_graphics("img/datasetdownload.png")


knitr::include_graphics("img/datasetcitation.png")

]


Datasets "Culture" in Machine Learning

.pull-left[

Sources of Brazilian bird calls that are not "machine-learning-ready" yet:

Bird sound datasets from Kaggle are "machine-learning-ready", but lack Brazilian species.

The solution is to build our own.

]

.pull-right[

knitr::include_graphics("img/birdcallbr.png")

Not published yet

]


Data Gathering

Xeno-canto

{warbleR} R package by Marcelo Araya-Salas (2010)

metadata_xc <- purrr::map(bird_species, ~ warbleR::querxc(.x))

Wikiaves

{wikiaves} R package from LACMAM (2019)

metadata_wa <- purrr::map(bird_species, ~ wikiaves::querwa(.x))

Data Gathering

.mp3 files downloaded.

library(magrittr)

tibble::tibble(Species = list.files("../../data-raw/wav_16khz/") %>%
             stringr::str_remove_all("[.0-9]|(wav)") %>%
             stringr::str_remove_all("-$")
) %>%
  dplyr::count(Species, name = "#mp3", sort = TRUE) %>%
  janitor::adorn_totals() %>%
  knitr::kable(caption = "MP3 files downloaded.", format = "markdown") %>%
  kableExtra::kable_styling(font_size = 16) %>%
  kableExtra::row_spec(6, bold = TRUE)
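The regex pipeline above strips digits, dots, and the extension from each file name, leaving only the species name; it can be sanity-checked on a couple of made-up file names:

```r
library(magrittr)
library(stringr)

files <- c("Vanellus-chilensis-123.wav", "Strix-virgata-4.wav")  # hypothetical names
species <- files %>%
  stringr::str_remove_all("[.0-9]|(wav)") %>%  # drop digits, dots, and "wav"
  stringr::str_remove_all("-$")                # drop the trailing hyphen
species  # "Vanellus-chilensis" "Strix-virgata"
```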

Side products

.pull-left[

Step-by-step tutorials with code for reproducibility

knitr::include_graphics("img/site2.png")

https://athospd.github.io/mestrado/

]

.pull-right[ R packages created:

]


References


class: inverse, center, middle

Thank you!


library(wavesurfer)

Tool for audio annotation in R (GitHub: Athospd/wavesurfer)

# shiny UI

wavesurfer(
  "wavs_folder/wav_file.wav", # or .mp3
  visualization = 'spectrogram' #<<
) %>%
  ws_annotator(labels = c("birdsong", "silence", "insect")) %>% 
  ws_minimap() %>%
  ws_cursor()




Predictive Modeling Methodologies

.pull-left[

Spectrogram

knitr::include_graphics("img/Espectrograma.jpg")

MFCC

knitr::include_graphics("img/mfcc.jpg")

]

.pull-right[


step 1) transform...


$$ mel = 2595 \log_{10}(1 + \frac{f}{700}) $$


step 2) weighted mean by frequency region...


knitr::include_graphics("img/mel_filters.jpg")

]
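The mel formula above is straightforward to encode; a minimal sketch (the helper name `hz_to_mel` is mine):

```r
# mel scale conversion, following the formula on this slide
hz_to_mel <- function(f) 2595 * log10(1 + f / 700)
hz_to_mel(c(440, 1000, 8000))
```

By design, 1000 Hz maps to roughly 1000 mels.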

Source: haythamfayek.com
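To make the two steps concrete: assuming the tuneR package is available, MFCCs with the setup used later in this deck (13 coefficients, 512-sample window at 16 kHz, no overlap) could be extracted like this, here on a synthetic tone rather than a real recording:

```r
library(tuneR)

# 1-second 440 Hz test tone at 16 kHz (stand-in for a real wav file)
wave <- sine(440, duration = 16000, samp.rate = 16000)
mfcc <- melfcc(wave,
               sr      = wave@samp.rate,
               numcep  = 13,            # 13 MFCCs
               wintime = 512 / 16000,   # 512-sample FFT window
               hoptime = 512 / 16000)   # hop = window, i.e. no overlap
dim(mfcc)  # one row per frame, 13 columns
```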


Predictive Modeling Methodologies

Convolutional Neural Networks (CNN's)

.footnote[ source: http://tommymullaney.com/projects/rhythm-games-neural-networks ]


Predictive Modeling Methodologies

Convolutional Neural Networks (CNN's)

.pull-left[ - Define a matrix of weights (the "shadow" in the animation)

]

.pull-right[

]

Source: Conv arithmetic
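A minimal base-R sketch of what that sliding weight matrix does (valid convolution, no padding; the tiny edge-detecting kernel and toy image are made up for illustration):

```r
conv2d <- function(img, kernel) {
  kh <- nrow(kernel); kw <- ncol(kernel)
  out <- matrix(0, nrow(img) - kh + 1, ncol(img) - kw + 1)
  for (i in seq_len(nrow(out))) {
    for (j in seq_len(ncol(out))) {
      patch <- img[i:(i + kh - 1), j:(j + kw - 1)]
      out[i, j] <- sum(patch * kernel)  # elementwise multiply, then sum
    }
  }
  out
}

img <- matrix(rep(c(0, 0, 1, 1), each = 4), nrow = 4)  # left half dark, right half bright
conv2d(img, matrix(c(-1, 1), nrow = 1))  # fires only at the dark-to-bright edge
```

Note that the 1×2 kernel is itself rectangular, the idea listed earlier under "Rectangular Kernels".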

Predictive Modeling Methodologies

Gradient Boosting Machines

knitr::include_graphics("img/gbm.png")

Where x represents all of the pixels from the MFCC set of an audio sample:

- 1-second slices
- FFT window of 512 samples, sample rate of 16 kHz, no overlap
- 13 MFCCs

Total pixels: $13 \times 1 \times (16000 / 512) \approx 406$
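That feature count can be verified directly:

```r
n_mfcc   <- 13
sr       <- 16000                  # Hz
fft_size <- 512                    # window length in samples, no overlap
frames_per_slice <- sr / fft_size  # 31.25 frames in a 1-second slice
n_features <- n_mfcc * frames_per_slice
n_features  # 406.25, hence the ~406 above
```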




Athospd/mestrado documentation built on Jan. 2, 2021, 3:59 a.m.