options(htmltools.dir.version = FALSE)
knitr::opts_chunk$set(echo = TRUE)
library(patchwork) # devtools::install_github("thomasp85/patchwork")
library(tidyverse)
theme_set(theme_minimal(12))
.pull-left[
Athos Damiani, 32, Statistician from IME-USP, Brazil.
Master's student at Poli-USP, Brazil.
.small[
Advisor: Dr. Linilson Padovese
Co-advisor: Dr. Paulo Hubert Jr
Approach: Supervised Machine Learning
] ]
.pull-right[
]
knitr::include_graphics("img/lacman.png")
1) Literature review of methodologies used in bird species recognition tasks.
2) Build an automatic detector (classifier) for 5 Brazilian bird species.
3) Scan a 5 TB soundscape to try to spot those 5 bird species in it.
1) My findings and impressions from the literature review
2) Some early experiments and partial results
.pull-left[
knitr::include_graphics("img/avianbiology.png")
Setups:
]
.pull-right[
Data Representations
Methods
]
knitr::include_graphics("img/kaggle1.png")
All of the top solutions used PyTorch
for fitting the models
Other audio competitions:
(ongoing)
knitr::include_graphics("img/kaggle2.png")
.footnote[ Lasseck, Mario. 2018. Acoustic Bird Detection with Deep Convolutional Neural Networks. Berlin. ]
.pull-left[
##           Truth
## Prediction    1    2    3
##          1 4057    4  172
##          2  187  358    1
##          3  361    1  322
Accuracy: 87%
##           Truth
## Prediction    1    2    3
##          1 4049   20  164
##          2   79  443    0
##          3  222    1  461
Accuracy: 91%
]
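The accuracy figures follow directly from the confusion matrices (diagonal over total). A quick check in R for the first matrix:

```r
# Accuracy = correct predictions / all predictions (diagonal over total)
cm <- matrix(c(4057,   4, 172,
                187, 358,   1,
                361,   1, 322),
             nrow = 3, byrow = TRUE,
             dimnames = list(Prediction = 1:3, Truth = 1:3))

sum(diag(cm)) / sum(cm)  # ~0.87
```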
.pull-right[
##           Truth
## Prediction    1    2    3
##          1 4010   24  199
##          2  102  443    1
##          3  248    3  433
Accuracy: 89%
1) Duration of audio slices and FFT size
2) Using two or more algorithms in an ensemble (stacking)
3) Rectangular Kernels
]
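Item 1) above trades time resolution against frequency resolution. A minimal sketch, assuming 16 kHz audio and no window overlap (`spectrogram_dims()` is a hypothetical helper, not from any package):

```r
# Spectrogram input shape as a function of slice duration and FFT size
spectrogram_dims <- function(slice_sec, fft_size, sample_rate = 16000) {
  c(freq_bins   = fft_size / 2 + 1,  # one-sided spectrum
    time_frames = floor(slice_sec * sample_rate / fft_size))
}

spectrogram_dims(slice_sec = 1, fft_size = 512)   # finer time resolution
spectrogram_dims(slice_sec = 1, fft_size = 2048)  # finer frequency resolution
```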
Pros of Deep Learning
Automatic feature engineering and strong performance (even with noisy audio).
Modular, easy to share, reuse and reproduce.
Tools such as PyTorch are free and "research-and-production-first".
Designed to be fast (optimized for GPUs).
Cons of Deep Learning
It is not a trivial field to get started in.
Requires at least a basic programming background.
Requires large amounts of data!
knitr::include_graphics("img/kernels.png")
The absence of labelled datasets is the bottleneck.
"It is important that benchmark datasets are available, so that different researchers can compare their methods on the same datasets, and using the same metrics."
“There is the need for shared datasets with annotations of a wide variety of calls for a large number of species if methods that are suitable for conservation work are to be developed.”
— Automated birdsong recognition in complex acoustic environments: a review (Nirosha Priyadarshani, Stephen Marsland and Isabel Castro, 2017)
.pull-left[
knitr::include_graphics("img/datasets.png")
Source: Pytorch.org
]
.pull-right[
knitr::include_graphics("img/datasetdownload.png")
knitr::include_graphics("img/datasetcitation.png")
]
.pull-left[
Sources of Brazilian bird calls that are not "machine-learning-ready" yet:
Datasets of bird sounds on Kaggle are "machine-learning-ready" but lack Brazilian species.
The solution is to build our own.
]
.pull-right[
knitr::include_graphics("img/birdcallbr.png")
Not published yet
]
{warbleR} R package by Marcelo Araya-Salas (2010)
metadata_xc <- map(bird_species, ~ querxc(.x))
{wikiaves} R package from LACMAM (2019)
metadata_wa <- map(bird_species, ~ querwa(.x))
.mp3 files downloaded.
library(magrittr)
tibble::tibble(
  Species = list.files("../../data-raw/wav_16khz/") %>%
    stringr::str_remove_all("[.0-9]|(wav)") %>%
    stringr::str_remove_all("-$")
) %>%
  dplyr::count(Species, name = "#mp3", sort = TRUE) %>%
  janitor::adorn_totals() %>%
  knitr::kable(caption = "MP3 files downloaded.", format = "markdown") %>%
  kableExtra::kable_styling(font_size = 16) %>%
  kableExtra::row_spec(6, bold = TRUE)
.pull-left[
Step-by-step tutorials with code for reproducibility
knitr::include_graphics("img/site2.png")
https://athospd.github.io/mestrado/
]
.pull-right[ R packages created:
{wikiaves} - mp3 download
{wavesurfer} - annotation
{torchaudio} - modeling
{mestrado} - reproducibility
]
Padovese B., Padovese L. (2019) Machine Learning for Identifying an Endangered Brazilian Psittacidae Species
Priyadarshani N. et al. (2017) Automated birdsong recognition in complex acoustic environments: a review
Serra, O. et al. (2019) Active contour-based detection of estuarine dolphin whistles in spectrogram images
Jawaherlalnehru, J. et al. (2019) Music Instrument Recognition from Spectrogram Images Using Convolution Neural Network
Keydana, S. (2020) Classifying images with torch
Sajjad, A. et al. (2019) End-to-End Environmental Sound Classification using a 1D Convolutional Neural Network
class: inverse, center, middle
Tool for audio annotation in R r shiny::icon("github")
Athospd/wavesurfer
# shiny UI
wavesurfer(
  "wavs_folder/wav_file.wav", # or .mp3
  visualization = 'spectrogram' #<<
) %>%
  ws_annotator(labels = c("birdsong", "silence", "insect")) %>%
  ws_minimap() %>%
  ws_cursor()
.pull-left[
knitr::include_graphics("img/Espectrograma.jpg")
knitr::include_graphics("img/mfcc.jpg")
]
.pull-right[
step 1) transform the frequency axis to the mel scale...
$$
mel = 2595 \log_{10}(1 + \frac{f}{700})
$$
step 2) take a weighted mean over each frequency region (mel filter bank)...
knitr::include_graphics("img/mel_filters.jpg")
]
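The two steps above can be sketched in R; `hz_to_mel()` is a hypothetical helper implementing the formula on the slide, not a function from an existing package:

```r
# Step 1: map frequency in Hz to the mel scale
hz_to_mel <- function(f) {
  2595 * log10(1 + f / 700)
}

# The scale is roughly linear below 1 kHz and logarithmic above it
hz_to_mel(c(100, 700, 1000, 4000, 8000))
```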
Source: haythamfayek.com
.footnote[ source: http://tommymullaney.com/projects/rhythm-games-neural-networks ]
.pull-left[ - Define a matrix of weights (the "shadow" in the animation).
Scan the image with this matrix (the blue matrix).
The new image (in green) is formed by the convolution of the blue matrix with the original image.
]
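The scanning step described above can be written as a plain double loop. A minimal base-R sketch (cross-correlation, as deep learning libraries implement "convolution", with no padding):

```r
# Slide a kernel over the image and take the weighted sum at each position
conv2d <- function(image, kernel) {
  kr <- nrow(kernel); kc <- ncol(kernel)
  out <- matrix(0, nrow(image) - kr + 1, ncol(image) - kc + 1)
  for (i in seq_len(nrow(out))) {
    for (j in seq_len(ncol(out))) {
      patch <- image[i:(i + kr - 1), j:(j + kc - 1)]
      out[i, j] <- sum(patch * kernel)
    }
  }
  out
}

img  <- matrix(1:16, nrow = 4)       # toy 4x4 "image"
edge <- matrix(c(1, -1), nrow = 1)   # 1x2 horizontal difference kernel
conv2d(img, edge)                    # 4x3 feature map
```

A rectangular kernel like `edge` (wider than tall) is the same idea behind the rectangular kernels mentioned earlier for spectrograms.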
.pull-right[
]
knitr::include_graphics("img/gbm.png")
Where x represents all of the pixels from the MFCC set of an audio sample:
- 1 second long slices
- FFT window of 512 samples, sample rate of 16 kHz, no overlap
- 13 MFCCs
Total pixels: $13 \times 1 \times (16000 / 512) \approx 406$
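As a sanity check on this count:

```r
sample_rate <- 16000  # Hz
fft_size    <- 512    # samples per FFT window, no overlap
n_mfcc      <- 13
slice_sec   <- 1

frames_per_slice <- slice_sec * sample_rate / fft_size  # 31.25 frames
n_mfcc * frames_per_slice                               # 406.25, ~406 pixels
```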