```r
knitr::opts_chunk$set(
  fig.align = "center",
  fig.width = 6,
  fig.height = 6,
  dpi = 300,
  tidy = TRUE,
  collapse = TRUE,
  comment = "#>",
  fig.path = "inst/README-"
)
```
This package is in a very preliminary state. It provides workflows for processing large sound files (e.g., NocMig, NFC, AudioMoth recordings), with a main emphasis on automating the detection of events (i.e., extracting calls with time stamps) that can easily be reviewed in Audacity. Note: in light of recent changes to the data privacy policy and ownership of Audacity, I strongly suggest sticking to version 3.0.2!
All major computation steps are carried out by sophisticated libraries called in the background, including several Python packages.
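Since the Python-based steps are called via `reticulate`, make sure a suitable Python environment is available before running them. A minimal sketch (the environment name `"r-nocmigr"` is a made-up placeholder, not something the package creates):

```r
## point reticulate at a Python environment before using functions that
## rely on Python ("r-nocmigr" is a hypothetical placeholder name)
reticulate::use_virtualenv("r-nocmigr", required = TRUE)
reticulate::py_config()  # verify which Python interpreter is picked up
```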
To install the package, use ...
```r
devtools::install_github("mottensmann/NocMigR")
```
Load the package once installed ...
```r
library(NocMigR)
```
The package contains an example file captured with an AudioMoth recorder. To reduce file size, a five-minute segment was resampled at 44.1 kHz and saved as a 128 kbps mp3 file. In addition to a lot of noise, there is a short segment of interest (a scale call of a Eurasian Pygmy Owl *Glaucidium passerinum*).
```r
## get path to test_audio.mp3
path <- system.file("extdata", "20211220_064253.mp3", package = "NocMigR")
## create temp folder
dir.create("example")
## copy to example folder
file.copy(path, "example")
## convert to wav
bioacoustics::mp3_to_wav("example/20211220_064253.mp3", delete = TRUE)
file.rename(from = "example/20211220_064253.wav",
            to = "example/20211220_064253.WAV")
```
Plot the spectrogram to see that there is a lot of noise and a few spikes reflecting actual signals ...
```r
## read audio
audio <- tuneR::readWave("example/20211220_064253.WAV")
## plot spectrogram
bioacoustics::spectro(audio, FFT_size = 2048, flim = c(0, 5000))
```
### `rename_recording`
Naming files with a string that combines the recording date and starting time (`YYYYMMDD_HHMMSS`) is convenient for archiving and analysing audio files (e.g., the default of AudioMoth). Some (most?) of the popular field recorders (e.g., Olympus LS, Tascam DR or Sony PCM) use different, rather uninformative naming schemes (date and number at best), but the relevant information to construct a proper date_time string is embedded in the metadata of the recording (accessible using `file.info()`, provided the internal clock was set correctly!). For instance, long recording sessions with an Olympus LS-3 create multiple files, all of which share the same creation and modification times (with respect to the first recording). By contrast, the Sony PCM-D100 saves files individually (i.e., all have unique ctimes and mtimes). Presets to rename files are available for both types described here.
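The underlying idea can be sketched in a few lines of base R (an illustration only, not the package's implementation; `"REC0001.WAV"` is a hypothetical recorder file name):

```r
## sketch: derive a YYYYMMDD_HHMMSS name from file metadata
## ("REC0001.WAV" is a hypothetical recorder file name)
old <- "example/REC0001.WAV"
mtime <- file.info(old)$mtime  # only meaningful if the internal clock was set
new <- file.path(dirname(old),
                 paste0(format(mtime, "%Y%m%d_%H%M%S"), ".WAV"))
file.rename(from = old, to = new)
```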
```r
## simulate = TRUE shows what would happen without altering files
rename_recording(path = "example",
                 format = "WAV",
                 recorder = "Olympus LS-3",
                 simulate = TRUE)
```
### `split_wave`: Divide long recordings

This function splits long audio recordings into smaller chunks for processing with `bioacoustics::threshold_detection`. To keep the time information, files are written with their corresponding starting time. The task is performed by a Python script called via `reticulate`.
```r
## split into segments
split_wave(file = "20211220_064253.WAV", # which file
           path = "example",             # where to find it
           segment = 30,                 # cut into 30-s segments
           downsample = 32000)           # resample at 32 kHz
## show files
list.files("example/split/")
## delete folder
unlink("example/split", recursive = TRUE)
```
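The naming logic can be illustrated as follows (a sketch of the idea only, assuming the demo file's start time; the actual names are produced by `split_wave`):

```r
## sketch: expected names of 30-s segments for a recording starting
## at 2021-12-20 06:42:53 (illustration of the naming logic only)
start <- as.POSIXct("20211220_064253", format = "%Y%m%d_%H%M%S")
format(start + seq(0, 90, by = 30), "%Y%m%d_%H%M%S.WAV")
```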
### `find_events`: Identify signals of interest

This function is a wrapper around `bioacoustics::threshold_detection()`, aiming at extracting calls based on the signal-to-noise ratio and some target-specific assumptions about approximate call frequencies and durations. Check `?bioacoustics::threshold_detection()` for details. Note that only some of the parameters defined in `bioacoustics::threshold_detection()` are used right now. For long recordings (i.e., several hours) it makes sense to run the detection on segments as created above to avoid memory issues. Here we use the demo sound file as it is.
```r
## run threshold detection algorithm
TD <- find_events(wav.file = "example/20211220_064253.WAV",
                  threshold = 8,  # signal-to-noise ratio in dB
                  min_dur = 20,   # min duration in ms
                  max_dur = 300,  # max duration in ms
                  LPF = 5000,     # low-pass filter at 5 kHz
                  HPF = 1000)     # high-pass filter at 1 kHz
## review events
head(TD$data$event_data[, c("filename", "starting_time", "duration", "freq_max_amp")])
## display spectrogram based on approximate location of first six events
audio <- tuneR::readWave("example/20211220_064253.WAV",
                         from = 46, to = 50, units = "seconds")
bioacoustics::spectro(audio, FFT_size = 2048, flim = c(0, 5000))
```
In addition to the output shown above, a file with labels for reviewing events in Audacity is created (wrapping `seewave::write.audacity()`).
```r
knitr::include_graphics("inst/extdata/screenshot_1.PNG")
```
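Audacity label tracks are plain tab-separated text (start, end, label), so they are easy to produce by hand as well. A minimal hand-rolled equivalent with made-up event times (the package writes these files for you):

```r
## sketch: Audacity label track format (tab-separated: start, end, label);
## the event times below are made up for illustration
events <- data.frame(from = c(46.2, 120.5), to = c(46.5, 121.0),
                     label = c("event_1", "event_2"))
writeLines(sprintf("%.3f\t%.3f\t%s", events$from, events$to, events$label),
           "example/labels_sketch.txt")
```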
### `extract_events`: Subset original recording file

Refines the output of `find_events` by first adding a buffer (default: 1 second on both sides of each event) and subsequently merging overlapping selections to tidy up the output. Additionally, it allows filtering based on expected frequencies (i.e., it checks that the frequency of maximum amplitude lies within the frequency band defined by `HPF:LPF`).
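Conceptually, the buffering-and-merging step is classic interval merging. A minimal sketch of the idea (not the package's implementation; the event times are made up):

```r
## sketch: buffer events, then merge overlapping intervals
## (illustration of the idea, not the package's implementation)
merge_intervals <- function(from, to, buffer = 1) {
  from <- pmax(from - buffer, 0)  # pad start, clip at 0 s
  to <- to + buffer               # pad end
  o <- order(from)
  from <- from[o]; to <- to[o]
  out_from <- from[1]; out_to <- to[1]
  res <- data.frame(from = numeric(0), to = numeric(0))
  for (i in seq_along(from)[-1]) {
    if (from[i] <= out_to) {
      out_to <- max(out_to, to[i])  # overlap: extend current interval
    } else {
      res <- rbind(res, data.frame(from = out_from, to = out_to))
      out_from <- from[i]; out_to <- to[i]
    }
  }
  rbind(res, data.frame(from = out_from, to = out_to))
}
## three events; the first two overlap once buffered
merge_intervals(from = c(46.2, 46.9, 120.5), to = c(46.5, 47.3, 121.0))
```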
```r
## extract events based on object TD
df <- extract_events(threshold_detection = TD,
                     path = "example",
                     format = "WAV",
                     LPF = 4000,
                     HPF = 1000,
                     buffer = 1)
```
Display refined events ...
```r
## display spectrogram based on first six events
audio <- tuneR::readWave("example/20211220_064253.WAV",
                         from = df$from, to = df$to, units = "seconds")
bioacoustics::spectro(audio, FFT_size = 2048, flim = c(0, 5000))
```
```r
knitr::include_graphics("inst/extdata/screenshot_2.PNG")
```
### `merge_events`: Pool all detected events

Takes the output of the previous operation and concatenates audio signals as well as labels into files called `merged_events.wav` and `merged_events.txt`, respectively. This option comes in handy if there are many input files in the working directory.
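Conceptually, the concatenation can be done with `tuneR::bind()`. A sketch with hypothetical snippet names (`merge_events()` additionally merges and re-offsets the corresponding labels):

```r
## sketch: concatenate two extracted snippets into one wave file
## (file names are hypothetical; merge_events() does this for all events)
w1 <- tuneR::readWave("example/snippet_1.WAV")
w2 <- tuneR::readWave("example/snippet_2.WAV")
tuneR::writeWave(tuneR::bind(w1, w2), "example/merged_events.wav")
```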
```r
merge_events(path = "example")
```
### `batch_process`: Entire workflow combined in a single function call

Process all files within a directory and run the steps shown above.
```r
batch_process(path = "example",
              format = "WAV",
              segment = NULL,
              downsample = NULL,
              SNR = 8,
              target = data.frame(min_dur = 20,   # min duration in ms
                                  max_dur = 300,  # max duration in ms
                                  LPF = 5000,     # low-pass filter at 5 kHz
                                  HPF = 1000),    # high-pass filter at 1 kHz
              rename = FALSE)
```
```r
my_data <- data.frame(
  Recording = c("60 h", "60 h", "11.91 h", "10.6 h", "2.73 h"),
  `Sample rate` = "96000 Hz",
  Downsampled = "44100 Hz",
  Channels = c("Mono", "Mono", "Stereo", "Mono", "Mono"),
  `Run time` = c("2.02 h", "1.76 h", "1.39 h", "1.3 h", "4.88 min")
)
pander::pandoc.table(my_data, style = "rmarkdown",
                     "Run times, all steps; notebook ~ Intel i5-4210M, 2 cores ~ 8 GB RAM")

my_data <- data.frame(
  Recording = "7.5 h",
  `Sample rate` = "96000 Hz",
  Downsampled = "44100 Hz",
  Channels = "Mono",
  `Run time` = "14.52 min"
)
pander::pandoc.table(my_data, style = "rmarkdown",
                     "Run times, only event detection; notebook ~ Intel i5-4210M, 2 cores ~ 8 GB RAM")
```
Update: With adequate computational power there is no need to split even larger wave files into segments of one hour. This way, the event detection process (steps 3-6) is much faster, usually taking less than four minutes for an entire NocMig night!
```r
my_data <- data.frame(
  Recording = "114.99 h",
  `Sample rate` = "48000 Hz",
  Downsampled = "44100 Hz",
  Channels = "Mono",
  `Run time` = "26.79 min"
)
pander::pandoc.table(my_data, style = "rmarkdown",
                     "115 h AudioMoth recording; notebook ~ AMD Ryzen 7, 16 cores ~ 24 GB RAM")
```
```r
## clean up unused files
unlink("example/temp/", recursive = TRUE)
unlink(x = c("example/20211220_064253_extracted.WAV",
             "example/merged_events.WAV",
             "example/merged_events.txt",
             "example/20211220_064253_extracted.txt",
             "example/20211220_064253.txt"))
```
### Header for observation lists on ornitho.de
Retrieves weather data via Bright Sky (de Maeyer 2020) and composes a string describing a NocMig session from dusk to dawn for a given location. Note: the comment follows suggestions by HGON (Schütze et al. 2022).
```r
## example for Bielefeld
## -----------------------------------------------------------------------------
NocMig_meta(date = Sys.Date() - 2, lat = 52.032, lon = 8.517)
```
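For reference, Bright Sky is a free JSON API over DWD weather data, so the underlying query can also be made directly. A minimal sketch (assuming the `jsonlite` package; field names as documented by Bright Sky):

```r
## sketch: query Bright Sky directly for hourly weather (no API key needed)
url <- sprintf("https://api.brightsky.dev/weather?lat=%s&lon=%s&date=%s",
               52.032, 8.517, Sys.Date() - 2)
weather <- jsonlite::fromJSON(url)$weather
head(weather[, c("timestamp", "temperature", "wind_speed", "condition")])
```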
### Integrating BirdNET-Analyzer into the processing routine
Recently I started to experiment with BirdNET. First trials suggest that only a few calls of interest are missed, and the majority is correctly labelled using the BirdNET_GLOBAL_6K_V2.4 model. Currently, it is rather difficult to run BirdNET through RStudio on a Windows computer; hence, a few lines of Python code are pasted into a Linux (Ubuntu) command line.
#### 1.) Create a species list for BirdNET-Analyzer
```r
## Creates a species list by subsetting from the full model
## -----------------------------------------------------------------
BirdNET_species.list(names = c("Glaucidium passerinum", "Bubo bubo"),
                     scientific = TRUE,
                     out = "example/species.txt")
```
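The resulting file follows BirdNET-Analyzer's species-list format, one `Scientific name_Common name` entry per line. The output below is illustrative; the exact common names depend on the model's label file:

```r
## inspect the species list (BirdNET format: "Scientific name_Common name");
## the common names shown are illustrative
readLines("example/species.txt")
#> [1] "Glaucidium passerinum_Eurasian Pygmy-Owl"
#> [2] "Bubo bubo_Eurasian Eagle-Owl"
```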
Run `analyze.py` using a command-line program (e.g., Ubuntu on Windows). See the BirdNET-Analyzer documentation for details.
```{python, eval = FALSE}
python3 analyze.py --i /example --o /example --slist /example/species.txt --rtype 'audacity' --threads 1 --locale 'de'
```
#### 2.) Adjust Audacity labels and summarise records

The function `BirdNET` (see `?BirdNET` for details) does the following:

1. Reshape the Audacity labels created by `analyze.py` to include the event time.
2. Write records to an xlsx file (`BirdNET.xlsx`) as a template to simplify inspection and verification.

```r
df <- BirdNET(path = "example/")
df[["Records"]]
## records per species and day
df[["Records.dd"]]
## records per species and hour
df[["Records.hh"]]
```
Extract detections and export them as wav files. For easier access when verifying records, files are named 'Species_Date_Time.WAV' (see below).
```r
## extract events
BirdNET_extract(path = "example/", hyperlink = FALSE) # if TRUE: create hyperlinks as Excel formulas
## show files
list.files("example/extracted/Sperlingskauz/")
```
```r
## clean-up
unlink("example", recursive = TRUE)
```