knitr::opts_chunk$set(
  fig.align = "center",
  fig.width = 6,
  fig.height = 6,
  dpi = 300,
  tidy = TRUE,
  collapse = TRUE,
  comment = "#>",
  fig.path = "inst/README-"
)

NocMigR package


This package is at a very early stage of development. It provides workflows for processing large sound files (e.g., NocMig, NFC, AudioMoth recordings), with a main emphasis on automating the detection of events (i.e., extracting calls with time stamps) that can be easily reviewed in Audacity. Note: owing to recent changes to the data privacy policy and ownership of Audacity, I highly suggest sticking to version 3.0.2!

All major computation steps are carried out by established libraries called in the background, including bioacoustics, tuneR, seewave and reticulate.

To install the package, use ...

devtools::install_github("mottensmann/NocMigR")

Examples and documentation of main functions


Load the package once installed ...

library(NocMigR)

Load example audio

The package contains an example file captured with an AudioMoth recorder. To reduce file size, a five-minute segment was resampled to 44.1 kHz and saved as a 128 kbps mp3 file. In addition to a lot of noise, there is a short segment of interest (the scale call of a Eurasian Pygmy Owl Glaucidium passerinum).

## get path to test_audio.mp3
path <- system.file("extdata", "20211220_064253.mp3", package = "NocMigR")
## create example folder
dir.create("example")
## copy file to the example folder
file.copy(path, "example")
## convert mp3 to wav (delete = TRUE removes the mp3 afterwards)
bioacoustics::mp3_to_wav("example/20211220_064253.mp3", delete = TRUE)
## rename: upper-case file extension
file.rename(from = "example/20211220_064253.wav", to = "example/20211220_064253.WAV")

Plot a spectrogram to see that there is a lot of noise and a few spikes reflecting actual signals ...

## read audio
audio <- tuneR::readWave("example/20211220_064253.WAV")
## plot spectrogram
bioacoustics::spectro(audio, FFT_size = 2048, flim = c(0, 5000))

1.) rename_recording

Naming files with a string that combines the recording date and starting time (YYYYMMDD_HHMMSS) is convenient for archiving and analysing audio files (e.g., the default of AudioMoth). Some (most?) of the popular field recorders (e.g., Olympus LS, Tascam DR or Sony PCM) use different, rather uninformative naming schemes (date and a number at best), but the information required to construct a proper date_time string is embedded in the metadata of the recording (accessible using file.info(); this requires correct settings of the internal clock!). For instance, long recording sessions with an Olympus LS-3 create multiple files, all of which share the creation and modification times of the first recording. By contrast, the Sony PCM-D100 saves files individually (i.e., all have unique ctimes and mtimes). Presets to rename files are available for both types described here.

## simulate = TRUE shows what would happen without altering files
rename_recording(path = "example",
                 format = "WAV",
                 recorder = "Olympus LS-3",
                 simulate = TRUE)
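
The renaming boils down to formatting the file's modification time; a minimal sketch of the idea (not the package internals):

```r
## sketch: build a YYYYMMDD_HHMMSS string from a file's mtime
meta <- file.info("example/20211220_064253.WAV")
format(meta$mtime, format = "%Y%m%d_%H%M%S")
```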

2.) split_wave: Divide long recordings

This function splits long audio recordings into smaller chunks for processing with bioacoustics::threshold_detection. To keep the time information, files are written with the corresponding starting time in their names. The task is performed using a Python script called via reticulate.

## split into segments
split_wave(file = "20211220_064253.WAV", # which file
           path = "example", # where to find it
           segment = 30, # cut into 30 s segments
           downsample = 32000) # resample to 32 kHz

## show files
list.files("example/split/")
## delete folder
unlink("example/split", recursive = TRUE)

3.) find_events: Identify signals of interest

This function is a wrapper around bioacoustics::threshold_detection(), aiming at extracting calls based on the signal-to-noise ratio and some target-specific assumptions about approximate call frequencies and durations. Check ?bioacoustics::threshold_detection() for details. Note that only some of the parameters defined in bioacoustics::threshold_detection() are used right now. For long recordings (i.e., several hours) it makes sense to run the detection on segments as created before, to avoid memory issues. Here we use the demo sound file as it is.

## run the threshold detection algorithm
TD <- find_events(wav.file = "example/20211220_064253.WAV",
                  threshold = 8, # signal-to-noise ratio in dB
                  min_dur = 20, # min duration in ms
                  max_dur = 300, # max duration in ms
                  LPF = 5000, # low-pass filter at 5 kHz
                  HPF = 1000) # high-pass filter at 1 kHz

## Review events 
head(TD$data$event_data[,c("filename", "starting_time", "duration", "freq_max_amp")])
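
For comparison, a roughly equivalent direct call to the underlying detector (find_events adds time stamps and Audacity labels on top) might look like this:

```r
## sketch: calling the detector directly with the same parameters
TD2 <- bioacoustics::threshold_detection(
  "example/20211220_064253.WAV",
  threshold = 8, min_dur = 20, max_dur = 300,
  LPF = 5000, HPF = 1000)
head(TD2$data$event_data)
```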

## display spectrogram based on approximate location of first six events
audio <- tuneR::readWave("example/20211220_064253.WAV",
                         from = 46,
                         to = 50,
                         units = "seconds")
bioacoustics::spectro(audio, FFT_size = 2048, flim = c(0, 5000))

In addition to the output shown above, a file with labels for reviewing events in Audacity is created (wrapping seewave::write.audacity()).
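
The label file uses Audacity's plain-text label track format (tab-separated start time, end time and label), so it can also be inspected directly:

```r
## peek at the Audacity labels (tab-separated: start, end, label)
labels <- read.table("example/20211220_064253.txt", sep = "\t",
                     col.names = c("from", "to", "label"))
head(labels)
```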

knitr::include_graphics("inst/extdata/screenshot_1.PNG")

4.) extract_events: Subset original recording file

Refines the output of find_events by first adding a buffer (by default 1 second on both sides of each event) and subsequently merging overlapping selections to tidy up the output. Additionally, it allows filtering based on expected frequencies (i.e., it checks that the frequency of maximum amplitude lies within the frequency band defined by HPF:LPF).

## extract events based on object TD
df <- extract_events(threshold_detection = TD, 
                     path = "example",
                     format = "WAV",
                     LPF = 4000,
                     HPF = 1000,
                     buffer = 1)
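
The merging step itself amounts to joining selections that overlap after buffering; a minimal sketch of the logic (not the package internals; assumes selections are sorted by start time):

```r
## sketch: buffer events by 1 s, then merge overlapping selections
from <- c(46.2, 46.9, 120.0) - 1
to <- c(46.6, 47.3, 120.4) + 1
merged <- data.frame(from = from[1], to = to[1])
for (i in 2:length(from)) {
  j <- nrow(merged)
  if (from[i] <= merged$to[j]) {
    merged$to[j] <- max(merged$to[j], to[i]) # overlap: extend selection
  } else {
    merged <- rbind(merged, data.frame(from = from[i], to = to[i]))
  }
}
merged # the first two selections collapse into one
```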

Display refined events ...

## display spectrogram based on first six events
audio <- tuneR::readWave("example/20211220_064253.WAV", 
                         from = df$from,
                         to = df$to,
                         units = "seconds")
bioacoustics::spectro(audio, FFT_size = 2048, flim = c(0, 5000))
knitr::include_graphics("inst/extdata/screenshot_2.PNG")

5.) merge_events: Pool all detected events

Takes the output of the previous operation and concatenates the audio signals as well as the labels into files called merged.events.wav and merged.events.txt, respectively. This option comes in handy if there are many input files in the working directory.

merge_events(path = "example")
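
Conceptually, the concatenation binds the extracted Wave objects end to end (label times are offset accordingly); a toy sketch with tuneR:

```r
## toy sketch: concatenating two Wave objects end to end
w1 <- tuneR::sine(440, duration = 44100, samp.rate = 44100) # 1 s
w2 <- tuneR::sine(880, duration = 44100, samp.rate = 44100) # 1 s
merged <- tuneR::bind(w1, w2) # 2 s; labels of w2 would shift by 1 s
```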

batch_process: Entire workflow combined in a single function call

Process all files within a directory by running the steps shown above.

batch_process(
  path = "example",
  format = "WAV",
  segment = NULL,
  downsample = NULL,
  SNR = 8,
  target = data.frame(min_dur = 20, # min duration in ms
                      max_dur = 300, # max duration in ms
                      LPF = 5000, # low-pass filter at 5 kHz
                      HPF = 1000), # high-pass filter at 1 kHz
  rename = FALSE)
my_data <- data.frame(
  Recording = c("60 h", "60 h", "11.91 h", "10.6 h", "2.73 h"),
  `Sample rate` = "96000 Hz",
  Downsampled = "44100 Hz",
  Channels = c("Mono", "Mono", "Stereo", "Mono", "Mono"),
  `Run time` = c("2.02 h", "1.76 h", "1.39 h", "1.3 h", "4.88 min"),
  check.names = FALSE
)

pander::pandoc.table(my_data, style = "rmarkdown", "Run times all steps, notebook ~ Intel i5-4210M, 2 cores ~ 8 GB RAM")

my_data <- data.frame(
  Recording = "7.5 h",
  `Sample rate` = "96000 Hz",
  Downsampled = "44100 Hz",
  Channels = "Mono",
  `Run time` = "14.52 min",
  check.names = FALSE
)

pander::pandoc.table(my_data, style = "rmarkdown", "Run times only event detection, notebook ~ Intel i5-4210M, 2 cores ~ 8 GB RAM")

Update:

With adequate computational power there is no need to split even large wave files into segments of one hour. This way, the event detection process is much faster (steps 3 to 6), usually taking less than four minutes for an entire NocMig night!

my_data <- data.frame(
  Recording = "114.99 h",
  `Sample rate` = "48000 Hz",
  Downsampled = "44100 Hz",
  Channels = "Mono",
  `Run time` = "26.79 min",
  check.names = FALSE
)

pander::pandoc.table(my_data, style = "rmarkdown", "115h AudioMoth recording, notebook ~ AMD RYZEN 7, 16 cores ~ 24 GB RAM")
## clean up files that are no longer needed
unlink("example/temp/", recursive = TRUE)
unlink(x = c("example/20211220_064253_extracted.WAV",
             "example/merged_events.WAV",
             "example/merged_events.txt",
             "example/20211220_064253_extracted.txt",
             "example/20211220_064253.txt"))

Header for observation lists on ornitho.de

Retrieves weather data via Bright Sky (de Maeyer 2020) and composes a string describing a NocMig session from dusk to dawn for a given location. Note: the comment follows suggestions by HGON (Schütze et al. 2022).

## example for Bielefeld
## -----------------------------------------------------------------------------
NocMig_meta(date = Sys.Date() - 2,
            lat = 52.032,
            lon = 8.517)
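
Under the hood this relies on the public Bright Sky API; a minimal sketch of such a request (the selected fields are an assumption, see brightsky.dev for the full schema):

```r
## sketch: query Bright Sky for hourly weather at a location/date
res <- jsonlite::fromJSON(paste0(
  "https://api.brightsky.dev/weather?lat=52.032&lon=8.517",
  "&date=", Sys.Date() - 2))
head(res$weather[, c("timestamp", "temperature", "cloud_cover")])
```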

Integrating BirdNET-Analyzer in processing routine

Recently I started to experiment with BirdNET. First trials suggest that only a few calls of interest are missed and that the majority is correctly labelled using the BirdNET_GLOBAL_6K_V2.4 model. Currently, it is rather difficult to run BirdNET through RStudio on a Windows computer; hence a few lines of Python code are pasted into a Linux (Ubuntu) command line.

1.) Run BirdNET-Analyzer

## Creates a species list by subsetting from the full model 
## -----------------------------------------------------------------
BirdNET_species.list(names = c("Glaucidium passerinum", "Bubo bubo"),
                     scientific = TRUE,
                     out = "example/species.txt")

Run analyze.py from a command line (e.g., Ubuntu on Windows). See the BirdNET-Analyzer documentation for details.

```{python, eval = FALSE}
## run BirdNET-Analyzer in a bash shell
python3 analyze.py --i /example --o /example --slist /example/species.txt --rtype 'audacity' --threads 1 --locale 'de'
```

2.) Adjust audacity labels and summarise records

The function `BirdNET` (see ?BirdNET for details) does the following:

(1) Reshapes Audacity labels created by `analyze.py` to include the event time.
(2) Writes records to an xlsx file (BirdNET.xlsx) as a template to simplify inspection and verification.

```r
df <- BirdNET(path = "example/")
df[["Records"]]

## records per species and day
df[["Records.dd"]]

## records per species and hour
df[["Records.hh"]]

3.) Extract events for review

Extracts detections and exports them as wav files. For easier verification of records, files are named 'Species_Date_Time.WAV' (see below).
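
Conceptually, each record is cut from the parent recording; a sketch with tuneR (the output file name below is hypothetical, mirroring the 'Species_Date_Time.WAV' scheme):

```r
## sketch: cut one detection out of the parent file
wave <- tuneR::readWave("example/20211220_064253.WAV")
event <- tuneR::extractWave(wave, from = 46, to = 50,
                            xunit = "time", interact = FALSE)
## hypothetical output name following Species_Date_Time.WAV
tuneR::writeWave(event, "example/Sperlingskauz_20211220_064339.WAV")
```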

## extract events 
BirdNET_extract(path = "example/",
                hyperlink = FALSE) ## if TRUE: create hyperlinks as Excel formulas
## show files
list.files("example/extracted/Sperlingskauz/")
## clean-up 
unlink("example", recursive = TRUE)

