Quantitative description of song structure based on beat spectra

The problem of song structure

The basic component of Orthoptera songs is the pulse. Therefore, most studies focus on pulse (and inter-pulse) features. However, the songs of some species are organised in a highly hierarchical fashion -- pulses are grouped to form syllables, and syllables form echemes.

Higher-level structures (syllables and echemes) may carry valuable information for defining and classifying songs, and ought to be characterised.

The first part of this document will provide an example of a complex song hierarchy. It will then explain how beat spectra can be used to encapsulate such structures. Finally, it will describe how they can be used to classify songs.

An example of elaborate song structure


The following song, from Chorthippus dorsatus (BioAcoustica annotation 277), is a good illustration of this hierarchical organisation.

# preparing data
library(bioacoustica)
library(orthophonia)
library(data.table)
library(ggplot2)

DATA_DIR <- "~/Desktop/ortho_data/"

all_annotations <- getAllAnnotationData()
query <- all_annotations[id == 277]
my_annotations <- dowloadFilesForAnnotations(query,
                                             dst_dir = DATA_DIR,
                                             verbose = TRUE)
# the example signal
wave_expl <- standardiseWave(my_annotations[id == 277, annotation_path])
# a data.table used only to plot the signal
dt_expl <- data.table(x = wave_expl@left,
                      t = seq_along(wave_expl@left) / wave_expl@samp.rate)

The whole song looks like this:

# plot every 20th sample, to keep the figure light
ggplot(dt_expl[seq(from=1, to=.N, by=20)], aes(t, x)) +
    geom_line() +
    annotate("rect", xmin = 0.4, xmax = 1.25,
             ymin = -Inf, ymax = +Inf, alpha = .2,
             colour="blue", fill="blue") +
    labs(title="Segment of Chorthippus dorsatus song",
         x="t (s)")

At this scale, we can see that the song contains high-level structures, echemes (e.g. the one highlighted by the blue rectangle), which repeat with a period of about 1 s.

If we select this echeme and study its structure:

# zoom on the selected echeme; plot every 5th sample
ggplot(dt_expl[t > 0.4 & t < 1.25][seq(from=1, to=.N, by=5)], aes(t, x)) +
    annotate("rect", xmin = 0.65, xmax = 0.69,
             ymin = -Inf, ymax = +Inf, alpha = .2,
             colour="blue", fill="blue") +
    geom_line() +
    labs(title="Segment of Chorthippus dorsatus song",
         x="t (s)")

We can observe that it contains syllables separated by around 80 ms.

Now, at an even finer resolution:

# zoom on a few pulses within one syllable
ggplot(dt_expl[t > 0.65 & t < 0.69], aes(t * 1e3, x)) +
    geom_line() +
    annotate("rect", xmin = 660.5, xmax = 663.5,
             ymin = -Inf, ymax = +Inf, alpha = .2,
             colour="blue", fill="blue") +
    labs(title="Segment of Chorthippus dorsatus song",
         x="t (ms)")

We see distinct pulses with a period of around 4 ms.

Altogether, this song has three layers of periodicity. To describe such a structure, one needs a tool capable of providing sufficient frequency (i.e. period) resolution over a large range of scales; in this example, the rhythmic periods range from a few milliseconds to about a second, i.e. they span close to three orders of magnitude (roughly eight octaves).
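For instance, with the rough estimates obtained above (pulse period ~4 ms, syllable period ~80 ms, echeme period ~1 s), the span can be computed directly in base R:

# approximate periods of the three structural levels, estimated by eye (seconds)
periods <- c(pulse = 4e-3, syllable = 80e-3, echeme = 1)

log10(max(periods) / min(periods)) # ~2.4 orders of magnitude
log2(max(periods) / min(periods))  # ~8 octaves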

The beat spectrum

My work has so far led me to devise an algorithm to extract information about the rhythm of songs. In order to remain relatively agnostic to the expected range of periods, I used the continuous wavelet transform, as it naturally provides resolution of periods over a wide dynamic range (on a logarithmic scale). This tool can generate a "beat spectrum", which is a period (or frequency) representation of the rhythmicity of the sound envelope.
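To make the idea more concrete, here is a rough, self-contained sketch of one way such a beat spectrum could be computed: take the amplitude envelope of the signal, probe it with Morlet-like wavelets at log-spaced candidate periods, and average the wavelet power over time for each period. This is an illustration only, not the implementation of orthophonia's beatSpectrum(); the function name naiveBeatSpectrum, the envelope-extraction choices and all parameter values are hypothetical.

# Illustrative sketch only -- NOT the orthophonia implementation
naiveBeatSpectrum <- function(wave, n_periods = 64,
                              min_period = 2e-3, max_period = 2) {
  # 1. crude amplitude envelope: rectify, smooth (~1 ms), downsample to ~2 kHz
  fs <- wave@samp.rate
  env <- abs(wave@left)
  k <- max(1, round(fs * 1e-3))
  env <- as.numeric(stats::filter(env, rep(1 / k, k), sides = 2))
  env <- env[!is.na(env)]
  step <- max(1, round(fs / 2000))
  env <- env[seq(1, length(env), by = step)]
  fs_env <- fs / step
  env <- env - mean(env)

  # 2. log-spaced candidate periods (in seconds)
  periods <- exp(seq(log(min_period), log(max_period), length.out = n_periods))

  # 3. for each period, correlate the envelope with a Morlet-like wavelet
  #    and average the resulting power over time
  power <- sapply(periods, function(p) {
    n_cyc <- 6                              # wavelet support, in cycles
    sigma <- n_cyc * p / (2 * pi)           # Gaussian width of the wavelet
    t <- seq(-3 * sigma, 3 * sigma, by = 1 / fs_env)
    w <- exp(2i * pi * t / p) * exp(-(t / sigma) ^ 2 / 2)
    if (length(w) >= length(env)) return(NA_real_)
    re <- stats::convolve(env, Re(w), type = "filter")
    im <- stats::convolve(env, Im(w), type = "filter")
    mean(re ^ 2 + im ^ 2) / sum(Mod(w) ^ 2) # rough normalisation
  })

  data.table(period = periods, power = power)
}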

If we apply beatSpectrum() to the previous example, we get:

# automatically band-pass filter the signal, then compute its beat spectrum
wave_expl <- autoBandPassFilter(wave_expl)
dt <- beatSpectrum(wave_expl)

# coordinates of the three peaks (in ms), used only to draw annotation arrows
arrows <- data.table(xend=c(5,80,1400), yend=c(0.35,0.15,0.4))
arrows$x <- arrows$xend * 1.2
arrows$y <- arrows$yend + 0.1

ggplot(dt, aes(period*1000, power)) +
  geom_bar(stat="identity", colour="blue", fill="blue") +
  scale_x_log10(breaks=c(10^(0:3), 3*10^(0:3))) +
  labs(title="Beat spectrum of\nChorthippus dorsatus song",
       x="Period (ms)") +
  geom_segment(data=arrows, aes(x=x, xend=xend, y=y, yend=yend),
               arrow=arrow(length=unit(0.2,"cm")), colour="red")

The red arrows indicate the three peaks corresponding to the three levels of rhythmicity. From left to right, they lie at approximately 5 ms, 80 ms and 1.4 s, which matches the results we obtained by graphical inspection.
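The same peaks can also be picked out programmatically rather than by eye, for instance by looking for local maxima in the spectrum. This is a simple sketch; the 0.1 power threshold is an arbitrary illustrative value and may need adjusting:

# local maxima of the beat spectrum above an arbitrary power threshold
peaks <- dt[power > shift(power, 1, fill = 0, type = "lag") &
            power > shift(power, 1, fill = 0, type = "lead") &
            power > 0.1]
peaks[, .(period_ms = round(period * 1000, 1), power)]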

Classification of beat spectra

The idea is to generalise the use of this tool to classify songs. For instance, we could pull several songs from two species, Chorthippus dorsatus and Chorthippus brunneus, and attempt to cluster them based on their rhythmicity.

DATA_DIR <- "~/Desktop/ortho_data/"
taxa <- c("Chorthippus Glyptobothrus brunneus", "Chorthippus Chorthippus dorsatus")
all_annotations <- getAllAnnotationData()
query <- all_annotations[taxon %in% taxa & author == "qgeissmann"]
# keep annotations shorter than 30 s, for speed
query <- query[end - start < 30]
my_annotations <- dowloadFilesForAnnotations(query,
                                             dst_dir = DATA_DIR,
                                             verbose = TRUE,
                                             force = FALSE)

# This is what we would like to do for each annotation:
# read and standardise the wave, then compute its beat spectrum
pipelineFunction <- function(file_name){
  wave <- standardiseWave(file_name)
  # optionally, band-pass filter first:
  # wave <- autoBandPassFilter(wave)
  beatSpectrum(wave)
}


# we use data.table's `by` to apply the pipeline to each annotation; this can take some time:
dt <- my_annotations[, pipelineFunction(annotation_path), by=id]

print(dt)
# join the beat spectra back onto the annotation metadata (taxon, etc.)
dt <- my_annotations[dt]
ggplot(dt, aes(period, power, group=id)) +
   geom_line(alpha=.3) +
   scale_x_log10(name="period (s)", limits=c(1e-3, 5e1)) +
   facet_wrap(~ taxon, nrow=2)

Now, let's perform a hierarchical clustering. Since we essentially want to align (period) series, we can use time-series clustering methods such as Dynamic Time Warping (DTW) or shape-based distance; the dtwclust package implements both.

library(dtwclust)
library(ggdendro)

# hierarchical clustering of the beat spectra, using shape-based distance (SBD)
ctrl <- new("dtwclustControl", window.size = 20L, trace = TRUE)
datalist <- split(dt$power, dt$id)
hc.sbd <- dtwclust(datalist, type = "hierarchical",
                   k = 19:21, distance = "sbd",
                   method = "all",
                   control = ctrl, seed=1234)
# cluster validity indices (CVIs) for each candidate clustering
cvis <- sapply(hc.sbd, cvi, b = names(datalist))

# pick the clustering with the lowest VI index
result <- hc.sbd[[which.min(cvis["VI", ])]]
# label the dendrogram leaves with annotation ids
result$labels <- unique(dt)$id

ddata <- dendro_data(result, type = "rectangle")

# attach the taxon id of each annotation to the dendrogram labels
labs <- as.data.table(ddata$labels)
labs$id <- labs$label
setkey(labs, id)
tmp_dt <- unique(dt[, .(id, taxon_id)])
tmp_dt[, id := as.character(id)]
setkey(tmp_dt, id)

labs <- tmp_dt[labs]


ggplot(segment(ddata)) + 
  geom_segment(aes(x = x, y = y, xend = xend, yend = yend)) + 
  geom_text(data = labs, 
            aes(x = x, y = y, label = label,
                colour=taxon_id), 
                vjust = 0.5, hjust=0) + 
  coord_flip() + scale_y_reverse(expand = c(0.2, 0)) +
  theme_dendro()
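We can also quantify how well the tree separates the two taxa by cutting it into two groups and cross-tabulating the result against the taxon ids. This is a sketch: it assumes that result can be handled like a standard hclust object (so that cutree() applies) and that the leaf names match the annotation ids keyed in tmp_dt.

# cut the dendrogram into two groups and compare with the known taxa
# (assumes `result` behaves like a standard hclust object)
membership <- cutree(result, k = 2)
table(cluster = membership,
      taxon = tmp_dt[as.character(names(membership)), taxon_id])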

Here, leaves are labelled with annotation ids and coloured by taxon id. Taxon ids 77 and 207 correspond to Chorthippus (Glyptobothrus) brunneus and Chorthippus (Chorthippus) dorsatus, respectively.

This unsupervised approach performs quite well overall, but we can spot several outliers. Namely, annotations 274 and 244 seem misclassified as dorsatus, and annotation 278 as brunneus.

Listening to these recordings individually provides some explanation.

In conclusion, beat spectra can be used to characterise the structure of songs. The misclassifications observed in this example result from clear outliers.

There are several ways in which the classification could be improved.


