ssm: Self-similarity matrix

ssm    R Documentation

Description

Calculates the self-similarity matrix and novelty vector of a sound.

Usage

ssm(
  x,
  samplingRate = NULL,
  from = NULL,
  to = NULL,
  windowLength = 25,
  step = 5,
  overlap = NULL,
  ssmWin = NULL,
  sparse = FALSE,
  maxFreq = NULL,
  nBands = NULL,
  MFCC = 2:13,
  input = c("mfcc", "melspec", "spectrum")[2],
  norm = FALSE,
  simil = c("cosine", "cor")[1],
  kernelLen = 100,
  kernelSD = 0.5,
  padWith = 0,
  summaryFun = c("mean", "sd"),
  reportEvery = NULL,
  cores = 1,
  plot = TRUE,
  savePlots = NULL,
  main = NULL,
  heights = c(2, 1),
  width = 900,
  height = 500,
  units = "px",
  res = NA,
  specPars = list(levels = seq(0, 1, length = 30), colorTheme = c("bw", "seewave",
    "heat.colors", "...")[2], xlab = "Time, s", ylab = "kHz"),
  ssmPars = list(levels = seq(0, 1, length = 30), colorTheme = c("bw", "seewave",
    "heat.colors", "...")[2], xlab = "Time, s", ylab = "Time, s"),
  noveltyPars = list(type = "b", pch = 16, col = "black", lwd = 3)
)

Arguments

x

path to a folder, one or more wav or mp3 files c('file1.wav', 'file2.mp3'), Wave object, numeric vector, or a list of Wave objects or numeric vectors

samplingRate

sampling rate of x (only needed if x is a numeric vector)

from, to

if NULL (default), analyzes the whole sound, otherwise only the segment between from and to (s)

windowLength

length of FFT window, ms

step

you can override overlap by specifying FFT step, ms (NB: because digital audio is sampled at discrete time intervals of 1/samplingRate, the actual step and thus the time stamps of STFT frames may be slightly different, e.g. 24.98866 instead of 25.0 ms)

overlap

overlap between successive FFT frames, %
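
The notes on step and overlap can be illustrated with a little arithmetic. This is a minimal sketch in base R, not soundgen's internal code: the helper actual_step() is hypothetical, rounding down to a whole number of samples is an assumption, and the formula linking step to overlap is the usual STFT relation rather than something stated above.

```r
# Hypothetical helper (not part of soundgen): convert the requested step in ms
# to a whole number of samples and back to ms
actual_step = function(step, samplingRate) {
  step_points = floor(step * samplingRate / 1000)  # assumed rounding rule
  step_points / samplingRate * 1000
}

actual_step(25, samplingRate = 44100)  # 24.98866 ms instead of 25
actual_step(5, samplingRate = 16000)   # 5 ms, since 5 ms is exactly 80 samples

# step and overlap describe the same spacing of frames in different units
windowLength = 25  # ms
step = 5           # ms
overlap = 100 * (1 - step / windowLength)  # 80 (%)
```

With the default windowLength = 25 and step = 5, successive frames would thus overlap by 80% under this relation.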

ssmWin

window for averaging SSM, ms (has a smoothing effect and speeds up the processing)

sparse

if TRUE, only the central region of the SSM needed to extract the novelty contour is calculated instead of the entire matrix (speeds up the processing)

maxFreq

highest band edge of mel filters, Hz. Defaults to samplingRate / 2. See melfcc

nBands

number of warped spectral bands to use. Defaults to 100 * windowLength / 20. See melfcc

MFCC

which mel-frequency cepstral coefficients to use; defaults to 2:13

input

the spectral representation used to calculate the SSM

norm

if TRUE, the spectrum of each STFT frame is normalized

simil

method for comparing frames: "cosine" = cosine similarity, "cor" = Pearson's correlation
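
The two options for simil differ only in whether each frame is centered on its own mean before comparison. The sketch below uses two made-up frames (the values are purely illustrative) and base R only; it shows the measures themselves, not how ssm() computes them internally.

```r
# Two made-up spectral frames (e.g. mel-band energies), for illustration only
frame1 = c(0.1, 0.5, 0.9, 0.4)
frame2 = c(0.2, 0.6, 0.7, 0.1)

# Cosine similarity: normalized dot product, insensitive to overall scale
cosine_sim = sum(frame1 * frame2) / sqrt(sum(frame1^2) * sum(frame2^2))

# Pearson's correlation: the same measure after centering each frame
pearson_sim = cor(frame1, frame2)
c1 = frame1 - mean(frame1)
c2 = frame2 - mean(frame2)
centered_cosine = sum(c1 * c2) / sqrt(sum(c1^2) * sum(c2^2))
all.equal(pearson_sim, centered_cosine)  # TRUE
```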

kernelLen

length of checkerboard kernel for calculating novelty, ms (larger values favor global, slow vs. local, fast novelty)

kernelSD

SD of checkerboard kernel for calculating novelty
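
A checkerboard kernel with a Gaussian taper, as described by Foote (2000), can be sketched in base R as follows. The helper checkerboard_kernel() is hypothetical and is not soundgen's own implementation; in particular, the way kernelSD is scaled relative to the kernel size is an assumption made for this illustration.

```r
# Hypothetical helper (not soundgen's code): Gaussian-tapered checkerboard
checkerboard_kernel = function(size, kernelSD = 0.5) {
  stopifnot(size %% 2 == 0)  # even size, so the four quadrants are equal
  half = size / 2
  # cell centers scaled to the interval (-1, 1); never exactly 0
  coord = (seq_len(size) - half - 0.5) / half
  # +1 in the two "same side" quadrants, -1 in the two "cross" quadrants
  checker = outer(sign(coord), sign(coord))
  # 2D Gaussian taper that downweights cells far from the center
  taper = outer(dnorm(coord, sd = kernelSD), dnorm(coord, sd = kernelSD))
  kernel = checker * taper
  kernel / max(abs(kernel))  # scale so that the largest weight is 1
}

k = checkerboard_kernel(size = 8, kernelSD = 0.5)
sign(k[1, 1])  # 1: both frames on the same side of the boundary
sign(k[1, 8])  # -1: frames on opposite sides of the boundary
```

A smaller kernelSD concentrates the weights near the center of the kernel, so frames far from the candidate boundary contribute less to novelty.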

padWith

how to treat edges when calculating novelty: NA = treat sound before and after the recording as unknown, 0 = treat it as silence
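
To see what padding with 0 implies, the sketch below slides a small checkerboard kernel along the diagonal of a toy SSM. Everything here is an illustrative assumption: the toy matrix, the untapered 4 x 4 kernel, and the hypothetical helper novelty_contour(), which covers only the zero-padding case and is not soundgen's implementation.

```r
# Toy SSM: 6 frames of sound "A" followed by 6 frames of sound "B"
labels = rep(c(1, 2), each = 6)
ssm_toy = outer(labels, labels, function(a, b) as.numeric(a == b))

# Plain 4 x 4 checkerboard kernel without a Gaussian taper, for brevity
signs = rep(c(-1, 1), each = 2)
kernel = outer(signs, signs)

# Hypothetical helper: slide the kernel along the main diagonal of the SSM
novelty_contour = function(m, kernel, pad = 0) {
  half = nrow(kernel) / 2
  n = nrow(m)
  # pad the SSM on all sides so that the kernel also fits at the edges
  padded = matrix(pad, nrow = n + 2 * half, ncol = n + 2 * half)
  padded[half + 1:n, half + 1:n] = m
  sapply(1:n, function(i) {
    idx = i:(i + 2 * half - 1)  # window with a boundary just before frame i
    sum(padded[idx, idx] * kernel)
  })
}

nov = novelty_contour(ssm_toy, kernel, pad = 0)
which.max(nov)  # 7: the first frame of sound "B"
nov[1]          # 4: the onset after the zero-padded "silence" also counts as novel
```

In this sketch, treating the padding as silence produces a novelty peak at the very start of the recording, while the middle of each homogeneous sound has zero novelty.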

summaryFun

functions used to summarize each acoustic characteristic, e.g. c('mean', 'sd'); user-defined functions are fine (see examples); NAs are omitted automatically for mean/median/sd/min/max/range/sum, otherwise take care of NAs yourself
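
A user-defined summary function needs to handle NAs itself, as noted above. The sketch below is a base R illustration of that idea only: myRange() and the novelty values are made up, and the loop applying functions by name is an assumption about the general mechanism, not soundgen's actual code.

```r
# Hypothetical user-defined summary function that removes NAs itself
myRange = function(x) diff(range(x, na.rm = TRUE))

novelty = c(NA, 0.2, 0.9, 0.4, NA)  # made-up values with missing edges
summaryFun = c('mean', 'sd', 'myRange')

# Apply each function by name; na.rm is passed only to the built-ins here
out = sapply(summaryFun, function(f) {
  if (f %in% c('mean', 'sd')) {
    do.call(f, list(novelty, na.rm = TRUE))
  } else {
    do.call(f, list(novelty))
  }
})
out  # named vector with elements mean, sd, myRange
```

A function defined this way could then be named in the call, for instance summaryFun = c('mean', 'myRange').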

reportEvery

when processing multiple inputs, report estimated time left every ... iterations (NULL = default, NA = don't report)

cores

number of cores for parallel processing

plot

if TRUE, plots the SSM

savePlots

full path to the folder in which to save the plots (NULL = don't save, '' = same folder as audio)

main

plot title

heights

relative sizes of the SSM and spectrogram/novelty plot

width, height, units, res

graphical parameters for saving plots passed to png

specPars

graphical parameters passed to filled.contour.mod and affecting the spectrogram

ssmPars

graphical parameters passed to filled.contour.mod and affecting the plot of SSM

noveltyPars

graphical parameters passed to lines and affecting the novelty contour

Value

Returns a list of two components: $ssm contains the self-similarity matrix, and $novelty contains the novelty vector.

References

  • El Badawy, D., Marmaroli, P., & Lissek, H. (2013). Audio Novelty-Based Segmentation of Music Concerts. In Acoustics 2013 (No. EPFL-CONF-190844).

  • Foote, J. (1999, October). Visualizing music and audio using self-similarity. In Proceedings of the seventh ACM international conference on Multimedia (Part 1) (pp. 77-80). ACM.

  • Foote, J. (2000). Automatic audio segmentation using a measure of audio novelty. In Multimedia and Expo, 2000. ICME 2000. 2000 IEEE International Conference on (Vol. 1, pp. 452-455). IEEE.

See Also

spectrogram, modulationSpectrum, segment

Examples

sound = c(soundgen(),
          soundgen(nSyl = 4, sylLen = 50, pauseLen = 70,
          formants = NA, pitch = c(500, 330)))
# playme(sound)
# detailed, local features (captures each syllable)
s1 = ssm(sound, samplingRate = 16000, kernelLen = 100,
         sparse = TRUE)  # much faster with 'sparse'
# more global features (captures the transition between the two sounds)
s2 = ssm(sound, samplingRate = 16000, kernelLen = 400, sparse = TRUE)

s2$summary
s2$novelty  # novelty contour
## Not run: 
ssm(sound, samplingRate = 16000,
    input = 'mfcc', simil = 'cor', norm = TRUE,
    ssmWin = 25,  # speed up the processing
    kernelLen = 300,  # global features
    specPars = list(colorTheme = 'heat.colors'),
    ssmPars = list(colorTheme = 'bw'),
    noveltyPars = list(type = 'l', lty = 3, lwd = 2))

## End(Not run)

soundgen documentation built on Aug. 14, 2022, 5:05 p.m.