ssm: Self-similarity matrix
In soundgen: Sound Synthesis and Acoustic Analysis

ssm	R Documentation

Self-similarity matrix

Description

Calculates the self-similarity matrix and novelty vector of a sound.

Usage

ssm(
  x,
  samplingRate = NULL,
  from = NULL,
  to = NULL,
  windowLength = 25,
  step = 5,
  overlap = NULL,
  ssmWin = NULL,
  sparse = FALSE,
  maxFreq = NULL,
  nBands = NULL,
  MFCC = 2:13,
  input = c("mfcc", "melspec", "spectrum")[2],
  norm = FALSE,
  simil = c("cosine", "cor")[1],
  kernelLen = 100,
  kernelSD = 0.5,
  padWith = 0,
  summaryFun = c("mean", "sd"),
  reportEvery = NULL,
  cores = 1,
  plot = TRUE,
  savePlots = NULL,
  main = NULL,
  heights = c(2, 1),
  width = 900,
  height = 500,
  units = "px",
  res = NA,
  specPars = list(levels = seq(0, 1, length = 30), colorTheme = c("bw", "seewave",
    "heat.colors", "...")[2], xlab = "Time, s", ylab = "kHz"),
  ssmPars = list(levels = seq(0, 1, length = 30), colorTheme = c("bw", "seewave",
    "heat.colors", "...")[2], xlab = "Time, s", ylab = "Time, s"),
  noveltyPars = list(type = "b", pch = 16, col = "black", lwd = 3)
)

Arguments

`x`	path to a folder, one or more wav or mp3 files c('file1.wav', 'file2.mp3'), Wave object, numeric vector, or a list of Wave objects or numeric vectors
`samplingRate`	sampling rate of `x` (only needed if `x` is a numeric vector)
`from`, `to`	if NULL (default), analyzes the whole sound, otherwise from...to (s)
`windowLength`	length of FFT window, ms
`step`	FFT step, ms; if specified, it overrides `overlap` (NB: because digital audio is sampled at discrete time intervals of 1/samplingRate, the actual step and thus the time stamps of STFT frames may be slightly different, e.g. 24.98866 instead of 25.0 ms)
`overlap`	overlap between successive FFT frames, %
`ssmWin`	window for averaging SSM, ms (has a smoothing effect and speeds up the processing)
`sparse`	if TRUE, the entire SSM is not calculated, but only the central region needed to extract the novelty contour (speeds up the processing)
`maxFreq`	highest band edge of mel filters, Hz. Defaults to `samplingRate / 2`. See `melfcc`
`nBands`	number of warped spectral bands to use. Defaults to `100 * windowLength / 20`. See `melfcc`
`MFCC`	which mel-frequency cepstral coefficients to use; defaults to `2:13`
`input`	the spectral representation used to calculate the SSM: "mfcc", "melspec" (default), or "spectrum"
`norm`	if TRUE, the spectrum of each STFT frame is normalized
`simil`	method for comparing frames: "cosine" = cosine similarity, "cor" = Pearson's correlation
`kernelLen`	length of checkerboard kernel for calculating novelty, ms (larger values favor global, slow vs. local, fast novelty)
`kernelSD`	SD of checkerboard kernel for calculating novelty
`padWith`	how to treat edges when calculating novelty: NA = treat sound before and after the recording as unknown, 0 = treat it as silence
`summaryFun`	functions used to summarize each acoustic characteristic, e.g. c('mean', 'sd'); user-defined functions are fine (see examples); NAs are omitted automatically for mean/median/sd/min/max/range/sum, otherwise take care of NAs yourself
`reportEvery`	when processing multiple inputs, report estimated time left every ... iterations (NULL = default, NA = don't report)
`cores`	number of cores for parallel processing
`plot`	if TRUE, plots the SSM
`savePlots`	full path to the folder in which to save the plots (NULL = don't save, '' = same folder as audio)
`main`	plot title
`heights`	relative sizes of the SSM and spectrogram/novelty plot
`width`, `height`, `units`, `res`	graphical parameters for saving plots passed to `png`
`specPars`	graphical parameters passed to `filled.contour.mod` and affecting the `spectrogram`
`ssmPars`	graphical parameters passed to `filled.contour.mod` and affecting the plot of SSM
`noveltyPars`	graphical parameters passed to `lines` and affecting the novelty contour
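
The relation between `windowLength`, `step`, and `overlap`, and the rounding mentioned under `step`, can be checked with a little arithmetic. This is a sketch in base R, not soundgen's internal code. It assumes that the step is truncated to a whole number of samples and that overlap equals 100 * (1 - step / windowLength). The sampling rate of 44100 Hz is chosen only for illustration.

```r
samplingRate = 44100   # Hz (assumed for illustration)
windowLength = 25      # ms, as in the default
step_default = 5       # ms, as in the default

# overlap implied by the default step, % (assumed relation)
overlap_default = 100 * (1 - step_default / windowLength)   # about 80

# a requested step of 25 ms is not a whole number of samples at 44100 Hz
step_requested = 25                                          # ms
step_points = floor(step_requested / 1000 * samplingRate)   # 1102.5 truncated
step_actual = step_points / samplingRate * 1000             # actual step, ms
print(round(step_actual, 5))
```

Under these assumptions the actual step comes out slightly shorter than the requested 25 ms, which matches the 24.98866 ms quoted in the description of `step`.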

Value

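Returns a list with the following components: $ssm contains the self-similarity matrix, and $novelty contains the novelty vector. The examples also read $summary, which appears to hold the novelty summarized with `summaryFun`.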
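
To make the two outputs concrete, here is a minimal base-R sketch of how a self-similarity matrix and a novelty contour can be derived from a matrix of spectral features, following the general idea in Foote (2000). It is not the code used by `ssm`: the toy feature matrix, the helper names (`cosineSim`, `checkerboard`, `noveltyContour`), the kernel size in frames, the scaling of the kernel SD, and the zero padding at the edges are all assumptions made for illustration.

```r
# Toy features: rows = spectral bands, columns = frames.
# Frames 1-20 follow profile A, frames 21-40 follow profile B, plus noise.
set.seed(1)
profileA = c(1, 5, 2, 0.5)
profileB = c(4, 0.5, 1, 6)
feat = cbind(matrix(profileA, nrow = 4, ncol = 20),
             matrix(profileB, nrow = 4, ncol = 20))
feat = feat + matrix(runif(length(feat), 0, 0.1), nrow = 4)

# Frame-by-frame similarity: cosine vs. Pearson's correlation
cosineSim = function(m) {
  norms = sqrt(colSums(m^2))
  crossprod(m) / outer(norms, norms)
}
ssm_cos = cosineSim(feat)
ssm_cor = cor(feat)   # correlation between columns, i.e. between frames

# Gaussian checkerboard kernel: +1 within a segment, -1 across segments
# (n = kernel size in frames, must be even; sd is relative to half the width)
checkerboard = function(n, sd = 0.5) {
  stopifnot(n %% 2 == 0)
  x = seq(-1, 1, length.out = n)
  g = exp(-x^2 / (2 * sd^2))
  outer(g, g) * outer(sign(x), sign(x))
}

# Slide the kernel along the main diagonal of the SSM.
# Edges are padded with zeros, loosely analogous to padWith = 0.
noveltyContour = function(s, kernel) {
  n = nrow(kernel)
  half = n / 2
  nFr = nrow(s)
  padded = matrix(0, nFr + n, nFr + n)
  idx = (half + 1):(half + nFr)
  padded[idx, idx] = s
  sapply(seq_len(nFr), function(i) {
    win = i:(i + n - 1)   # kernel boundary lies between frames i - 1 and i
    sum(padded[win, win] * kernel)
  })
}

novelty = noveltyContour(ssm_cos, checkerboard(8))
novelty_cor = noveltyContour(ssm_cor, checkerboard(8))
print(which.max(novelty))   # frame where profile A gives way to profile B
```

In this toy case novelty stays near zero inside each homogeneous stretch and peaks where the spectral profile changes. A longer kernel averages over more frames, which is why larger `kernelLen` values favor slow, global changes.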

References

El Badawy, D., Marmaroli, P., & Lissek, H. (2013). Audio Novelty-Based Segmentation of Music Concerts. In Acoustics 2013 (No. EPFL-CONF-190844).
Foote, J. (1999, October). Visualizing music and audio using self-similarity. In Proceedings of the seventh ACM international conference on Multimedia (Part 1) (pp. 77-80). ACM.
Foote, J. (2000). Automatic audio segmentation using a measure of audio novelty. In Multimedia and Expo, 2000. ICME 2000. 2000 IEEE International Conference on (Vol. 1, pp. 452-455). IEEE.

Examples

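# two concatenated sounds: the default soundgen() output followed by
# four short syllables without formants
sound = c(soundgen(),
          soundgen(nSyl = 4, sylLen = 50, pauseLen = 70,
                   formants = NA, pitch = c(500, 330)))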
# playme(sound)
# detailed, local features (captures each syllable)
s1 = ssm(sound, samplingRate = 16000, kernelLen = 100,
         sparse = TRUE)  # much faster with 'sparse'
# more global features (captures the transition b/w the two sounds)
s2 = ssm(sound, samplingRate = 16000, kernelLen = 400, sparse = TRUE)

s2$summary
s2$novelty  # novelty contour
## Not run: 
ssm(sound, samplingRate = 16000,
    input = 'mfcc', simil = 'cor', norm = TRUE,
    ssmWin = 25,  # speed up the processing
    kernelLen = 300,  # global features
    specPars = list(colorTheme = 'seewave'),
    ssmPars = list(col = rainbow(100)),
    noveltyPars = list(type = 'l', lty = 3, lwd = 2))

## End(Not run)
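
A user-defined summary function, as mentioned under `summaryFun`, has to handle NAs itself. The sketch below defines one and applies it to a made-up novelty vector in base R. The function name `propHigh` and the toy values are hypothetical, and the commented-out call assumes that user-defined functions are passed by name in the same way as the default c("mean", "sd").

```r
# Hypothetical summary: proportion of frames whose novelty exceeds
# half of the maximum novelty
propHigh = function(x) {
  x = x[!is.na(x)]                  # drop NAs ourselves
  if (length(x) == 0) return(NA)    # nothing left to summarize
  mean(x > 0.5 * max(x))
}

# made-up novelty contour with NAs at the edges
novelty_toy = c(NA, 0.1, 0.2, 1.0, 0.3, 0.8, NA)
print(propHigh(novelty_toy))

# Assumed usage with ssm (not run):
# s3 = ssm(sound, samplingRate = 16000, summaryFun = c('mean', 'propHigh'))
# s3$summary
```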

soundgen documentation built on April 4, 2025, 3:44 a.m.