audSpectrogram: Auditory spectrogram

View source: R/audSpec.R

audSpectrogram    R Documentation

Auditory spectrogram


Produces an auditory spectrogram by filtering the sound with a bank of bandpass filters (work in progress). Whereas tuneR::audspec is based on FFT, this function convolves the sound with each filter in the bank. The main difference is that the signal is not windowed, so temporal resolution de facto varies across frequency channels, as with a wavelet transform. The filters are currently third-order Butterworth bandpass filters implemented in butter.
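
To make the idea of channel-specific temporal resolution concrete, here is a minimal base-R sketch. It is not soundgen's implementation: Gaussian-windowed cosine kernels stand in for the Butterworth filters, and the helper names gaborKernel and bandEnergy, the test tone, and the bandwidths are all assumptions made for illustration.

```r
# Illustration only: Gaussian-windowed cosine kernels replace the Butterworth
# filters, and all helper names are hypothetical (not part of soundgen).
samplingRate <- 8000
time_s <- seq(0, 0.2, by = 1 / samplingRate)  # 200 ms
sound <- sin(2 * pi * 500 * time_s)           # pure tone at 500 Hz

# Build a bandpass kernel centered on cf (Hz). A narrower bandwidth bw (Hz)
# gives a longer kernel, i.e. coarser temporal resolution in that channel.
gaborKernel <- function(cf, bw, samplingRate) {
  sd_s <- 1 / (2 * pi * bw)                   # Gaussian SD in seconds
  half <- ceiling(3 * sd_s * samplingRate)    # truncate at +/- 3 SD
  tk <- (-half:half) / samplingRate
  kernel <- exp(-tk^2 / (2 * sd_s^2)) * cos(2 * pi * cf * tk)
  kernel / sum(abs(kernel))
}

# Convolve the whole signal with one kernel (no windowing into frames) and
# return the mean squared output; edge samples are NA and are dropped.
bandEnergy <- function(x, kernel) {
  y <- stats::filter(x, kernel, sides = 2)
  mean(y^2, na.rm = TRUE)
}

k_low  <- gaborKernel(cf = 500,  bw = 50,  samplingRate = samplingRate)
k_high <- gaborKernel(cf = 2000, bw = 200, samplingRate = samplingRate)

length(k_low) > length(k_high)        # the narrowband kernel spans more samples
e_low  <- bandEnergy(sound, k_low)    # channel containing the tone
e_high <- bandEnergy(sound, k_high)   # channel far from the tone
```

Because each channel is filtered over the full signal rather than frame by frame, the effective time window is set by the kernel itself, which is why narrow low-frequency channels respond more slowly than broad high-frequency ones.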


audSpectrogram(
  x,
  samplingRate = NULL,
  scale = NULL,
  from = NULL,
  to = NULL,
  step = 1,
  dynamicRange = 80,
  nFilters = 128,
  minFreq = 20,
  maxFreq = samplingRate/2,
  minBandwidth = 10,
  reportEvery = NULL,
  cores = 1,
  plot = TRUE,
  savePlots = NULL,
  osc = c("none", "linear", "dB")[2],
  heights = c(3, 1),
  ylim = NULL,
  yScale = c("bark", "mel", "ERB", "log")[1],
  contrast = 0.2,
  brightness = 0,
  maxPoints = c(1e+05, 5e+05),
  padWithSilence = TRUE,
  colorTheme = c("bw", "seewave", "heat.colors", "...")[1],
  col = NULL,
  extraContour = NULL,
  xlab = NULL,
  ylab = NULL,
  xaxp = NULL,
  mar = c(5.1, 4.1, 4.1, 2),
  main = NULL,
  grid = NULL,
  width = 900,
  height = 500,
  units = "px",
  res = NA,
  ...
)



x

path to a folder, one or more wav or mp3 files c('file1.wav', 'file2.mp3'), Wave object, numeric vector, or a list of Wave objects or numeric vectors


samplingRate

sampling rate of x (only needed if x is a numeric vector)


scale

maximum possible amplitude of the input, used to normalize the input vector (only needed if x is a numeric vector)

from, to

if NULL (default), analyzes the whole sound; otherwise analyzes only the segment between from and to (s)


step

step, ms (determines time resolution). step = NULL means no downsampling at all (ncol of output = length of input audio)


dynamicRange

dynamic range, dB. All values more than dynamicRange dB below the maximum are treated as zero


nFilters

the number of filters (determines frequency resolution)

minFreq, maxFreq

the range of frequencies to analyze


minBandwidth

minimum filter bandwidth, Hz (otherwise filters may become too narrow when nFilters is high)


reportEvery

when processing multiple inputs, report estimated time left every ... iterations (NULL = default, NA = don't report)


cores

number of cores for parallel processing


plot

should a spectrogram be plotted? TRUE / FALSE


savePlots

full path to the folder in which to save the plots (NULL = don't save, '' = same folder as audio)


osc

"none" = no oscillogram; "linear" = on the original scale; "dB" = in decibels


heights

a vector of length two specifying the relative heights of the spectrogram and the oscillogram (including time axis labels)


ylim

frequency range to plot, kHz (defaults to 0 to Nyquist frequency). NB: still in kHz, even if yScale = bark, mel, or ERB


yScale

scale of the frequency axis: 'linear' = linear, 'log' = logarithmic (musical), 'bark' = bark with hz2bark, 'mel' = mel with hz2mel, 'ERB' = Equivalent Rectangular Bandwidths with HzToERB


contrast

the spectrum is exponentiated by contrast (any real number, recommended -1 to +1). Contrast > 0 increases sharpness, contrast < 0 decreases sharpness


brightness

how much to "lighten" the image (>0 = lighter, <0 = darker)


maxPoints

the maximum number of "pixels" in the oscillogram (if any) and spectrogram; good for quickly plotting long audio files; defaults to c(1e5, 5e5)


padWithSilence

if TRUE, pads the sound with just enough silence to resolve the edges properly (only the original region is plotted, so the apparent duration doesn't change)


colorTheme

black and white ('bw'), as in the seewave package ('seewave'), or any palette from grDevices such as 'heat.colors', 'cm.colors', etc.


col

actual colors, e.g. rev(rainbow(100)); see ?hcl.colors for colors in base R (overrides colorTheme)


extraContour

a vector of arbitrary length scaled in Hz (regardless of yScale!) that will be plotted over the spectrogram (e.g. a pitch contour); can also be a list with extra graphical parameters such as lwd, col, etc. (see examples)

xlab, ylab, main, mar, xaxp

graphical parameters for plotting


grid

if numeric, adds n = grid dotted lines per kHz

width, height, units, res

graphical parameters for saving plots passed to png


...

other graphical parameters
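
As a rough illustration of how step relates to the number of output columns, the sketch below averages an amplitude envelope within consecutive bins of step ms. The helper downsampleEnvelope is hypothetical, and averaging is an assumption: the function may downsample in a different way.

```r
# Hypothetical helper (not part of soundgen): average a non-negative
# envelope within consecutive bins of `step` ms.
downsampleEnvelope <- function(envelope, samplingRate, step = 1) {
  if (is.null(step)) return(envelope)            # no downsampling at all
  samplesPerStep <- round(samplingRate * step / 1000)
  nFrames <- floor(length(envelope) / samplesPerStep)
  keep <- seq_len(nFrames * samplesPerStep)      # drop the incomplete last bin
  colMeans(matrix(envelope[keep], nrow = samplesPerStep))
}

samplingRate <- 16000
# 1 s of a slowly fluctuating envelope
envelope <- abs(sin(2 * pi * 5 * seq(0, 1, length.out = samplingRate)))

length(downsampleEnvelope(envelope, samplingRate, step = 1))     # 1000 frames
length(downsampleEnvelope(envelope, samplingRate, step = 10))    # 100 frames
length(downsampleEnvelope(envelope, samplingRate, step = NULL))  # 16000 samples
```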

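The effect of dynamicRange can be sketched in a few lines of base R. The helper applyDynamicRange is hypothetical, and the use of power values (10 * log10) rather than amplitudes is an assumption made for this example.

```r
# Hypothetical helper (not part of soundgen): zero out everything that lies
# more than `dynamicRange` dB below the maximum. Assumes `power` holds
# non-negative power values with at least one value above zero.
applyDynamicRange <- function(power, dynamicRange = 80) {
  dB <- 10 * log10(power / max(power))   # 0 dB = the maximum
  power[dB < -dynamicRange] <- 0
  power
}

p <- c(1, 1e-3, 1e-5, 1e-9)              # 0, -30, -50, and -90 dB re maximum
applyDynamicRange(p, dynamicRange = 80)  # only the -90 dB value becomes 0
applyDynamicRange(p, dynamicRange = 40)  # the -50 and -90 dB values become 0
```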

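The interplay of nFilters, minFreq, maxFreq, and minBandwidth can be illustrated with a hypothetical filter-bank layout. This is a sketch under stated assumptions, not the package's code: sketchFilterBank is an invented name, center frequencies are spaced evenly on a Bark-like scale using the approximation 6 * asinh(f / 600) (whether hz2bark uses exactly this formula is an assumption), and each bandwidth is taken as the distance between the neighboring band edges.

```r
# Hypothetical helpers (not part of soundgen or tuneR)
hz2barkSketch <- function(f) 6 * asinh(f / 600)
bark2hzSketch <- function(z) 600 * sinh(z / 6)

sketchFilterBank <- function(nFilters = 128, minFreq = 20, maxFreq = 8000,
                             minBandwidth = 10) {
  # nFilters + 2 points equally spaced in Bark: the inner points are center
  # frequencies, and the neighbors on either side act as band edges
  z <- seq(hz2barkSketch(minFreq), hz2barkSketch(maxFreq),
           length.out = nFilters + 2)
  f <- bark2hzSketch(z)
  cf <- f[2:(nFilters + 1)]
  bandwidth <- f[3:(nFilters + 2)] - f[1:nFilters]
  # never let a filter become narrower than minBandwidth
  data.frame(cf = cf, bandwidth = pmax(bandwidth, minBandwidth))
}

fb128 <- sketchFilterBank(nFilters = 128)
fb512 <- sketchFilterBank(nFilters = 512)
range(fb128$bandwidth)   # all filters are wider than minBandwidth
min(fb512$bandwidth)     # the lowest filters are clamped to minBandwidth
```

With many filters, the lowest channels would otherwise be only a few Hz wide, which is the situation minBandwidth guards against.
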
# synthesize a sound with gradually increasing hissing noise
sound = soundgen(sylLen = 200, temperature = 0.001,
  noise = list(time = c(0, 350), value = c(-40, 0)),
  formantsNoise = list(f1 = list(freq = 5000, width = 10000)),
  addSilence = 25)
# playme(sound, samplingRate = 16000)

# auditory spectrogram
as = audSpectrogram(sound, samplingRate = 16000, nFilters = 48)

# compare to FFT-based spectrogram with similar time and frequency resolution
fs = spectrogram(sound, samplingRate = 16000, yScale = 'bark',
                 windowLength = 5, step = 1)

## Not run: 
# add bells and whistles
audSpectrogram(sound, samplingRate = 16000,
  yScale = 'ERB',
  osc = 'dB',  # plot oscillogram in dB
  heights = c(2, 1),  # spectro/osc height ratio
  brightness = -.1,  # reduce brightness
  # colorTheme = 'heat.colors',  # pick color theme...
  col = hcl.colors(30, palette = 'Plasma'),  # ...or specify the colors
  cex.lab = .75, cex.axis = .75,  # text size and other base graphics pars
  grid = 5,  # to customize, add manually with graphics::grid()
  ylim = c(0.1, 5),  # always in kHz
  main = 'My auditory spectrogram' # title
  # + axis labels, etc
)

# change dynamic range
audSpectrogram(sound, samplingRate = 16000, dynamicRange = 40)
audSpectrogram(sound, samplingRate = 16000, dynamicRange = 120)

# remove the oscillogram
audSpectrogram(sound, samplingRate = 16000, osc = 'none')

# save auditory spectrograms of all audio files in a folder
audSpectrogram('~/Downloads/temp',  # folder with audio files
  savePlots = '~/Downloads/temp/audSpec', cores = 4)

## End(Not run)

soundgen documentation built on Sept. 29, 2023, 5:09 p.m.