audSpectrogram: Auditory spectrogram
In soundgen: Sound Synthesis and Acoustic Analysis

Description

Produces an auditory spectrogram by convolving the sound with a bank of bandpass filters. Unlike the short-time Fourier transform (STFT), the signal is not windowed, so temporal resolution de facto varies across frequency channels, as with a wavelet transform. The key settings are filterType, nFilters, and yScale, which determine the type, number, and spacing of the filters, respectively. Gammatone filters were designed as a simple approximation of human perception: see gammatone and Slaney 1993 "An Efficient Implementation of the Patterson–Holdsworth Auditory Filter Bank". Butterworth and Chebyshev filters are not meant to model perception, but can be useful for quickly plotting a sound.
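To make the idea of filter spacing concrete, here is a minimal base-R sketch that places center frequencies at equal steps on an ERB-rate scale and computes an equivalent rectangular bandwidth for each. It uses the Glasberg & Moore (1990) formulas as an assumption; the conversion that soundgen applies internally may differ, and the variable names are illustrative only.

```r
# ERB-rate scale and its inverse (Glasberg & Moore 1990; an assumption here,
# not necessarily the conversion used inside soundgen)
hz2erb = function(f) 21.4 * log10(1 + 0.00437 * f)
erb2hz = function(e) (10^(e / 21.4) - 1) / 0.00437

nFilters = 128
minFreq = 20
maxFreq = 8000  # Nyquist frequency for a 16 kHz recording

# center frequencies equally spaced on the ERB-rate scale
cf = erb2hz(seq(hz2erb(minFreq), hz2erb(maxFreq), length.out = nFilters))

# equivalent rectangular bandwidth (Hz) at each center frequency
bw = 24.7 * (4.37 * cf / 1000 + 1)

round(head(cf, 3), 1)  # the lowest center frequencies are closely spaced
round(range(bw), 1)    # bandwidth grows with center frequency
```

Spacing filters on a perceptual scale packs more channels into the low frequencies, which is why the choice of yScale changes the appearance of the spectrogram even when nFilters stays the same.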

Usage

audSpectrogram(
  x,
  samplingRate = NULL,
  scale = NULL,
  from = NULL,
  to = NULL,
  step = 1,
  dynamicRange = 80,
  filterType = c("butterworth", "chebyshev", "gammatone")[1],
  nFilters = 128,
  nFilters_oct = NULL,
  filterOrder = if (filterType == "gammatone") 4 else 3,
  bandwidth = NULL,
  bandwidthMult = 1,
  minFreq = 20,
  maxFreq = samplingRate/2,
  minBandwidth = 10,
  output = c("audSpec", "audSpec_processed", "filterbank", "filterbank_env", "roughness"),
  reportEvery = NULL,
  cores = 1,
  plot = TRUE,
  savePlots = NULL,
  plotFilters = FALSE,
  osc = c("none", "linear", "dB")[2],
  heights = c(3, 1),
  ylim = NULL,
  yScale = c("bark", "mel", "ERB", "log")[1],
  contrast = 0.2,
  brightness = 0,
  maxPoints = c(1e+05, 5e+05),
  padWithSilence = TRUE,
  colorTheme = c("bw", "seewave", "heat.colors", "...")[1],
  col = NULL,
  extraContour = NULL,
  xlab = NULL,
  ylab = NULL,
  xaxp = NULL,
  mar = c(5.1, 4.1, 4.1, 2),
  main = NULL,
  grid = NULL,
  width = 900,
  height = 500,
  units = "px",
  res = NA,
  ...
)

Arguments

`x`	path to a folder, one or more wav or mp3 files c('file1.wav', 'file2.mp3'), Wave object, numeric vector, or a list of Wave objects or numeric vectors
`samplingRate`	sampling rate of `x` (only needed if `x` is a numeric vector)
`scale`	maximum possible amplitude of input used for normalization of input vector (only needed if `x` is a numeric vector)
`from`, `to`	if NULL (default), analyzes the whole sound, otherwise from...to (s)
`step`	step, ms (determines time resolution of the plot, but not of the returned envelopes per channel). step = NULL means no downsampling at all (ncol of output = length of input audio)
`dynamicRange`	dynamic range, dB. All values more than one dynamicRange under maximum are treated as zero
`filterType`	"butterworth" = Butterworth filter `butter`, "chebyshev" = Chebyshev filter `cheby1`, "gammatone" = `gammatone`
`nFilters`	the number of filters between `minFreq` and `maxFreq` (determines frequency resolution, while `yScale` determines the location of center frequencies)
`nFilters_oct`	an alternative way to specify frequency resolution: the number of filters per octave
`filterOrder`	filter order (defaults to 4 for gammatones, 3 otherwise)
`bandwidth`	filter bandwidth, octaves. If NULL, defaults to ERB bandwidths as in `gammatone`
`bandwidthMult`	a scaling factor for all bandwidths (1 = no effect)
`minFreq`, `maxFreq`	the range of frequencies to analyze. If the spectrogram looks empty, try increasing minFreq - the lowest filters are prone to returning very large values
`minBandwidth`	minimum filter bandwidth, Hz (otherwise filters may become too narrow when nFilters is high; has no effect if filterType = 'gammatone')
`output`	a list of measures to return. Defaults to everything, but this takes a lot of RAM, so shorten to what's needed if analyzing many files at once
`reportEvery`	when processing multiple inputs, report estimated time left every ... iterations (NULL = default, NA = don't report)
`cores`	number of cores for parallel processing
`plot`	should a spectrogram be plotted? TRUE / FALSE
`savePlots`	full path to the folder in which to save the plots (NULL = don't save, '' = same folder as audio)
`plotFilters`	if TRUE, plots the filters as central frequencies ± bandwidth/2
`osc`	"none" = no oscillogram; "linear" = on the original scale; "dB" = in decibels
`heights`	a vector of length two specifying the relative height of the spectrogram and the oscillogram (including time axes labels)
`ylim`	frequency range to plot, kHz (defaults to 0 to Nyquist frequency). NB: still in kHz, even if yScale = bark, mel, or ERB
`yScale`	determines the location of center frequencies of the filters
`contrast`	a number, recommended range -1 to +1. The spectrogram is raised to the power of `exp(3 * contrast)`. Contrast >0 increases sharpness, <0 decreases sharpness
`brightness`	how much to "lighten" the image (>0 = lighter, <0 = darker)
`maxPoints`	the maximum number of "pixels" in the oscillogram (if any) and spectrogram; good for quickly plotting long audio files; defaults to c(1e5, 5e5); does not affect reassigned spectrograms
`padWithSilence`	if TRUE, pads the sound with just enough silence to resolve the edges properly (only the original region is plotted, so the apparent duration doesn't change)
`colorTheme`	black and white ('bw'), as in seewave package ('seewave'), matlab-type palette ('matlab'), or any palette from `palette` such as 'heat.colors', 'cm.colors', etc
`col`	actual colors, e.g. rev(rainbow(100)); see ?hcl.colors for colors in base R (overrides colorTheme)
`extraContour`	a vector of arbitrary length scaled in Hz (regardless of yScale!) that will be plotted over the spectrogram (e.g. a pitch contour); can also be a list with extra graphical parameters such as lwd, col, etc. (see examples)
`xlab`, `ylab`, `main`, `mar`, `xaxp`	graphical parameters for plotting
`grid`	if numeric, adds n = `grid` dotted lines per kHz
`width`, `height`, `units`, `res`	graphical parameters for saving plots passed to `png`
`...`	other graphical parameters
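
The relation between `nFilters_oct`, the analyzed frequency range, and the total number of filters, as well as between `step` and the number of time frames, can be approximated with simple arithmetic. The base-R sketch below is only a back-of-the-envelope estimate: the rounding rules and the 250 ms duration are assumptions, and the exact dimensions returned by audSpectrogram may differ slightly.

```r
samplingRate = 16000
minFreq = 20
maxFreq = samplingRate / 2
nFilters_oct = 5

# number of octaves spanned by the analyzed frequency range
nOct = log2(maxFreq / minFreq)

# approximate total number of filters (rounding rule is an assumption)
nFilters_approx = round(nFilters_oct * nOct)

# approximate number of time frames for a hypothetical 250 ms sound
dur_ms = 250
step = 1  # ms
nFrames_approx = floor(dur_ms / step)

c(nFilters = nFilters_approx, nFrames = nFrames_approx)
```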

Value

Returns a list for each analyzed file, including:

audSpec: auditory spectrogram with frequencies in rows and time in columns
audSpec_processed: same but rescaled for plotting
filterbank: raw output of the filters
roughness: roughness per channel (as many as nFilters)
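
As a rough illustration of what rescaling for plotting can involve, the base-R sketch below applies the two operations described under `dynamicRange` and `contrast` to a small hypothetical matrix. It is not the package's actual processing code: the normalization step, the amplitude (20 * log10) interpretation of decibels, and the variable names are all assumptions.

```r
# hypothetical magnitude matrix: frequencies in rows, time in columns
spec = matrix(c(1, 0.5, 0.1, 1e-3, 1e-5, 1e-6), nrow = 2)
spec = spec / max(spec)  # normalize so that the maximum is 1

# treat everything more than dynamicRange dB below the maximum as zero
dynamicRange = 80
floor_amp = 10^(-dynamicRange / 20)  # assumes dB on an amplitude scale
spec[spec < floor_amp] = 0

# raise to the power of exp(3 * contrast), as described for `contrast`
contrast = 0.2
spec_processed = spec ^ exp(3 * contrast)

spec_processed
```

With values scaled to the range 0 to 1, a positive contrast gives an exponent above 1, which pushes weaker values further toward zero and so sharpens the image.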

Examples

# synthesize a sound with gradually increasing hissing noise
sound = soundgen(sylLen = 200, temperature = 0.001,
  noise = list(time = c(0, 350), value = c(-40, 0)),
  formantsNoise = list(f1 = list(freq = 5000, width = 10000)),
  addSilence = 25)
# playme(sound, samplingRate = 16000)

# auditory spectrogram
as = audSpectrogram(sound, samplingRate = 16000, nFilters = 48)
dim(as$audSpec)

# compare to FFT-based spectrogram with similar time and frequency resolution
fs = spectrogram(sound, samplingRate = 16000, yScale = 'bark',
                 windowLength = 5, step = 1)
dim(fs)

## Not run: 
# add bells and whistles
audSpectrogram(sound, samplingRate = 16000,
  filterType = 'butterworth',
  nFilters = 128,
  yScale = 'ERB',
  bandwidth = 1/6,
  dynamicRange = 150,
  osc = 'dB',  # plot oscillogram in dB
  heights = c(2, 1),  # spectro/osc height ratio
  contrast = .4,  # increase contrast
  brightness = -.2,  # reduce brightness
  # colorTheme = 'heat.colors',  # pick color theme...
  col = hcl.colors(100, palette = 'Plasma'),  # ...or specify the colors
  cex.lab = .75, cex.axis = .75,  # text size and other base graphics pars
  grid = 5,  # to customize, add manually with graphics::grid()
  ylim = c(0.05, 8),  # always in kHz
  main = 'My auditory spectrogram' # title
  # + axis labels, etc
)

# NB: frequency resolution is controlled by both nFilters and bandwidth
audSpectrogram(sound, 16000, nFilters = 15, bandwidth = 1/2)
audSpectrogram(sound, 16000, nFilters = 15, bandwidth = 1/10)
audSpectrogram(sound, 16000, nFilters = 100, bandwidth = 1/2)
audSpectrogram(sound, 16000, nFilters = 100, bandwidth = 1/10)
audSpectrogram(sound, 16000, nFilters_oct = 5, bandwidth = 1/10)

# remove the oscillogram
audSpectrogram(sound, samplingRate = 16000, osc = 'none')

# save auditory spectrograms of all audio files in a folder
audSpectrogram('~/Downloads/temp',
  savePlots = '~/Downloads/temp/audSpec', cores = 4)

## End(Not run)

soundgen documentation built on Aug. 8, 2025, 7:47 p.m.