getSpectralEnvelope: Spectral envelope
In soundgen: Sound Synthesis and Acoustic Analysis

getSpectralEnvelope

R Documentation

Spectral envelope

Description

Prepares a spectral envelope for filtering a sound to add formants, lip radiation, and some stochastic component regulated by temperature. Formants are specified as a list containing time, frequency, amplitude, and width values for each formant (see examples). See vignette('sound_generation', package = 'soundgen') for more information.

Usage

getSpectralEnvelope(
  nr,
  nc,
  formants = NA,
  formantDep = 1,
  formantWidth = 1,
  lipRad = 6,
  noseRad = 4,
  mouth = NA,
  mouthOpenThres = 0.2,
  openMouthBoost = 0,
  vocalTract = NULL,
  temperature = 0.05,
  formDrift = 0.3,
  formDisp = 0.2,
  formantDepStoch = 1,
  smoothLinearFactor = 1,
  formantCeiling = 2,
  samplingRate = 16000,
  speedSound = 35400,
  smoothing = list(),
  output = c("simple", "detailed")[1],
  plot = FALSE,
  duration = NULL,
  colorTheme = c("bw", "seewave", "...")[1],
  col = NULL,
  xlab = "Time",
  ylab = "Frequency, kHz",
  ...
)

Arguments

`nr`	the number of frequency bins = windowLength_points/2, where windowLength_points is the size of window for Fourier transform
`nc`	the number of time steps for Fourier transform
`formants`	a character string like "aaui" referring to default presets for speaker "M1"; a vector of formant frequencies; or a list of formant times, frequencies, amplitudes, and bandwidths, with a single value of each for static or multiple values of each for moving formants. `formants = NA` defaults to schwa. Time stamps for formants and mouthOpening can be specified in ms or an any other arbitrary scale.
`formantDep`	scale factor of formant amplitude (1 = no change relative to amplitudes in `formants`)
`formantWidth`	scale factor of formant bandwidth (1 = no change)
`lipRad`	the effect of lip radiation on source spectrum, dB/oct (the default of +6 dB/oct produces a high-frequency boost when the mouth is open)
`noseRad`	the effect of radiation through the nose on source spectrum, dB/oct (the alternative to `lipRad` when the mouth is closed)
`mouth`	mouth opening (0 to 1, 0.5 = neutral, i.e. no modification) (anchor format)
`mouthOpenThres`	open the lips (switch from nose radiation to lip radiation) when the mouth is open `>mouthOpenThres`, 0 to 1
`openMouthBoost`	amplify the voice when the mouth is open by `openMouthBoost` dB
`vocalTract`	the length of vocal tract, cm. Used for calculating formant dispersion (for adding extra formants) and formant transitions as the mouth opens and closes. If `NULL` or `NA`, the length is estimated based on specified formant frequencies, if any (anchor format)
`temperature`	hyperparameter for regulating the amount of stochasticity in sound generation
`formDrift`	scale factor regulating the effect of temperature on the depth of random drift of all formants (user-defined and stochastic): the higher, the more formants drift at a given temperature
`formDisp`	scale factor regulating the effect of temperature on the irregularity of the dispersion of stochastic formants: the higher, the more unevenly stochastic formants are spaced at a given temperature
`formantDepStoch`	multiplication factor for the amplitude of additional formants added above the highest specified formant (0 = none, 1 = default)
`smoothLinearFactor`	regulates smoothing of formant anchors (0 to +Inf) as they are upsampled to the number of fft steps `nc`. This is necessary because the input `formants` normally contains fewer sets of formant values than the number of fft steps. `smoothLinearFactor` = 0: close to default spline; >3: approaches linear extrapolation
`formantCeiling`	frequency to which stochastic formants are calculated, in multiples of the Nyquist frequency; increase up to ~10 for long vocal tracts to avoid losing energy in the upper part of the spectrum
`samplingRate`	sampling frequency, Hz
`speedSound`	speed of sound in warm air, cm/s. Stevens (2000) "Acoustic phonetics", p. 138
`smoothing`	a list of parameters passed to `getSmoothContour` to control the interpolation and smoothing of contours: interpol (approx / spline / loess), loessSpan, discontThres, jumpThres
`output`	"simple" returns just the spectral filter, while "detailed" also returns a data.frame of formant frequencies over time (needed for internal purposes such as formant locking)
`plot`	if TRUE, produces a plot of the spectral envelope
`duration`	duration of the sound, ms (for plotting purposes only)
`colorTheme`	black and white ('bw'), as in seewave package ('seewave'), or another color theme (e.g. 'heat.colors')
`col`	actual colors, eg rev(rainbow(100)) - see ?hcl.colors for colors in base R (overrides colorTheme)
`xlab`, `ylab`	labels of axes
`...`	other graphical parameters passed on to `image()`

Value

Returns a spectral filter: a matrix with frequency bins in rows and time steps in columns. Accordingly, rownames of the output give central frequency of each bin (in kHz), while colnames give time stamps (in ms if duration is specified, otherwise 0 to 1).

Examples

# [a] with only F1-F3 visible, with no stochasticity
e = getSpectralEnvelope(nr = 512, nc = 50, duration = 300,
  formants = soundgen:::convertStringToFormants('a'),
  temperature = 0, plot = TRUE, col = heat.colors(150))
# image(t(e))  # to plot the output on a linear scale instead of dB

# some "wiggling" of specified formants plus extra formants on top
e = getSpectralEnvelope(nr = 512, nc = 50,
  formants = c(860, 1430, 2900),
  temperature = 0.1, formantDepStoch = 1, plot = TRUE)

# a schwa based on variable length of vocal tract
e = getSpectralEnvelope(nr = 512, nc = 100, formants = NA,
  vocalTract = list(time = c(0, .4, 1), value = c(13, 18, 17)),
  temperature = .1, plot = TRUE)

# no formants at all, only lip radiation
e = getSpectralEnvelope(nr = 512, nc = 50, lipRad = 6,
  formants = NA, temperature = 0, plot = FALSE)
plot(e[, 1], type = 'l')              # linear scale
plot(20 * log10(e[, 1]), type = 'l')  # dB scale - 6 dB/oct

# mouth opening
e = getSpectralEnvelope(nr = 512, nc = 50,
  vocalTract = 16, plot = TRUE, lipRad = 6, noseRad = 4,
  mouth = data.frame(time = c(0, .5, 1), value = c(0, 0, .5)))

# scale formant amplitude and/or bandwidth
e1 = getSpectralEnvelope(nr = 512, nc = 50,
  formants = soundgen:::convertStringToFormants('a'),
  formantWidth = 1, formantDep = 1)  # defaults
e2 = getSpectralEnvelope(nr = 512, nc = 50,
  formants = soundgen:::convertStringToFormants('a'),
  formantWidth = 1.5, formantDep = 1.5)
plot(as.numeric(rownames(e2)), 20 * log10(e2[, 1]),
     type = 'l', xlab = 'KHz', ylab = 'dB', col = 'red', lty = 2)
points(as.numeric(rownames(e1)), 20 * log10(e1[, 1]), type = 'l')

# manual specification of formants
e3 = getSpectralEnvelope(
  nr = 512, nc = 50, samplingRate = 16000, plot = TRUE,
  formants = list(
    f1 = list(freq = c(900, 500), amp = c(30, 35), width = c(80, 50)),
    f2 = list(freq = c(1900, 2500), amp = c(25, 30), width = 100),
    f3 = list(freq = 3400, amp = 30, width = 120)
))

# extra zero-pole pair (doesn't affect estimated VTL and thus the extra
# formants added on top)
e4 = getSpectralEnvelope(
  nr = 512, nc = 50, samplingRate = 16000, plot = TRUE,
  formants = list(
    f1 = list(freq = c(900, 500), amp = c(30, 35), width = c(80, 50)),
    f1.5 = list(freq = 1300, amp = -15),
    f1.7 = list(freq = 1500, amp = 15),
    f2 = list(freq = c(1900, 2500), amp = c(25, 30), width = 100),
    f3 = list(freq = 3400, amp = 30, width = 120)
))
plot(as.numeric(rownames(e4)), 20 * log10(e3[, ncol(e3)]),
     type = 'l', xlab = 'KHz', ylab = 'dB')
points(as.numeric(rownames(e4)), 20 * log10(e4[, ncol(e4)]),
       type = 'l', col = 'red', lty = 2)

soundgen documentation built on Dec. 1, 2025, 9:08 a.m.