soundgen: Generate a sound
In tatters/soundgen_beta: Parametric Voice Synthesis


Generates a bout of one or more syllables with pauses between them. Two basic components are synthesized: the harmonic component (the sum of sine waves with frequencies that are multiples of the fundamental frequency) and the noise component. Both components can be filtered with independently specified formants. Intonation and amplitude contours can be applied both within each syllable and across multiple syllables. Suggested application: synthesis of animal or human non-linguistic vocalizations. For more information, see http://cogsci.se/soundgen.html and the vignette on sound generation.
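The harmonic component described above can be illustrated with a few lines of base R. This is only a sketch of additive synthesis under simplifying assumptions (constant f0, no formants, no noise component, rolloff applied as a fixed number of dB per octave); it is not the code that `soundgen` itself runs, and the variable names are chosen for illustration.

```r
# Additive synthesis sketch in base R (illustrative, not soundgen internals)
samplingRate <- 16000   # Hz, same as the soundgen default
f0 <- 100               # fundamental frequency, Hz
dur <- 0.3              # duration, s (cf. sylLen = 300 ms)
rolloff <- -12          # dB per octave, cf. the `rolloff` argument

n <- round(dur * samplingRate)       # number of samples
t <- (0:(n - 1)) / samplingRate      # time of each sample, s

# Keep only harmonics strictly below the Nyquist frequency
nHarm <- ceiling((samplingRate / 2) / f0) - 1
h <- seq_len(nHarm)

# Harmonic h lies log2(h) octaves above f0, so its gain in dB is
# rolloff * log2(h); convert dB to a linear amplitude ratio
gain <- 10^(rolloff * log2(h) / 20)

# Sum of sine waves at integer multiples of f0, weighted by gain
wave <- as.vector(sin(2 * pi * outer(t, h * f0)) %*% gain)
wave <- wave / max(abs(wave))        # normalize to the range [-1, 1]
```

With a rolloff of -12 dB/octave, the second harmonic has about a quarter of the amplitude of the first, which is why the upper harmonics contribute little unless formants or a shallower rolloff boost them.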

soundgen(repeatBout = 1, nSyl = 1, sylLen = 300, pauseLen = 200,
  pitchAnchors = data.frame(time = c(0, 0.1, 0.9, 1), value = c(100, 150, 135,
  100)), pitchAnchorsGlobal = NA, temperature = 0.025,
  tempEffects = list(sylLenDep = 0.02, formDrift = 0.3, formDisp = 0.2,
  pitchDriftDep = 0.5, pitchDriftFreq = 0.125, pitchAnchorsDep = 0.05,
  noiseAnchorsDep = 0.1, amplAnchorsDep = 0.1), maleFemale = 0,
  creakyBreathy = 0, nonlinBalance = 0, nonlinDep = 50, jitterLen = 1,
  jitterDep = 3, vibratoFreq = 5, vibratoDep = 0, shimmerDep = 0,
  attackLen = 50, rolloff = -12, rolloffOct = -12, rolloffKHz = -6,
  rolloffParab = 0, rolloffParabHarm = 3, rolloffLip = 6,
  formants = list(f1 = list(time = 0, freq = 860, amp = 30, width = 120), f2 =
  list(time = 0, freq = 1280, amp = 40, width = 120), f3 = list(time = 0, freq =
  2900, amp = 25, width = 200)), formantDep = 1, formantDepStoch = 30,
  vocalTract = 15.5, subFreq = 100, subDep = 100, shortestEpoch = 300,
  amDep = 0, amFreq = 30, amShape = 0, noiseAnchors = data.frame(time =
  c(0, 300), value = c(-120, -120)), formantsNoise = NA, rolloffNoise = -14,
  mouthAnchors = data.frame(time = c(0, 1), value = c(0.5, 0.5)),
  amplAnchors = NA, amplAnchorsGlobal = NA, samplingRate = 16000,
  windowLength = 50, overlap = 75, addSilence = 100, pitchFloor = 50,
  pitchCeiling = 3500, pitchSamplingRate = 3500, throwaway = -120,
  invalidArgAction = c("adjust", "abort", "ignore")[1], plot = FALSE,
  play = FALSE, savePath = NA, ...)

`repeatBout`	the number of times the whole bout should be repeated
`nSyl`	the number of syllables in the bout. Intonation, amplitude, and formant contours span multiple syllables, but not multiple bouts (see Details)
`sylLen`	average duration of each syllable, ms
`pauseLen`	average duration of pauses between syllables, ms
`pitchAnchors`	a numeric vector of f0 values in Hz (assuming equal time steps) or a dataframe specifying the time (ms) and value (Hz) of each anchor. These anchors are used to create a smooth contour of fundamental frequency f0 (pitch) within one syllable (see Examples)
`pitchAnchorsGlobal`	unlike `pitchAnchors`, these anchors are used to create a smooth contour of average f0 across multiple syllables. The values are in semitones relative to the existing pitch, i.e. 0 = no change
`temperature`	hyperparameter for regulating the amount of stochasticity in sound generation
`tempEffects`	a list of scale factors regulating the effect of temperature on particular parameters. To change them, specify only the parameters that you want to modify instead of rewriting the whole list (defaults are hard-coded). `sylLenDep`: random variation of the duration of syllables and pauses; `formDrift`: the amount of random drift of formants; `formDisp`: irregularity of the dispersion of stochastic formants; `pitchDriftDep`: amount of slow random drift of f0; `pitchDriftFreq`: frequency of slow random drift of f0; `pitchAnchorsDep, noiseAnchorsDep, amplAnchorsDep`: random fluctuations of user-specified pitch / noise / amplitude anchors
`maleFemale`	hyperparameter for shifting f0 contour, formants, and vocalTract to make the speaker appear more male (-1...0) or more female (0...+1)
`creakyBreathy`	hyperparameter for a rough adjustment of voice quality from creaky (-1) to breathy (+1)
`nonlinBalance`	hyperparameter for regulating the (approximate) proportion of sound with different regimes of pitch effects (none / subharmonics only / subharmonics and jitter). 0% = no noise; 100% = the entire sound has jitter + subharmonics. Ignored if temperature = 0
`nonlinDep`	hyperparameter for regulating the intensity of subharmonics and jitter, 0 to 100% (50% = jitter and subharmonics are as specified, <50% weaker, >50% stronger). Ignored if temperature = 0
`jitterLen`	duration of stable periods between pitch jumps, ms. Use a low value for harsh noise, a high value for irregular vibrato or shaky voice
`jitterDep`	cycle-to-cycle random pitch variation, semitones
`vibratoFreq`	the rate of regular pitch modulation, or vibrato, Hz
`vibratoDep`	the depth of vibrato, semitones
`shimmerDep`	random variation in amplitude between individual glottal cycles (0 to 100% of original amplitude of each cycle)
`attackLen`	duration of fade-in / fade-out at each end of syllables and noise (ms)
`rolloff`	basic rolloff at a constant rate of `rolloff` dB/octave (exponential decay). See `getRolloff` for more details
`rolloffOct`	basic rolloff changes from lower to upper harmonics (regardless of f0) by `rolloffOct` dB/oct. For example, we can get steeper rolloff in the upper part of the spectrum
`rolloffKHz`	rolloff changes linearly with f0 by `rolloffKHz` dB/kHz. For example, -6 dB/kHz gives a 6 dB steeper basic rolloff as f0 goes up by 1000 Hz
`rolloffParab`	an optional quadratic term affecting only the first `rolloffParabHarm` harmonics. The middle harmonic of the first `rolloffParabHarm` harmonics is amplified or dampened by `rolloffParab` dB relative to the basic exponential decay.
`rolloffParabHarm`	the number of harmonics affected by `rolloffParab`
`rolloffLip`	the effect of lip radiation on source spectrum, dB/oct (the default of +6 dB/oct produces a high-frequency boost when the mouth is open)
`formants`	either a character string like "aaui" referring to default presets for speaker "M1" or a list of formant times, frequencies, amplitudes, and bandwidths (see examples below). `formants = NA` defaults to schwa. Time stamps for formants and mouthOpening can be specified in ms or on any other arbitrary scale. See `getSpectralEnvelope` for more details
`formantDep`	scale factor of formant amplitude (1 = no change relative to amplitudes in `formants`)
`formantDepStoch`	the amplitude of additional stochastic formants added above the highest specified formant, dB (only if temperature > 0)
`vocalTract`	the length of vocal tract, cm. Used for calculating formant dispersion (for adding extra formants) and formant transitions as the mouth opens and closes
`subFreq`	target frequency of subharmonics, Hz (lower than f0, adjusted dynamically so f0 is always a multiple of subFreq)
`subDep`	the width of subharmonic band, Hz. Regulates how quickly the strength of subharmonics fades as they move away from harmonics in f0 stack. Low values produce narrow sidebands, high values produce uniformly strong subharmonics
`shortestEpoch`	minimum duration of each epoch with unchanging subharmonics regime, in ms
`amDep`	amplitude modulation depth, %. 0: no change; 100: amplitude modulation with amplitude range equal to the dynamic range of the sound
`amFreq`	amplitude modulation frequency, Hz
`amShape`	amplitude modulation shape (-1 to +1, defaults to 0)
`noiseAnchors`	a numeric vector of noise amplitudes (-120 dB = none, 0 dB = as loud as voiced component) or a dataframe specifying the time (ms) and amplitude (dB) of anchors for generating the noise component such as aspiration, hissing, etc
`formantsNoise`	the same as `formants`, but for the noise component instead of the harmonic component. If NA (default), the noise component will be filtered through the same formants as the harmonic component, approximating aspiration noise [h]
`rolloffNoise`	rolloff of noise, dB/octave. It is analogous to `rolloff`, but while `rolloff` applies to the harmonic component, `rolloffNoise` applies to the noise component
`mouthAnchors`	a numeric vector of mouth opening (0 to 1, 0.5 = neutral, i.e. no modification) or a dataframe specifying the time (ms) and value of mouth opening
`amplAnchors`	a numeric vector of amplitude envelope (0 to 1) or a dataframe specifying the time (ms) and value of amplitude anchors
`amplAnchorsGlobal`	a numeric vector of global amplitude envelope spanning multiple syllables or a dataframe specifying the time (ms) and value (0 to 1) of each anchor
`samplingRate`	sampling frequency, Hz
`windowLength`	length of FFT window, ms
`overlap`	FFT window overlap, %
`addSilence`	silence before and after the bout, ms
`pitchFloor, pitchCeiling`	lower and upper bounds of f0, Hz
`pitchSamplingRate`	sampling frequency of the pitch contour only, Hz. Low values reduce processing time. A rule of thumb is to set this to the same value as `pitchCeiling`
`throwaway`	discard harmonics and noise that are quieter than this number (in dB, defaults to -120) to save computational resources
`invalidArgAction`	what to do if an argument is invalid or outside the range in `permittedValues`: 'adjust' = reset to default value, 'abort' = stop execution, 'ignore' = throw a warning and continue (may crash)
`plot`	if TRUE, plots a spectrogram
`play`	if TRUE, plays the synthesized sound. In case of errors, try setting another default player for `play`
`savePath`	full path for saving the output, e.g. '~/Downloads/temp.wav'. If NA (default), doesn't save anything
`...`	other plotting parameters passed to `spectrogram`
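Several of the arguments above use units that are easy to check by hand: semitones (`pitchAnchorsGlobal`, `jitterDep`, `vibratoDep`), dB (`noiseAnchors`, `throwaway`), and vocal tract length in cm (`vocalTract`). The helpers below are hypothetical and written for illustration only; they are not part of the package. The formant helper assumes the standard uniform-tube approximation (a tube closed at one end) and a speed of sound of 35400 cm/s, which may differ from what `soundgen` uses internally.

```r
# Hypothetical helpers for illustration; none of these are soundgen functions.

# Shift a frequency (Hz) by a number of semitones (12 semitones = 1 octave)
shiftSemitones <- function(freqHz, semitones) {
  freqHz * 2^(semitones / 12)
}

# Convert a level in dB to a linear amplitude ratio
dB2amp <- function(dB) {
  10^(dB / 20)
}

# Neutral (schwa-like) formant frequencies of a uniform tube closed at one end.
# Assumption: speed of sound = 35400 cm/s; neighbouring formants are spaced
# speedSound / (2 * vocalTract_cm) Hz apart.
schwaFormants <- function(vocalTract_cm, nFormants = 3, speedSound = 35400) {
  (2 * seq_len(nFormants) - 1) * speedSound / (4 * vocalTract_cm)
}

# Global anchors of -6, +7 and 0 semitones applied to a 977 Hz syllable start
shiftSemitones(977, c(-6, 7, 0))

# 0 dB = same amplitude; -120 dB (the default `throwaway`) = a factor of 1e-6
dB2amp(c(0, -120))

# Neutral formants for the default vocal tract length of 15.5 cm
schwaFormants(15.5)
```

Under these assumptions, shortening the vocal tract raises all neutral formants and widens their spacing, which is the intuition behind using a small `vocalTract` for small animals.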

Returns the synthesized waveform as a numeric vector.

# NB: GUI for soundgen is available as a Shiny app.
# Type "soundgen_app()" to start it

playback = c(TRUE, FALSE)[2]  # set to TRUE to play back the audio from examples

sound = soundgen(play = playback)
# spectrogram(sound, 16000, osc = TRUE)
# playme(sound)

# Use the in-built collection of presets:
# names(presets)  # speakers
# names(presets$Chimpanzee)  # calls per speaker
s1 = eval(parse(text = presets$Chimpanzee$Scream_conflict))  # screaming chimp
# playme(s1)
s2 = eval(parse(text = presets$F1$Scream_conflict))
# playme(s2)
# unless temperature is 0, the sound is different every time
for (i in 1:3) sound = soundgen(play = playback, temperature = .2)

# Bouts versus syllables. Compare:
sound = soundgen(formants = 'uai', repeatBout = 3, play = playback)
sound = soundgen(formants = 'uai', nSyl = 3, play = playback)

# Intonation contours per syllable and globally:
sound = soundgen(nSyl = 5, sylLen = 200, pauseLen = 140,
  play = playback, pitchAnchors = data.frame(
    time = c(0, 0.65, 1), value = c(977, 1540, 826)),
  pitchAnchorsGlobal = data.frame(time = c(0, .5, 1), value = c(-6, 7, 0)))

# Subharmonics in sidebands (noisy scream)
sound = soundgen(nonlinBalance = 100, subFreq = 75, subDep = 130,
  pitchAnchors = data.frame(
    time = c(0, .3, .9, 1), value = c(1200, 1547, 1487, 1154)),
  sylLen = 800,
  play = playback, plot = TRUE)

# Jitter and mouth opening (bark, dog-like)
sound = soundgen(repeatBout = 2, sylLen = 160, pauseLen = 100,
  nonlinBalance = 100, subFreq = 100, subDep = 60, jitterDep = 1,
  pitchAnchors = data.frame(time = c(0, 0.52, 1), value = c(559, 785, 557)),
  mouthAnchors = data.frame(time = c(0, 0.5, 1), value = c(0, 0.5, 0)),
  vocalTract = 5, play = playback)