soundgen: Generate a sound

View source: R/soundgen.R

Description

Generates a bout of one or more syllables with pauses between them. Two basic components are synthesized: the harmonic component (the sum of sine waves with frequencies that are multiples of the fundamental frequency) and the noise component. Both components can be filtered with independently specified formants. Intonation and amplitude contours can be applied both within each syllable and across multiple syllables. Suggested application: synthesis of animal or human non-linguistic vocalizations. For more information, see http://cogsci.se/soundgen.html and the vignette on sound generation.
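
To make the two components concrete, here is a minimal base-R sketch of additive synthesis: a constant f0, a fixed -12 dB/octave decay of harmonic amplitudes, and unfiltered white noise. It is only an illustration under those assumptions. It omits formants, contours, and everything else soundgen adds, and it does not reproduce soundgen's internal code.

```r
# Illustrative sketch only, not soundgen's internal code: a constant-f0
# harmonic component (sum of sines at multiples of f0) plus white noise.
samplingRate <- 16000                        # Hz, the soundgen default
n <- round(0.3 * samplingRate)               # 300 ms worth of samples
t <- (seq_len(n) - 1) / samplingRate         # time axis, s
f0 <- 150                                    # fundamental frequency, Hz

nHarm <- floor((samplingRate / 2) / f0)      # keep harmonics below Nyquist
harmAmpl <- 10 ^ (-12 * log2(seq_len(nHarm)) / 20)  # -12 dB/octave decay
harmonic <- rowSums(sapply(seq_len(nHarm), function(h)
  harmAmpl[h] * sin(2 * pi * h * f0 * t)))

set.seed(42)
noise <- rnorm(n, sd = 0.05)                 # noise component (unfiltered)
sound <- harmonic + noise
sound <- sound / max(abs(sound))             # normalize to [-1, 1]
```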

Usage

soundgen(repeatBout = 1, nSyl = 1, sylLen = 300, pauseLen = 200,
  pitchAnchors = data.frame(time = c(0, 0.1, 0.9, 1), value = c(100, 150, 135,
  100)), pitchAnchorsGlobal = NA, temperature = 0.025,
  tempEffects = list(sylLenDep = 0.02, formDrift = 0.3, formDisp = 0.2,
  pitchDriftDep = 0.5, pitchDriftFreq = 0.125, pitchAnchorsDep = 0.05,
  noiseAnchorsDep = 0.1, amplAnchorsDep = 0.1), maleFemale = 0,
  creakyBreathy = 0, nonlinBalance = 0, nonlinDep = 50, jitterLen = 1,
  jitterDep = 3, vibratoFreq = 5, vibratoDep = 0, shimmerDep = 0,
  attackLen = 50, rolloff = -12, rolloffOct = -12, rolloffKHz = -6,
  rolloffParab = 0, rolloffParabHarm = 3, rolloffLip = 6,
  formants = list(f1 = list(time = 0, freq = 860, amp = 30, width = 120), f2 =
  list(time = 0, freq = 1280, amp = 40, width = 120), f3 = list(time = 0, freq =
  2900, amp = 25, width = 200)), formantDep = 1, formantDepStoch = 30,
  vocalTract = 15.5, subFreq = 100, subDep = 100, shortestEpoch = 300,
  amDep = 0, amFreq = 30, amShape = 0, noiseAnchors = data.frame(time =
  c(0, 300), value = c(-120, -120)), formantsNoise = NA, rolloffNoise = -14,
  mouthAnchors = data.frame(time = c(0, 1), value = c(0.5, 0.5)),
  amplAnchors = NA, amplAnchorsGlobal = NA, samplingRate = 16000,
  windowLength = 50, overlap = 75, addSilence = 100, pitchFloor = 50,
  pitchCeiling = 3500, pitchSamplingRate = 3500, throwaway = -120,
  invalidArgAction = c("adjust", "abort", "ignore")[1], plot = FALSE,
  play = FALSE, savePath = NA, ...)

Arguments

repeatBout

the number of times the whole bout should be repeated

nSyl

the number of syllables in the bout. Intonation, amplitude, and formant contours span multiple syllables, but not multiple bouts (see Details)

sylLen

average duration of each syllable, ms

pauseLen

average duration of pauses between syllables, ms

pitchAnchors

a numeric vector of f0 values in Hz (assuming equal time steps) or a dataframe specifying the time (ms) and value (Hz) of each anchor. These anchors are used to create a smooth contour of fundamental frequency f0 (pitch) within one syllable (see Examples)
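
The step from anchors to a smooth contour could be sketched as follows. The helper `anchorsToContour` is hypothetical, and the natural cubic spline (`stats::spline`) is a stand-in chosen for illustration; soundgen's own smoothing may differ.

```r
# Hypothetical helper (not part of soundgen): turn pitch anchors into a
# smooth contour of `len` values using a natural cubic spline.
anchorsToContour <- function(anchors, len) {
  if (is.numeric(anchors)) {
    # plain vector: assume equally spaced anchors, as the docs describe
    anchors <- data.frame(time = seq(0, 1, length.out = length(anchors)),
                          value = anchors)
  }
  if (nrow(anchors) == 1) return(rep(anchors$value, len))
  # spline() returns `len` equally spaced points over the range of time
  spline(anchors$time, anchors$value, n = len, method = "natural")$y
}

# the default pitchAnchors from the Usage section
pitchAnchors <- data.frame(time = c(0, 0.1, 0.9, 1),
                           value = c(100, 150, 135, 100))
contour <- anchorsToContour(pitchAnchors, len = 300)
contourVec <- anchorsToContour(c(100, 150, 100), len = 300)
```

Because the spline interpolates, the contour passes exactly through every anchor, including both endpoints.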

pitchAnchorsGlobal

unlike pitchAnchors, these anchors are used to create a smooth contour of average f0 across multiple syllables. The values are in semitones relative to the existing pitch, i.e. 0 = no change
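
Because pitchAnchorsGlobal is expressed in semitones, converting it to a frequency multiplier uses 2^(semitones / 12). The sketch below applies that to a global contour evaluated at five syllables; `shiftSemitones`, the even spacing of syllables, and linear interpolation with `approx` are assumptions made for illustration.

```r
# One semitone multiplies frequency by 2^(1/12): +12 doubles f0, 0 = no change.
shiftSemitones <- function(f0, semitones) f0 * 2 ^ (semitones / 12)

# Evaluate a global contour (in semitones) at 5 evenly spaced syllables.
# Even spacing and linear interpolation are simplifying assumptions.
globalAnchors <- data.frame(time = c(0, 0.5, 1), value = c(-6, 7, 0))
sylPos <- seq(0, 1, length.out = 5)
offsets <- approx(globalAnchors$time, globalAnchors$value, xout = sylPos)$y
offsets                                  # -6.0  0.5  7.0  3.5  0.0
sylF0 <- shiftSemitones(300, offsets)    # shifted f0 of a 300 Hz syllable
```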

temperature

hyperparameter for regulating the amount of stochasticity in sound generation

tempEffects

a list of scale factors regulating the effect of temperature on particular parameters. To change them, specify only the parameters you want to modify rather than rewriting the whole list (the defaults are hard-coded). sylLenDep: random variation of the duration of syllables and pauses; formDrift: the amount of random drift of formants; formDisp: irregularity of the dispersion of stochastic formants; pitchDriftDep: the amount of slow random drift of f0; pitchDriftFreq: the frequency of slow random drift of f0; pitchAnchorsDep, noiseAnchorsDep, amplAnchorsDep: random fluctuations of user-specified pitch / noise / amplitude anchors

maleFemale

hyperparameter for shifting f0 contour, formants, and vocalTract to make the speaker appear more male (-1...0) or more female (0...+1)

creakyBreathy

hyperparameter for a rough adjustment of voice quality from creaky (-1) to breathy (+1)

nonlinBalance

hyperparameter for regulating the (approximate) proportion of sound with different regimes of pitch effects (none / subharmonics only / subharmonics and jitter). 0% = no nonlinear effects; 100% = the entire sound has jitter + subharmonics. Ignored if temperature = 0

nonlinDep

hyperparameter for regulating the intensity of subharmonics and jitter, 0 to 100% (50% = jitter and subharmonics are as specified, <50% weaker, >50% stronger). Ignored if temperature = 0

jitterLen

duration of stable periods between pitch jumps, ms. Use a low value for harsh noise, a high value for irregular vibrato or shaky voice

jitterDep

cycle-to-cycle random pitch variation, semitones
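
A rough sketch of how jitterLen and jitterDep could interact: f0 is held for one stable period of jitterLen ms, then jumps by a random number of semitones. Using normally distributed jumps with standard deviation jitterDep is an assumption of this example, not a description of soundgen's internals.

```r
# Illustrative jitter, not soundgen's algorithm: f0 jumps to a new random
# value every jitterLen ms and holds it until the next jump.
set.seed(1)
pitchSamplingRate <- 3500                       # Hz (soundgen default)
f0 <- rep(200, round(0.3 * pitchSamplingRate))  # flat 200 Hz, 300 ms
jitterLen <- 1                                  # ms per stable period
jitterDep <- 3                                  # semitones (used as SD: an assumption)

step <- max(1, round(jitterLen * pitchSamplingRate / 1000))  # samples per period
nPeriods <- ceiling(length(f0) / step)
deviation <- rep(rnorm(nPeriods, sd = jitterDep), each = step)[seq_along(f0)]
f0Jittered <- f0 * 2 ^ (deviation / 12)         # apply deviations in semitones
```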

vibratoFreq

the rate of regular pitch modulation, or vibrato, Hz

vibratoDep

the depth of vibrato, semitones
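
Vibrato can be sketched as a sine wave applied to f0 on the semitone scale. Treating vibratoDep as the peak (rather than peak-to-peak) deviation is an assumption here.

```r
# Sketch: vibrato as sinusoidal modulation of f0, measured in semitones.
pitchSamplingRate <- 3500
t <- (seq_len(round(0.5 * pitchSamplingRate)) - 1) / pitchSamplingRate  # 500 ms
f0 <- rep(300, length(t))
vibratoFreq <- 5    # Hz
vibratoDep <- 1     # semitones, taken as peak deviation (assumption)
f0Vibrato <- f0 * 2 ^ (vibratoDep * sin(2 * pi * vibratoFreq * t) / 12)
range(f0Vibrato)    # roughly 283.2 to 317.8 Hz
```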

shimmerDep

random variation in amplitude between individual glottal cycles (0 to 100% of original amplitude of each cycle)

attackLen

duration of fade-in / fade-out at each end of syllables and noise (ms)
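
A fade of attackLen ms at each end might look like the sketch below. The linear ramp and the helper name `applyFade` are assumptions for illustration; soundgen may use a different fade shape.

```r
# Hypothetical helper: linear fade-in and fade-out of attackLen ms.
applyFade <- function(x, samplingRate, attackLen) {
  # ramp length in samples, capped at half the signal
  n <- min(round(attackLen * samplingRate / 1000), floor(length(x) / 2))
  if (n < 1) return(x)
  ramp <- seq(0, 1, length.out = n)
  x[1:n] <- x[1:n] * ramp                 # fade in
  idx <- (length(x) - n + 1):length(x)
  x[idx] <- x[idx] * rev(ramp)            # fade out
  x
}

y <- applyFade(rep(1, 16000), samplingRate = 16000, attackLen = 50)
```

With the defaults, 50 ms at 16000 Hz gives an 800-sample ramp at each end.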

rolloff

basic rolloff at a constant rate of rolloff dB/octave (exponential decay). See getRolloff for more details

rolloffOct

basic rolloff changes from lower to upper harmonics (regardless of f0) by rolloffOct dB/oct. For example, we can get steeper rolloff in the upper part of the spectrum

rolloffKHz

rolloff changes linearly with f0 by rolloffKHz dB/kHz. For example, -6 dB/kHz gives a 6 dB steeper basic rolloff as f0 goes up by 1000 Hz
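
The combined effect of rolloff and rolloffKHz on harmonic levels can be sketched as below. This is a deliberately simplified, hypothetical model: rolloffOct, rolloffParab, and rolloffLip are ignored, and the reference f0 at which rolloffKHz has no effect (`f0Ref`) is our own assumption. getRolloff documents soundgen's actual model.

```r
# Simplified, hypothetical rolloff model: harmonic h sits at
# effRolloff * log2(h) dB relative to the first harmonic, and effRolloff
# changes by rolloffKHz dB per kHz of f0 above f0Ref (an assumption).
harmonicLevels <- function(f0, nHarm, rolloff = -12, rolloffKHz = -6,
                           f0Ref = 100) {
  effRolloff <- rolloff + rolloffKHz * (f0 - f0Ref) / 1000
  effRolloff * log2(seq_len(nHarm))
}

harmonicLevels(f0 = 100, nHarm = 4)   # 0, -12, about -19.02, -24 dB
harmonicLevels(f0 = 1100, nHarm = 2)  # 0, -18 dB: 6 dB/oct steeper at +1 kHz
```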

rolloffParab

an optional quadratic term affecting only the first rolloffParabHarm harmonics. The middle harmonic of the first rolloffParabHarm harmonics is amplified or dampened by rolloffParab dB relative to the basic exponential decay.

rolloffParabHarm

the number of harmonics affected by rolloffParab

rolloffLip

the effect of lip radiation on source spectrum, dB/oct (the default of +6 dB/oct produces a high-frequency boost when the mouth is open)

formants

either a character string like "aaui" referring to default presets for speaker "M1" or a list of formant times, frequencies, amplitudes, and bandwidths (see Examples). formants = NA defaults to schwa. Time stamps for formants and mouth opening (mouthAnchors) can be specified in ms or on any other arbitrary scale. See getSpectralEnvelope for more details

formantDep

scale factor of formant amplitude (1 = no change relative to amplitudes in formants)

formantDepStoch

the amplitude of additional stochastic formants added above the highest specified formant, dB (only if temperature > 0)

vocalTract

the length of the vocal tract, cm. Used for calculating formant dispersion (for adding extra formants) and formant transitions as the mouth opens and closes

subFreq

target frequency of subharmonics, Hz (lower than f0, adjusted dynamically so f0 is always a multiple of subFreq)
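
One way to keep f0 an exact multiple of the subharmonic frequency is to round f0 / subFreq to an integer and divide f0 by it. The helper below is a hypothetical sketch of that idea, not soundgen's code.

```r
# Hypothetical helper: pick the subharmonic frequency closest to the target
# subFreq such that f0 is an integer multiple of it.
adjustSubFreq <- function(f0, subFreq) {
  n <- pmax(1, round(f0 / subFreq))   # integer ratio, at least 1
  f0 / n
}

adjustSubFreq(f0 = 320, subFreq = 100)  # 106.6667, since 320 = 3 * 106.6667
adjustSubFreq(f0 = 400, subFreq = 100)  # 100
```

Because the ratio is recomputed for every f0 value, the function also works on a whole pitch contour passed as a vector.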

subDep

the width of the subharmonic band, Hz. Regulates how quickly the strength of subharmonics fades as they move away from harmonics in the f0 stack. Low values produce narrow sidebands, high values produce uniformly strong subharmonics

shortestEpoch

minimum duration of each epoch with unchanging subharmonics regime, in ms

amDep

amplitude modulation (AM) depth; at full depth, the modulation spans an amplitude range equal to the dynamic range of the sound

amFreq

amplitude modulation frequency, Hz

amShape

amplitude modulation shape (-1 to +1, defaults to 0)
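
A minimal sketch of sinusoidal amplitude modulation. It assumes that amShape = 0 corresponds to a plain sinusoid and treats amDep as a 0 to 100% depth where 100% makes the envelope dip to silence; both are assumptions, and the helper name is hypothetical.

```r
# Hypothetical helper: sinusoidal amplitude modulation of a waveform x.
amplitudeModulate <- function(x, samplingRate, amFreq = 30, amDep = 50) {
  t <- (seq_along(x) - 1) / samplingRate
  # envelope swings between 1 and 1 - amDep / 100, starting at 1
  env <- 1 - (amDep / 100) * (1 - cos(2 * pi * amFreq * t)) / 2
  x * env
}

y <- amplitudeModulate(rep(1, 16000), samplingRate = 16000,
                       amFreq = 30, amDep = 100)
```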

noiseAnchors

a numeric vector of noise amplitudes (-120 dB = none, 0 dB = as loud as the voiced component) or a dataframe specifying the time (ms) and amplitude (dB) of anchors for generating the noise component, such as aspiration, hissing, etc.
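
Noise levels in dB translate into linear amplitude multipliers relative to the voiced component. This sketch assumes the usual amplitude convention, dB = 20 * log10(ratio); `dBToAmpl` is a hypothetical helper.

```r
# Convert noise levels (dB relative to the voiced component) to linear
# amplitude multipliers, assuming dB = 20 * log10(amplitude ratio).
dBToAmpl <- function(dB) 10 ^ (dB / 20)

dBToAmpl(c(0, -20, -120))   # 1, 0.1, 1e-06: -120 dB is effectively silent
```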

formantsNoise

the same as formants, but for the noise component instead of the harmonic component. If NA (default), the noise component will be filtered through the same formants as the harmonic component, approximating aspiration noise [h]

rolloffNoise

rolloff of noise, dB/octave. It is analogous to rolloff, but while rolloff applies to the harmonic component, rolloffNoise applies to the noise component

mouthAnchors

a numeric vector of mouth opening (0 to 1, 0.5 = neutral, i.e. no modification) or a dataframe specifying the time (ms) and value of mouth opening

amplAnchors

a numeric vector of amplitude envelope (0 to 1) or a dataframe specifying the time (ms) and value of amplitude anchors

amplAnchorsGlobal

a numeric vector of global amplitude envelope spanning multiple syllables or a dataframe specifying the time (ms) and value (0 to 1) of each anchor

samplingRate

sampling frequency, Hz

windowLength

length of FFT window, ms

overlap

FFT window overlap, %
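
With the default settings, the window and step sizes in samples follow from simple arithmetic: 50 ms at 16000 Hz is 800 samples, and 75% overlap leaves a step of 200 samples (12.5 ms).

```r
# Convert windowLength (ms) and overlap (%) into samples, using the defaults.
samplingRate <- 16000
windowLength <- 50   # ms
overlap <- 75        # %
windowSamples <- windowLength * samplingRate / 1000   # 800 samples per window
stepSamples <- windowSamples * (1 - overlap / 100)    # 200 samples between windows
stepMs <- stepSamples * 1000 / samplingRate           # 12.5 ms
```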

addSilence

silence before and after the bout, ms

pitchFloor, pitchCeiling

lower and upper bounds of f0, Hz

pitchSamplingRate

sampling frequency of the pitch contour only, Hz. Low values reduce processing time. A rule of thumb is to set this to the same value as pitchCeiling
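
A plausible reading of the speed-up: the contour is computed at only pitchSamplingRate points per second and then stretched to the audio rate before synthesis. The sketch below upsamples with linear interpolation (`approx`) and integrates the phase to produce a tone; the interpolation method is an assumption, not necessarily what soundgen does.

```r
# Sketch: compute the pitch contour at a low rate, then upsample it.
samplingRate <- 16000
pitchSamplingRate <- 3500
dur <- 0.3                                   # s
nPitch <- round(dur * pitchSamplingRate)     # 1050 pitch values
nAudio <- round(dur * samplingRate)          # 4800 audio samples
pitchLow <- seq(100, 200, length.out = nPitch)            # rising contour
pitchHigh <- approx(seq_len(nPitch), pitchLow, n = nAudio)$y

# integrate instantaneous frequency to get phase, then synthesize
phase <- 2 * pi * cumsum(pitchHigh) / samplingRate
tone <- sin(phase)
```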

throwaway

discard harmonics and noise that are quieter than this number (in dB, defaults to -120) to save computational resources
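
A sketch of the throwaway idea: harmonics whose level falls below the threshold are simply not synthesized. Here levels follow a plain -12 dB/octave rolloff (an assumption), and the threshold of -60 dB is stricter than the default only so that the effect is visible.

```r
# Sketch: keep only harmonics louder than the throwaway threshold.
rolloff <- -12                               # dB/octave (assumed level model)
nHarm <- 200
levels <- rolloff * log2(seq_len(nHarm))     # dB relative to the first harmonic
throwaway <- -60                             # stricter than the default -120
keep <- which(levels > throwaway)
length(keep)   # 31: harmonic 32 sits exactly at -60 dB and is dropped
```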

invalidArgAction

what to do if an argument is invalid or outside the range in permittedValues: 'adjust' = reset to default value, 'abort' = stop execution, 'ignore' = throw a warning and continue (may crash)
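
The three modes of invalidArgAction could be implemented along these lines. `checkArg` is a hypothetical helper: it takes the permitted range and default directly instead of reading soundgen's permittedValues table, and the warning it issues in 'adjust' mode is our own choice.

```r
# Hypothetical validator illustrating 'adjust', 'abort', and 'ignore'.
checkArg <- function(value, low, high, default,
                     invalidArgAction = c("adjust", "abort", "ignore")[1]) {
  if (value >= low && value <= high) return(value)   # valid: pass through
  msg <- sprintf("value %s outside [%s, %s]", value, low, high)
  switch(invalidArgAction,
    adjust = { warning(paste(msg, "- reset to default")); default },
    abort  = stop(msg),
    ignore = { warning(msg); value })
}

checkArg(0.3, low = 0, high = 1, default = 0.5)   # 0.3, unchanged
```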

plot

if TRUE, plots a spectrogram

play

if TRUE, plays the synthesized sound. In case of errors, try setting another default player for play

savePath

full path for saving the output, e.g. '~/Downloads/temp.wav'. If NA (default), doesn't save anything

...

other plotting parameters passed to spectrogram

Value

Returns the synthesized waveform as a numeric vector.

Examples

# NB: a GUI for soundgen is available as a Shiny app.
# Type "soundgen_app()" to start it
library(soundgen)

playback = c(TRUE, FALSE)[2]  # set to TRUE to play back the audio from examples

sound = soundgen(play = playback)
# spectrogram(sound, 16000, osc = TRUE)
# playme(sound)

# Use the in-built collection of presets:
# names(presets)  # speakers
# names(presets$Chimpanzee)  # calls per speaker
s1 = eval(parse(text = presets$Chimpanzee$Scream_conflict))  # screaming chimp
# playme(s1)
s2 = eval(parse(text = presets$F1$Scream_conflict))
# playme(s2)
# unless temperature is 0, the sound is different every time
for (i in 1:3) sound = soundgen(play = playback, temperature = .2)

# Bouts versus syllables. Compare:
sound = soundgen(formants = 'uai', repeatBout = 3, play = playback)
sound = soundgen(formants = 'uai', nSyl = 3, play = playback)

# Intonation contours per syllable and globally:
sound = soundgen(nSyl = 5, sylLen = 200, pauseLen = 140,
  play = playback, pitchAnchors = data.frame(
    time = c(0, 0.65, 1), value = c(977, 1540, 826)),
  pitchAnchorsGlobal = data.frame(time = c(0, .5, 1), value = c(-6, 7, 0)))

# Subharmonics in sidebands (noisy scream)
sound = soundgen(nonlinBalance = 100, subFreq = 75, subDep = 130,
  pitchAnchors = data.frame(
    time = c(0, .3, .9, 1), value = c(1200, 1547, 1487, 1154)),
  sylLen = 800,
  play = playback, plot = TRUE)

# Jitter and mouth opening (bark, dog-like)
sound = soundgen(repeatBout = 2, sylLen = 160, pauseLen = 100,
  nonlinBalance = 100, subFreq = 100, subDep = 60, jitterDep = 1,
  pitchAnchors = data.frame(time = c(0, 0.52, 1), value = c(559, 785, 557)),
  mouthAnchors = data.frame(time = c(0, 0.5, 1), value = c(0, 0.5, 0)),
  vocalTract = 5, play = playback)

tatters/soundgen_beta documentation built on May 14, 2019, 9 a.m.