segment | R Documentation |
Finds syllables and bursts separated by background noise in long recordings (up to 1-2 hours of audio per file). Syllables are defined as continuous segments that seem to be different from noise based on amplitude and/or spectral similarity thresholds. Bursts are defined as local maxima in signal envelope that are high enough both in absolute terms (relative to the global maximum) and with respect to the surrounding region (relative to local minima). See vignette('acoustic_analysis', package = 'soundgen') for details.
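The two burst criteria above (height relative to the global maximum and height relative to neighboring troughs) can be illustrated with a minimal base-R sketch. This is not the soundgen implementation: findBursts, minAbs and the toy envelope are hypothetical names and values chosen for illustration, and only the 'either' and 'both' trough comparisons are covered.

```r
# Hypothetical helper (not part of soundgen): keep local maxima of an
# envelope (dB) that are (1) within minAbs dB of the global maximum and
# (2) at least peakToTrough dB above the lowest point between them and
# the neighboring local maximum (or the edge of the signal)
findBursts = function(env, minAbs = 20, peakToTrough = 3,
                      troughLocation = c('either', 'both')) {
  troughLocation = match.arg(troughLocation)
  n = length(env)
  # strict interior local maxima (plateaus are ignored in this sketch)
  idx = which(diff(sign(diff(env))) == -2) + 1
  globalMax = max(env)
  keep = logical(length(idx))
  for (k in seq_along(idx)) {
    left = if (k == 1) 1 else idx[k - 1]
    right = if (k == length(idx)) n else idx[k + 1]
    dLeft = env[idx[k]] - min(env[left:idx[k]])    # rise from left trough
    dRight = env[idx[k]] - min(env[idx[k]:right])  # drop to right trough
    prominent = switch(troughLocation,
      either = dLeft >= peakToTrough || dRight >= peakToTrough,
      both = dLeft >= peakToTrough && dRight >= peakToTrough)
    keep[k] = env[idx[k]] >= globalMax - minAbs && prominent
  }
  idx[keep]
}

# toy envelope in dB: two close peaks, a faint bump, and a final peak
env = c(-40, -5, -6, -5.5, -30, -28, -29, -10, -40)
findBursts(env, troughLocation = 'either')  # 2 4 8
findBursts(env, troughLocation = 'both')    # 8
```

In this toy example the bump at position 6 is rejected because it lies more than 20 dB below the global maximum, and the two close peaks at positions 2 and 4 survive only under 'either', since the dip between them is shallower than 3 dB.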
segment(
x,
samplingRate = NULL,
from = NULL,
to = NULL,
shortestSyl = 40,
shortestPause = 40,
method = c("env", "spec", "mel")[3],
propNoise = NULL,
SNR = NULL,
noiseLevelStabWeight = c(1, 0.25),
windowLength = 40,
step = NULL,
overlap = 80,
reverbPars = list(reverbDelay = 70, reverbSpread = 130, reverbLevel = -35,
reverbDensity = 50),
interburst = NULL,
peakToTrough = SNR + 3,
troughLocation = c("left", "right", "both", "either")[4],
summaryFun = c("median", "sd"),
maxDur = 30,
reportEvery = NULL,
cores = 1,
plot = FALSE,
savePlots = NULL,
saveAudio = NULL,
addSilence = 50,
main = NULL,
xlab = "",
ylab = "Signal, dB",
showLegend = FALSE,
width = 900,
height = 500,
units = "px",
res = NA,
maxPoints = c(1e+05, 5e+05),
specPlot = list(colorTheme = "bw"),
contourPlot = list(lty = 1, lwd = 2, col = "green"),
sylPlot = list(lty = 1, lwd = 2, col = "blue"),
burstPlot = list(pch = 8, cex = 3, col = "red"),
...
)
x |
path to a folder, one or more wav or mp3 files c('file1.wav', 'file2.mp3'), Wave object, numeric vector, or a list of Wave objects or numeric vectors |
samplingRate |
sampling rate of x (only needed if x is a numeric vector) |
from , to |
if NULL (default), analyzes the whole sound, otherwise from...to (s) |
shortestSyl |
minimum acceptable length of syllables, ms |
shortestPause |
minimum acceptable break between syllables, ms (syllables separated by shorter pauses are merged) |
method |
the signal used to search for syllables: 'env' = Hilbert-transformed amplitude envelope, 'spec' = spectrogram, 'mel' = mel-transformed spectrogram (see tuneR::melfcc) |
propNoise |
the proportion of non-zero sound assumed to represent background noise, 0 to 1 (note that complete silence is not considered, so padding with silence won't affect the algorithm) |
SNR |
expected signal-to-noise ratio (dB above noise), which determines the threshold for syllable detection. The meaning of "dB" here is approximate since the "signal" may be different from sound intensity |
noiseLevelStabWeight |
a vector of length 2 specifying the relative weights of the overall signal level vs. stability when attempting to automatically locate the regions that represent noise. Increasing the weight of stability tends to accentuate the beginning and end of each syllable. |
windowLength |
length of FFT window, ms |
step |
you can override overlap by specifying FFT step, ms |
overlap |
overlap between successive FFT frames, % |
reverbPars |
parameters passed on to |
interburst |
minimum time between two consecutive bursts (ms). Defaults
to the average detected |
peakToTrough |
to qualify as a burst, a local maximum has to be at least
|
troughLocation |
should local maxima be compared to the trough on the left and/or right of it? Values: 'left', 'right', 'both', 'either' |
summaryFun |
functions used to summarize each acoustic characteristic;
see |
maxDur |
long files are split into chunks |
reportEvery |
when processing multiple inputs, report estimated time left every ... iterations (NULL = default, NA = don't report) |
cores |
number of cores for parallel processing |
plot |
if TRUE, produces a segmentation plot |
savePlots |
full path to the folder in which to save the plots (NULL = don't save, '' = same folder as audio) |
saveAudio |
full path to the folder in which to save audio files (one per detected syllable) |
addSilence |
if syllables are saved as separate audio files, they can be padded with some silence (ms) |
xlab , ylab , main |
main plotting parameters |
showLegend |
if TRUE, shows a legend for thresholds |
width , height , units , res |
parameters passed to
|
maxPoints |
the maximum number of "pixels" in the oscillogram (if any) and spectrogram; good for quickly plotting long audio files; defaults to c(1e5, 5e5) |
specPlot |
a list of graphical parameters for displaying the spectrogram
(if |
contourPlot |
a list of graphical parameters for displaying the signal contour used to detect syllables (see details) |
sylPlot |
a list of graphical parameters for displaying the syllables |
burstPlot |
a list of graphical parameters for displaying the bursts |
... |
other graphical parameters passed to graphics::plot |
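The relation between windowLength, overlap and step follows the usual short-time Fourier transform arithmetic, sketched below. The helper names overlapToStep and stepToOverlap are hypothetical (not part of soundgen), and the assumption that segment() converts between the two in exactly this way is not confirmed by this page.

```r
# Hypothetical helpers (not part of soundgen): convert between FFT step (ms)
# and overlap (%) for a given window length (ms)
overlapToStep = function(windowLength, overlap) {
  windowLength * (1 - overlap / 100)
}
stepToOverlap = function(windowLength, step) {
  100 * (1 - step / windowLength)
}

# with the default windowLength = 40 and overlap = 80,
# successive frames start about 8 ms apart
overlapToStep(windowLength = 40, overlap = 80)
# conversely, a step of 8 ms with a 40 ms window means about 80% overlap
stepToOverlap(windowLength = 40, step = 8)
```

Under this arithmetic, the default shortestSyl = 40 ms would span about five analysis frames.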
Algorithm: for each chunk at most maxDur long, first the audio recording is partitioned into signal and noise regions: the quietest and most stable regions are located, and noise threshold is defined from a user-specified proportion of noise in the recording (propNoise) or, if propNoise = NULL, from the lowest local maximum in the density function of a weighted product of amplitude and stability (that is, we assume that quiet and stable regions are likely to represent noise). Once we know what the noise looks like - in terms of its typical amplitude and/or spectrum - we derive signal contour as its difference from noise at each time point. If method = 'env', this is Hilbert transform minus noise, and if method = 'spec' or 'mel', this is the inverse of cosine similarity between the spectrum of each frame and the estimated spectrum of noise weighted by amplitude.
By default, signal-to-noise ratio (SNR) is estimated as half-median of above-noise signal, but it is recommended that this parameter is adjusted by hand to suit the purposes of segmentation, as it is the key setting that controls the balance between false negatives (missing faint signals) and false positives (hallucinating signals that are actually noise). Note also that effects of echo or reverberation can be taken into account: syllable detection threshold may be raised following powerful acoustic bursts with the help of the reverbPars argument. At the final stage, continuous "islands" SNR dB above noise level are detected as syllables, and "peaks" on the islands are detected as bursts.
The algorithm is very flexible, but the parameters may be hard to optimize by hand. If you have an annotated sample of the sort of audio you are planning to analyze, with syllables and/or bursts counted manually, you can use it for automatic optimization of control parameters (see optimizePars).
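The final stage described above (detecting continuous "islands" above a threshold, merging islands separated by short pauses, and discarding islands that are too short) can be sketched in base R as follows. This is an illustration under stated assumptions, not the soundgen code: findIslands is a hypothetical helper, durations are expressed in frames rather than ms, and merging before pruning is an assumed order of operations.

```r
# Hypothetical helper (not part of soundgen): find runs of frames where the
# signal contour exceeds a threshold, merge runs separated by short pauses,
# and drop runs that are too short. All durations are in frames.
findIslands = function(contour, threshold, shortestSyl = 1, shortestPause = 1) {
  runs = rle(contour > threshold)
  end = cumsum(runs$lengths)
  start = end - runs$lengths + 1
  syl = data.frame(start = start[runs$values], end = end[runs$values])
  # merge syllables separated by pauses shorter than shortestPause
  i = 1
  while (i < nrow(syl)) {
    pause = syl$start[i + 1] - syl$end[i] - 1
    if (pause < shortestPause) {
      syl$end[i] = syl$end[i + 1]
      syl = syl[-(i + 1), ]
    } else {
      i = i + 1
    }
  }
  # discard syllables shorter than shortestSyl
  syl = syl[syl$end - syl$start + 1 >= shortestSyl, ]
  rownames(syl) = NULL
  syl
}

# toy contour (dB above noise): an island with a 1-frame dip, then a 1-frame blip
contour = c(0, 0, 5, 6, 1, 7, 6, 0, 0, 0, 4, 0)
findIslands(contour, threshold = 3, shortestSyl = 2, shortestPause = 2)
# one syllable spanning frames 3 to 7; the blip at frame 11 is discarded
```

Raising the threshold here plays the role of raising SNR in segment(): fewer and shorter islands survive, which trades false positives for false negatives.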
If summaryFun = NULL, returns a list containing full stats on each syllable and burst (one row per syllable and per burst), otherwise returns only a dataframe with one row per file - a summary of the number and spacing of syllables and vocal bursts.
analyze
ssm
sound = soundgen(nSyl = 4, sylLen = 100, pauseLen = 70,
attackLen = 20, amplGlobal = c(0, -20),
pitch = c(368, 284), temperature = .001)
# add noise so SNR decreases from 20 to 0 dB from syl1 to syl4
sound = sound + runif(length(sound), -10 ^ (-20 / 20), 10 ^ (-20 / 20))
# osc(sound, samplingRate = 16000, dB = TRUE)
# spectrogram(sound, samplingRate = 16000, osc = TRUE)
# playme(sound, samplingRate = 16000)
s = segment(sound, samplingRate = 16000, plot = TRUE)
s
# customizing the plot
segment(sound, samplingRate = 16000, plot = TRUE,
sylPlot = list(lty = 2, col = 'gray20'),
burstPlot = list(pch = 16, col = 'blue'),
specPlot = list(col = rev(heat.colors(50))),
xlab = 'Some custom label', cex.lab = 1.2,
showLegend = TRUE,
main = 'My awesome plot')
## Not run:
# set SNR manually to control detection threshold
s = segment(sound, samplingRate = 16000, SNR = 1, plot = TRUE)
# Download 260 sounds from the supplements to Anikin & Persson (2017) at
# http://cogsci.se/publications.html
# unzip them into a folder, say '~/Downloads/temp260'
myfolder = '~/Downloads/temp260' # 260 .wav files live here
s = segment(myfolder, propNoise = .05, SNR = 3)
# Check accuracy: import a manual count of syllables (our "key")
key = segmentManual # a vector of 260 integers
trial = as.numeric(s$summary$nBursts)
cor(key, trial, use = 'pairwise.complete.obs')
boxplot(trial ~ as.integer(key), xlab='key')
abline(a=0, b=1, col='red')
# or look at the detected syllables instead of bursts:
cor(key, s$summary$nSyl, use = 'pairwise.complete.obs')
## End(Not run)