compareSounds: Compare sounds (experimental)
In tatters/soundgen_beta: Parametric Voice Synthesis

Computes similarity between two sounds based on correlating mel-transformed spectra (auditory spectra). Called by matchPars.

compareSounds(target, targetSpec = NULL, cand, samplingRate = NULL,
  method = c("cor", "cosine", "pixel", "dtw")[1:4], windowLength = 40,
  overlap = 50, step = NULL, padWith = NA, penalizeLengthDif = TRUE,
  throwaway = -120, maxFreq = NULL, summary = TRUE)

`target`	the sound we want to reproduce using soundgen: path to a .wav file or numeric vector
`targetSpec`	if already calculated, the target auditory spectrum can be provided to speed things up
`cand`	the sound to be compared to `target`
`samplingRate`	sampling rate of `target` (only needed if target is a numeric vector, rather than a .wav file)
`method`	method of comparing mel-transformed spectra of two sounds: "cor" = average Pearson's correlation of mel-transformed spectra of individual FFT frames; "cosine" = same as "cor" but with cosine similarity instead of Pearson's correlation; "pixel" = absolute difference between each point in the two spectra; "dtw" = dynamic time warping with the `dtw` package
`windowLength`	length of FFT window, ms
`overlap`	overlap between successive FFT frames, %
`step`	you can override `overlap` by specifying FFT step, ms
`padWith`	compared spectra are padded with either silence (`padWith = 0`) or NAs (`padWith = NA`) so that they have the same number of columns. When the sounds differ in duration, padding with zeros rather than NAs improves the fit to target measured by `method = 'pixel'` and `'dtw'`, but it has no effect on `'cor'` and `'cosine'`.
`penalizeLengthDif`	if TRUE, sounds of different length are considered to be less similar; if FALSE, only the overlapping parts of two sounds are compared
`throwaway`	parts of the spectra quieter than `throwaway` dB are not compared
`maxFreq`	parts of the spectra above `maxFreq` Hz are not compared
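`summary`	if TRUE, returns the mean of similarity values calculated by all methods in `method`; if FALSE, returns a separate value for each method (as in the example, where `summary = FALSE` fills one column per method)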
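The relation between `windowLength`, `overlap` and `step` can be sketched in a few lines of base R. This is only an illustration under the usual convention that the step is the non-overlapping part of the window; the helpers `stepFromOverlap` and `nFrames` are hypothetical names, not part of soundgen, and the frame count ignores any padding the package may apply at the edges of the sound.

```r
# Hypothetical helper (not a soundgen function): FFT step in ms implied by
# a window length (ms) and an overlap between successive frames (%)
stepFromOverlap = function(windowLength, overlap) {
  windowLength * (1 - overlap / 100)
}

# Hypothetical helper: number of complete frames that fit into a sound of
# a given duration (ms), assuming no padding at the edges
nFrames = function(duration, windowLength, step) {
  floor((duration - windowLength) / step) + 1
}

defaultStep = stepFromOverlap(windowLength = 40, overlap = 50)
defaultStep                                                    # 20 ms
nFrames(duration = 500, windowLength = 40, step = defaultStep) # 24 frames
```

Passing `step` explicitly overrides `overlap`, as described for the `step` argument.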

library(soundgen)

# synthesize the target sound: vowel 'a' with a rise-fall pitch contour
target = soundgen(sylLen = 500, formants = 'a',
                  pitchAnchors = data.frame(time = c(0, 0.1, 0.9, 1),
                                            value = c(100, 150, 135, 100)),
                  temperature = 0)
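# getMelSpec is accessed with `:::`, i.e. it is treated here as an internal
# function; 16000 Hz is assumed to match the sampling rate of `target`
targetSpec = soundgen:::getMelSpec(target, samplingRate = 16000)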
parsToTry = list(
  list(formants = 'i',                                            # wrong
       pitchAnchors = data.frame(time = c(0, 1),                  # wrong
                                 value = c(200, 300))),
  list(formants = 'i',                                            # wrong
       pitchAnchors = data.frame(time = c(0, 0.1, 0.9, 1),        # right
                                 value = c(100, 150, 135, 100))),
  list(formants = 'a',                                            # right
       pitchAnchors = data.frame(time = c(0,1),                   # wrong
                                 value = c(200, 300))),
  list(formants = 'a',                                            # right
       pitchAnchors = data.frame(time = c(0, 0.1, 0.9, 1),        # right
                                 value = c(100, 150, 135, 100)))  # right
)

sounds = list()
for (s in seq_along(parsToTry)) {
  sounds[[s]] = do.call(soundgen,
    c(parsToTry[[s]], list(temperature = 0, sylLen = 500)))
}

method = c('cor', 'cosine', 'pixel', 'dtw')
df = matrix(NA, nrow = length(parsToTry), ncol = length(method))
colnames(df) = method
df = as.data.frame(df)
for (i in seq_len(nrow(df))) {
  df[i, ] = compareSounds(
    target = NULL,            # can use target instead of targetSpec...
    targetSpec = targetSpec,  # ...but faster to calculate targetSpec once
    cand = sounds[[i]],
    samplingRate = 16000,
    padWith = NA,
    penalizeLengthDif = TRUE,
    method = method,
    summary = FALSE
  )
}
df$av = rowMeans(df, na.rm = TRUE)
df  # row 1 = wrong pitch & formants, ..., row 4 = right pitch & formants
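For intuition about what the "cor", "cosine" and "pixel" columns of `df` measure, here is a minimal base-R sketch on two toy matrices. It is not soundgen's implementation: the helper names (`frameSim`, `cosineSim`) and the values in `specA` and `specB` are made up for illustration, and the package may scale, pad or weight the spectra differently.

```r
# Toy "spectra" with made-up values: rows = frequency bins, columns = frames
specA = matrix(c(1, 2, 3,
                 2, 4, 6,
                 3, 1, 2), nrow = 3)  # filled column by column
specB = matrix(c(2, 4, 6,
                 2, 4, 6,
                 1, 2, 3), nrow = 3)

# Hypothetical helper: cosine similarity of two numeric vectors
cosineSim = function(x, y) sum(x * y) / sqrt(sum(x^2) * sum(y^2))

# Hypothetical helper: apply `fun` to each pair of matching frames (columns)
frameSim = function(a, b, fun) {
  sapply(seq_len(ncol(a)), function(i) fun(a[, i], b[, i]))
}

# "cor"-like: Pearson's correlation per frame, averaged over frames
simCor = mean(frameSim(specA, specB, cor), na.rm = TRUE)

# "cosine"-like: cosine similarity per frame, averaged over frames
simCosine = mean(frameSim(specA, specB, cosineSim), na.rm = TRUE)

# "pixel"-like: mean absolute difference over all points (a distance,
# so smaller means more similar)
difPixel = mean(abs(specA - specB), na.rm = TRUE)

round(c(cor = simCor, cosine = simCosine, pixel = difPixel), 3)
```

In this toy case the first frame of `specB` is the first frame of `specA` doubled, so it scores a perfect 1 on both correlation and cosine similarity while still contributing to the point-by-point difference.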