In dankelley/oce: Analysis of Oceanographic Data

Abstract. This is a diary of Dan Kelley's work, trying to add support for the 0x23 (raw echosounder) AD2CP data type to the oce R package. To make things concrete, I am showing my (R) code; even somebody not familiar with R ought to get the gist.

knitr::opts_chunk$set(dev="png", dev.args=list(type="cairo-png"))

Diary of updates

Update 2022-09-01

Ragnar supplied a formula for sampleStartIndex, as well as a very helpful diagram explaining XMIT. From these things I was able to infer a formula for echosounderRaw$cellSize; this formula yields the graphs provided near the end of the present document.

Update 2022-08-29

I've learned that what I call blankingDistanceInCm is incorrectly stored in AD2CP files. In fact, it is always FALSE (i.e. the factor is 1e-3, not 1e-2). This is detected in read.adp.ad2cp() now, and a warning is issued if the value stored in the file does not yield FALSE. With this scheme, the warning will not appear when Nortek updates their systems and folks generate files with the newer system.

2022-08-22

I am basing this on the contents of a Nortek email dated 22 Aug 2022 21:33:03.7921 (UTC). I do not know how private that was, so I will not quote from it in this document, nor name the sender.

The email told me that the data that oce calls echosounderRaw come in pairs that represent complex numbers, with a frac32 value for the real part and then a frac32 value for the imaginary part. At first I though this meant a format mentioned at https://code.ihub.org.cn/projects/729/repository/revisions/master/entry/modules/core/src/softfloat.cpp but after some email exchanges, I found that it referred to a scaled signed integer. The recipe is simple: read as 32-bit signed integer, convert to numeric, then divide by 2^31.

Prior to 2022-08-22

A document I received from Nortek in August of 2022 led me to believe that the data were stored as pairs of float32 entries^[Actually, the documentation just says float but the entries are 8 bytes long, so I am assuming float32.], with the first entry of each pair being echosounder amplitude, and the second being echosounder quality.

However, this assumption yields some very odd results (lots of NA values, lots of values that make no sense, like exponents in the 20s for quality). I communicated again, and was very pleased to get a quick and detailed reply suggesting another format (see previous).

Latest results

A test file (from which tests/testthat/local-data/ad2cp/ad2cp-01.ad2cp is extracted) produces the following results. Note that this uses the startSampleIndex value that is stored in the file, which is an integer, and not a value calculated using Ragnar's formula.

library(oce) # github.com/dankelley/oce "develop" branch
file <- "~/Dropbox/oce_secret_data/ad2cp/s100_truncated.ad2cp"
layout(rbind(c(1:2),c(3:4)), width=c(2/3,1/3))
if (file.exists(file)) {
    d <- read.oce(file)
    time <- d@data$echosounder$time
    distance <- d@data$echosounder$distance
    e <- log10(d@data$echosounder$echosounder)
    imagep(time, distance, e, quiet=TRUE, ylab=resizableLabel("distance"))
    ylim <- par("usr")[3:4]
    em <- apply(e, 2, mean)
    wem <- which.max(em)
    par(mgp=c(2,0.7,0))
    plot(em, distance,
        xlab="intensity", ylab=resizableLabel("distance"),
        type="l", ylim=ylim, yaxs="i")
    mtext(sprintf("e peak @ %.0fm", distance[wem]))

    rtime <- d@data$echosounderRaw$time
    rdistance <- d@data$echosounderRaw$distance
    re <- log10(Mod(d@data$echosounderRaw$samples))
    imagep(rtime, rdistance, re, quiet=TRUE, ylab=resizableLabel("distance"))
    ylim <- par("usr")[3:4]
    rem <- apply(re, 2, mean)
    # the max is in first samples (during XMIT burst, I think)
    wrem <- 20 + which.max(tail(rem, -20))
    par(mgp=c(2,0.7,0))
    plot(rem, rdistance,
        xlab="intensity", ylab=resizableLabel("distance"),
        type="l", ylim=ylim, yaxs="i")
    mtext(sprintf("eraw peak @ %.0fm", rdistance[wrem]))
}

Two questions remain:

Should we use the integer startSampleIndex that is in the file, or should we compute it using the formula provided by Ragnar? The former is an integer value that is 16 in a sample file, and if that's typical then rounding might be expected to give about 3% error in the results for cellSize and thus distance. The graph above uses the integer value. If the calculated startSampleIndex were used instead, the peak would shift from 282m to r round(with(d@data$echosounderRaw,cellSize2*282/cellSize)) m.
What soundSpeed should be used in computations of cellSize? In the graph presented above, a mean of soundSpeed values from the file was used.