# wave_to_nvspl
#' @name wave_to_nvspl
#' @title Calibrate and convert wave files into NVSPL formatted table
#' @description This function uses PAMGuide code to convert wave files into an NVSPL-formatted table. Outputs are hourly files of 1/3 octave data in 1-second LEQ increments. PAMGuide was developed by \href{https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/2041-210X.12330}{Merchant et al. 2015}. The suggested workflow is to first run with test.file = TRUE to verify that your workflow has been parameterized accurately, and then batch process NVSPLs by running with test.file = FALSE.
#' @param input.directory Top-level input directory path to the audio files to be processed (e.g., E:/AUDIO). Audio files are expected to follow the naming convention SITEID_YYYYMMDD_HHMMSS.wav; see Details for an example directory layout. For nonstandard file patterns, use the filext and filpat arguments (NOT TESTED).
#' @param results.directory OPTIONAL character path stating where NVSPL .txt files should be placed. If omitted, an NVSPL directory will be created automatically within each audio directory.
#' @param data.directory Logical flag specifying whether audio files are housed in 'Data' subdirectories (the standard folder structure produced by Song Meters). Default = TRUE.
#' @param test.file Logical flag for whether to test a file. If TRUE, tests a single file and produces plots and diagnostic outputs. If FALSE, processes entire audio dataset indicated by input.directory.
#' @param project File name for your project (e.g., 'GLBAPhenology2019').
#' @param instrum Audio recorder used. Default = 'SM4'.
#' @param filext File extension pattern. Default = '_%Y%m%d_%H%M%S.wav'. If using split files, '_0_%Y%m%d_%H%M%S_000.wav'.
#' @param filpat File pattern. Default = '.+\\d{8}_\\d{6}.wav'.
#' @param mhset Microphone sensitivity (dBV/Pa). Default = -35.
#' @param Gset Gain setting. Default = 16.
#' @param vADCset Zero-to-peak voltage of the analog-to-digital converter. Default = 1.
#' @param enviset Use 'Air' or 'Wat' to indicate whether audio recordings occurred in a terrestrial or underwater environment, respectively. Default = 'Air'.
#' @param rescWat Use 1 if you want to re-scale underwater values to be able to plot using AMT, 0 if not. Default = 0.
#' @param timezone Specify the timezone setting used in the audio recorder (e.g., 'GMT'). If recordings were taken in local time at your study site, specify an \href{https://en.wikipedia.org/wiki/List_of_tz_database_time_zones#List}{Olson-names-formatted character timezone} for the location (e.g., 'America/Los_Angeles'). This is extremely important for clarity in data analysis through the years, as some projects have varied from year to year in whether recordings were taken in GMT vs. local time.
#' @return If test.file = TRUE, returns diagnostics. If test.file = FALSE, returns NVSPL txt files in NVSPL folder generated by the function.
#'
#' Output NVSPL txt file contains the following columns. Note that many of these columns (e.g. Voltage, WindSpeed, INVID) will not be filled in if processing wave files; however, columns are retained here for compatibility with other NPS workflows.
#'
#' \itemize{
#' \item{\strong{SiteID}: Site name.}
#' \item{\strong{STime}: Date time.}
#' \item{\strong{H12p5} through \strong{H20000}: ISO standard 1/3 octave bands with energy centered around the specified frequency. See \href{https://law.resource.org/pub/us/cfr/ibr/002/ansi.s1.11.2004.pdf}{ANSI S1.11}.}
#' \item{\strong{dbA}: A-weighted noise measurement. }
#' \item{\strong{dbC}: C-weighted noise measurement. Not applied in this function. }
#' \item{\strong{dbZ}: Z-weighted noise measurement. }
#' \item{\strong{Voltage}: Voltage.}
#' \item{\strong{WindSpeed}: WindSpeed.}
#' \item{\strong{WindDir}: WindDir.}
#' \item{\strong{TempIns}: Internal temperature of the measurement instrument.}
#' \item{\strong{TempOut}: External temperature.}
#' \item{\strong{Humidity}: Humidity.}
#' \item{\strong{INVID}: Investigator ID.}
#' \item{\strong{INSID}: Instrument ID.}
#' \item{\strong{GChar1}: General character placeholder.}
#' \item{\strong{PAMGuideVersion}: Version of NSNSD PAMGuide code used.}
#' \item{\strong{timezone}: Indicates the timezone reflected in the audio filename. 'GMT' indicates that audio recordings were taken with no timezone adjustment. If recordings were taken in local time at a study site, please specify an \href{https://en.wikipedia.org/wiki/List_of_tz_database_time_zones#List}{Olson-names-formatted character timezone} for the location (e.g., 'America/Los_Angeles'). This is extremely important to foster clarity in data analysis through the years, as some projects vary across time and space in terms of whether recordings are taken in GMT vs. local time.}
#' \item{\strong{AdjustmentsApplied}: Used for gain or windscreen corrections.}
#' \item{\strong{CalibrationAdjustment}: Calibration ID (often not used).}
#' \item{\strong{GPSTimeAdjustment}: GPS-based time adjustment.}
#' }
#'
#'
#' @details
#'
#' This function was developed by the National Park Service Natural Sounds and Night Skies Division to act as a wrapper to PAMGuide that would support NSNSD data processing workflows.
#'
#' For more information on function arguments such as mhset, Gset, and vADCset, NSNSD staff should see \href{https://doimspp.sharepoint.com/sites/nsnsdallstaff/Shared%20Documents/Science%20and%20Tech/Software/SongMeterToNVSPL/SongMeter4toNVSPL.mp4}{this tutorial}.
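#'
#' As a hypothetical illustration of the expected inputs (the drive, site names, and dates below are made up; only the SITEID_YYYYMMDD_HHMMSS.wav naming convention and the optional 'Data' subfolders, used when data.directory = TRUE, matter):
#'
#' \preformatted{
#' E:/AUDIO                  <- input.directory
#' |-- SITE01
#' |   `-- Data
#' |       |-- SITE01_20210623_113602.wav
#' |       `-- SITE01_20210623_114602.wav
#' `-- SITE02
#'     `-- Data
#'         `-- SITE02_20210623_120000.wav
#' }
#'
#' With the default filext and filpat arguments, a file named SITE01_20210623_113602.wav parses to site 'SITE01' with a recording start time of 2021-06-23 11:36:02.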
#'
#' PAMGuide was published as supplementary material to the following Open Access journal article:
#'
#' \href{https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/2041-210X.12330}{Merchant, N.D., Fristrup, K.M., Johnson, M.P., Tyack, P.L., Witt, M.J., Blondel, P., Parks, S.E. (2015). Measuring acoustic habitats. Methods in Ecology and Evolution.}
#'
#' \itemize{
#' \item{\href{https://besjournals.onlinelibrary.wiley.com/action/downloadSupplement?doi=10.1111%2F2041-210X.12330&file=mee312330-sup-0001-AppendixS1.pdf}{View accompanying PAMGuide tutorial}}
#' \item{\href{https://sourceforge.net/projects/pamguide/}{Download original PAMGuide code}}
#' }
#'
#' @seealso \code{\link{nvspl_to_ai}}
#' @import tuneR
#' @importFrom utils read.csv write.table
#' @importFrom graphics hist
#' @export
#' @include Meta.R PAMGuide.R PAMGuide_Meta.R
#' @examples
#' \dontrun{
#'
#' # Create an input directory for this example
#' dir.create('example-input-directory')
#'
#' # Read in example wave files
#' data(exampleAudio1)
#' data(exampleAudio2)
#'
#' # Write example waves to example input directory
#' tuneR::writeWave(
#' object = exampleAudio1,
#' filename = 'example-input-directory/Rivendell_20210623_113602.wav'
#' )
#'
#' tuneR::writeWave(
#' object = exampleAudio2,
#' filename = 'example-input-directory/Rivendell_20210623_114602.wav'
#' )
#'
#' # Perform wave_to_nvspl in test mode (test.file = TRUE)
#' wave_to_nvspl(
#' input.directory = 'example-input-directory',
#' data.directory = FALSE,
#' test.file = TRUE,
#' project = 'testproject',
#' timezone = 'GMT'
#' )
#'
#' # Perform wave_to_nvspl in batch mode (test.file = FALSE)
#' wave_to_nvspl(
#' input.directory = 'example-input-directory',
#' data.directory = FALSE,
#' test.file = FALSE,
#' project = 'testproject',
#' timezone = 'GMT'
#' )
#'
#' # Verify that NVSPL outputs have been created
#' nvspls <- list.files('example-input-directory/NVSPL', full.names = TRUE)
#'
#' # View one of the NVSPL outputs
#' one.nvspl <- read.delim(file = nvspls[1], sep = ',')
#'
#' # Delete all temporary example files when finished
#' unlink(x = 'example-input-directory', recursive = TRUE)
#'
#' }
#'
# More ideas:
# - Instead of test.file, could add a "diagnostics" argument that plots whether time is continuous,
#   whether the time stamp is right, etc. (Good idea, but a lot more effort, since it essentially means
#   eliminating the PAMGuide function and then knitting those two functions together to get the same
#   effect. Not a priority for CB yet.)
# - A way to get rid of the PAMGuide function?
# - Better variable names and/or commenting. Highly desirable, but CB is not knowledgeable enough to do
#   this expediently and verify what the variable names / math should be. Not a top priority.
# - Params file (like a run log): use a future params file as an input? Good idea.
# - Does the NVSPL file format need an overhaul in general?
wave_to_nvspl <- function(input.directory,
results.directory,
data.directory = TRUE,
test.file = FALSE,
project,
instrum = "SM4",
filext = "_%Y%m%d_%H%M%S.wav",
filpat = ".+\\d{8}_\\d{6}.wav",
mhset = -35,
Gset = 16,
vADCset = 1,
enviset = "Air",
rescWat = 0,
timezone
)
{
if(missing(timezone)) {
stop('You are missing an input to the "timezone" argument. Please specify the timezone setting used in the audio recorder (e.g., "GMT", "America/Denver"). This argument allows you to declare the timezone shown by the wave filename. If recordings were taken in local time at your study site, specify an Olson-names-formatted character timezone (https://en.wikipedia.org/wiki/List_of_tz_database_time_zones#List) for the location (e.g., "America/Los_Angeles"). If recordings were taken in GMT, you can put either "GMT" or "UTC" (both are acceptable in R for downstream date-time formatting). This argument is critical to foster clarity in data analysis through the years and across monitoring locations, because some projects may vary across time and space as to whether the standard operating procedure specifies recordings in GMT vs. local time.')
}
# Ensure a forward slash at end ($) of input.directory
if (grepl("\\/$", input.directory) == FALSE) {
input.directory <- paste0(input.directory, '/')
}
## INITIALIZE PARAMETERS
## (1) SET DIRECTORY WHERE ALL AUDIO FILES TO PROCESS ARE
#if you want to process multiple directories (sites) with same recording parameters, choose the highest directory
WAVDirsDIR <- input.directory
## (2) LIST OF DIRECTORIES TO PROCESS
WAVDirs <- list.dirs(WAVDirsDIR)
# If using highest directory, that one may not have wavfiles in it, so only find the ones with wav files
check.folders <- sapply(WAVDirs, function(x) length(list.files(path = x, pattern = '.wav')))
WAVDirs <- WAVDirs[check.folders > 0]
# ARE YOUR FILES IN "Data" subdirectories? (standard format from song meters)
# If data in subdirs:
if (data.directory == TRUE) {WAVDirs = WAVDirs[grep("Data", WAVDirs)]}
## (3) VERSION OF PAMGUIDE, see the most recent code
vers <- 'V12noChunk_fixdBA' # CB: to be updated with any future tweaks/releases of this Github code
# this way it updates automatically in the function instead of user having to input
if(enviset == 'Air') envir <- 2
if(enviset == 'Wat') envir <- 1
# Save params for posterity once all necessary objects have been initialized
params.name <- paste0("paramsFileNVSPL_", project, "_", instrum)
save(file = paste0(input.directory, params.name), list = ls(all.names = TRUE))
message('\nOutput "params" file memorializing your inputs to this function has been saved in: \n ', input.directory, ' as ', params.name, '\n')
# IF TESTING A SINGLE FILE - PROCESS ONE FILE AND CHECK OUTPUT
if (test.file == TRUE) {
# get a test file
WAVFiles = list.files(WAVDirs[1], pattern = filpat, full.names = TRUE)
# find unique days
dys = unique(gsub(".+_(\\d{8})_(.+).wav","\\1",WAVFiles) )
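# e.g., a file named "Rivendell_20210623_113602.wav" (as in the package example data) yields day "20210623"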
# sets the filenames
s1 = unlist (strsplit( basename(WAVFiles[1]), '_') ) [1]
site = unlist (strsplit( s1, '_') )[1]
filename2 = paste(site, filext, sep="")
# run the calibration
message('\n Check all outputs. Make sure there isn\'t an NA in "Time stamp start time".\n')
PAMGuide(chunksize = 500, atype = 'TOL',
timestring = filename2,
r=0, outwrite=1, plottype = "None",
calib=1, envi=enviset, ctype="TS",
Mh=mhset, G=Gset,
vADC=vADCset,
WAVFiles = WAVFiles)
message('Reminder: Check all outputs. Make sure there isn\'t an NA in "Time stamp start time".\n')
# In the outputs here, make sure there isn't an NA in your start time
# Evaluate the file created
testFile <- list.files(WAVDirs[1], pattern = 'TOL', recursive=T, full.names=T)
basename(testFile)
# combine data
conk <- as.matrix( read.csv(testFile[1], colClasses="numeric", header=FALSE) )
dimc <- dim(conk)
# what is the dB range?
a <- conk[2:dimc[1],2:dimc[2]]
hist(a,main = "Check to see if calibration is accurate", xlab="SPL dB")
message('dB range: ', round(min(a),2), ' to ', round(max(a), 2), '. If these numbers are negative, go back and check your gain settings, environment, and other inputs.' )
# [ want to see numbers like 0 to 80, if you're getting negative numbers go
# back and check your gain settings, environment, inputs etc.]
# 10 to 89 is totally reasonable for this envt
# time stamp- start and continuous
t <- conk[2:dimc[1],1]
t <- as.POSIXct(t,origin="1970-01-01")
if ( is.na(as.character(t[1])) ) {
stop('Time is not correct format. Check your inputs to this function.')
}else {
plot(conk[2:dimc[1],1],main = 'Check this plot. Is time continuous?') }
message('Check the plot to ensure time is continuous (no breaks).')
# delete files, if okay, re-run if not
file.remove(testFile)
rm(conk,a,dimc,t, testFile, site, filename2)
} # end test.file
# IF NOT TESTING A FILE, BATCH PROCESS ALL AUDIO FILES IN DIRECTORIES SELECTION
if (test.file == FALSE) {
# can process multiple sites as long as they have the same calibration parameters
for (ff in 1:length(WAVDirs)) # ff = 1
{
## NEED to LOOP through wav files of the same day to make NVSPLs
## find unique days
WAVfiles = list.files(WAVDirs[ff], pattern = filpat )
dys = unique(gsub(".+_(\\d{8})_(.+).wav","\\1",WAVfiles) )
## NOTE: uncomment and edit next line if code breaks partway through, use this to start loop on next file
#dys = dys[74:90] # CB: THIS IS NOT ROBUST. Add a trycatch & warning messaging instead.
## sets the file names for the directory- new one for each direcory
s1 = unlist(strsplit( WAVfiles[1], '_') ) [1]
site = unlist(strsplit( s1, '_') )[1]
filename = paste(site, filext, sep="")
## create NVSPL OUTPUT directory
if (!missing(results.directory)) {
# if (grepl("\\/$", results.directory) == FALSE) {
#   results.directory <- paste0(results.directory, '/')
# }
NVSPLdir <- results.directory
} else {
NVSPLdir <- paste(WAVDirs[ff], "NVSPL", sep = "\\")
}
dir.create(NVSPLdir, showWarnings = FALSE, recursive = TRUE)
message('\nNVSPL directory is located in: ', NVSPLdir, '\n')
## LOOP through the unique days- calibrate then convert to NVSPL format
cnt = 0
for(d in (dys) ) # d = dys[1] # for testing
{
cnt = cnt + 1
cat('##################################################\n')
cat('Processing DIRECTORY ', ff, ' of ', length(WAVDirs), ' for DAY ', cnt, ' of ', length(dys), '\n')
cat('##################################################\n')
## (1) GET a list files for each day-------------------------------------------
udaylist = grep(d, WAVfiles, value=T)
filenms = paste(WAVDirs[ff], "\\", udaylist, sep="")
## (2) RUN PAMGUIDE-------------------------------------------------------------
# Preempt breakage on corrupt files (typically, files that are too short)
catch.error <- tryCatch(
Meta(atype = 'TOL',
timestring = filename,
r = 0,
outwrite = 1,
plottype = "None",
calib =1,
envi = enviset,
ctype = "TS",
Mh = mhset,
G = Gset,
vADC = vADCset,
filenms = filenms),
error = function(e) e
)
if (inherits(catch.error, 'error')) {
message('\nThere is a problem with an audio file; skipping to next\n')
next
} # end trycatch
## (3) READ IN file created by PAMGUIDE------------------------------------------
PAMfiles = list.files(WAVDirs[ff], pattern = "Conk.*.csv",
recursive=T, full.names=T)
PAMfiles2 = list.files(WAVDirs[ff], pattern = ".csv", # CB: originally this said "*.csv". Had to change to '.csv' to work
recursive=T, full.names=T)
PAMdirFiles = dirname(PAMfiles2[2])
# read in 1st PAM file
conk <- as.matrix( read.csv(PAMfiles[1], colClasses="numeric",header=FALSE) )
# conk <- read.csv(PAMfiles[1], header = FALSE)
# remove the folder with files for each WAV file
unlink(PAMfiles2)
if (length(list.files(PAMdirFiles, pattern = 'wav|WAV')) > 0) {
# Look in this folder. If there are any wave files, we shouldn't be deleting.
# Need to find a more robust way to rewrite this code so that it doesn't
# accidentally find a top-level audio directory and delete it again...
stop('Files for this day are too short or there is another problem.')
} else {
unlink(PAMdirFiles, recursive = TRUE) # this is the problem line.
}
## (4) EXTRACT PARAMS--------------------------------------------------------------
aid <- conk[1,1]
tstampid <- substr(aid,1,1) #extract time stamp identifier
enviid <- substr(aid,2,2) #extract in-air/underwater identifier
calibid <- substr(aid,3,3) #extract calibrated/uncalibrated identifier
atypeid <- substr(aid,4,4) #extract analysis type identifier
# assign PAMGuide variables envi, calib, atype from metadata
#CB: IN THEORY THE ATYPE is never going to change since atype = 'TOL'
# is a default arg in the meta and pamguide_meta fxn
if (tstampid == 1){tstamp = 1} else {tstamp = ""}
if (enviid == 1){
envi = 'Air' ; pref <- 20
} else {envi = 'Wat' ; pref <- 1}
if (calibid == 1){calib = 1
} else {calib = 0}
if (atypeid == 1){atype = 'PSD'
} else if (atypeid == 2) {atype = 'PowerSpec'
} else if (atypeid == 3) {atype = 'TOLf'
} else if (atypeid == 4) {atype = 'Broadband'
} else if (atypeid == 5) {atype = 'Waveform'}
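# e.g., an aid of "1113" decodes (per the checks above) to: timestamped output, in-air, calibrated, 1/3 octave (TOLf) analysis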
# extract SPL data and timestamps from the PAMGuide output
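# PAMGuide output matrix layout: conk[1,1] holds the encoded metadata (aid), conk[1, 2:ncol] the band
# center frequencies, conk[2:nrow, 1] the timestamps (seconds since 1970-01-01), and
# conk[2:nrow, 2:ncol] the 1-sec SPL values for each band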
dimc <- dim(conk)
t <- conk[2:dimc[1],1]
t <- as.POSIXct(t,origin="1970-01-01")
tString <- as.character(t) # tString[length(tString)]
a <- conk[2:dimc[1],2:dimc[2]]
f <- conk[1,2:dimc[2]]
# hist(a) max(a) min(a)
rm(conk)
## (5) FORMAT myOutput as NVSPL-----------------------------------------------------
# (note: PAMGuide starts at 25 Hz, so the lower bands (12.5, 15.8, and 20 Hz) are always NaN)
NVSPLhead = c("SiteID","STime", "H12p5", "H15p8", "H20", "H25", "H31p5","H40","H50","H63","H80","H100","H125","H160","H200","H250","H315","H400","H500",
"H630","H800","H1000","H1250","H1600","H2000","H2500","H3150","H4000","H5000","H6300","H8000","H10000","H12500","H16000","H20000",
"dbA","dbC","dbZ","Voltage","WindSpeed","WindDir","TempIns","TempOut","Humidity",
"INVID","INSID","GChar1","PAMGuideVersion","timezone", "AdjustmentsApplied","CalibrationAdjustment","GPSTimeAdjustment","GainAdjustment","Status")
# CB: changed GChar3 to timezone for clarity
# check whether PAMGuide returned more than 30 one-third octave bands (25 Hz to 20 kHz); if so, truncate
if(dim(a)[2] > 30) a <- a[,1:30]
# if fewer than 30 bands were returned, endA shortens the A-weight index range so that
# aweight[4:(33-endA)] below selects exactly ncol(a) weights, starting at the 25 Hz band (index 4)
endA = ((33-4)-dim(a)[2])+1
# CB: add a status message when fewer bands than expected are returned
# calculate a dBA
# CB: ideally data.frame this with named cols
aweight <- c(-63.4, -56.7, -50.5, -44.7, -39.4, -34.6, -30.2, -26.2, -22.5, -19.1, -16.1,
-13.4, -10.9, -8.6, -6.6, -4.8, -3.2, -1.9, -0.8, 0, 0.6, 1, 1.2,
1.3, 1.2, 1.0, 0.5, -0.1, -1.1, -2.5, -4.3, -6.6, -9.3)
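# (These are the standard A-weighting corrections, in dB, for the 33 NVSPL bands from 12.5 Hz (-63.4)
# through 20 kHz (-9.3), with 0 dB at 1 kHz; because PAMGuide output starts at 25 Hz, only elements
# 4 onward are applied below.)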
# only use a-weights for the available data
#aA <- a + aweight[4:(33-endA)]
#a[1,] + aweight[4:(33-endA)]
aA = t( t(a) + aweight[4:(33-endA)] )
# convert to pressure
press <- rowMeans(10^(aA/10)) # cb: ie pressure mean
dBA = 10*log10(press) #hist(dBA) # cb: mean of the pressures
pressS <- rowSums(10^(aA/10)) # cb: ie pressure sum
zweight <- 10*log10( rowSums(10^(a/10)) ) # cb: Z-weighted (flat) energy sum, converted to dB for the dbZ column
dBAsum = 10*log10(pressS) #hist(dBA) #cb: sum of the pressures
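# The broadband A-weighted level written to the dbA column (dBAsum) is the energy sum across bands:
# dBA = 10*log10( sum_i 10^((L_i + A_i)/10) ), where L_i is the band SPL and A_i the A-weighting
# correction. e.g., two bands both at 60 dB after A-weighting combine to 10*log10(10^6 + 10^6) = ~63 dB.
# (dBA above, the dB of the band mean rather than the sum, is computed but not written out.)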
# if underwater, rescales the values to the AMT scale, using a normalization formula
if (rescWat == 1) {
if (envir == 1)
{
cat("rescaling db levels to work with AMT")
a2 = ((a - (-8)) / (87 - (-8))) * a
# hist(a2) a2 = a - 62 # accounts for offset of water/ air
a=a2
rm(a2)
}
}
## determine how many blank columns in NVSPL, assumes you add the first 5 columns
nBlankCols <- length(NVSPLhead) - (dim(a)[2] + 5)
## find unique day hours
unqHrs <- substr(tString,1,13)
## create matrix with all the data combined and add headers
# CB - I don't love this, seems too easy to make mistakes.
# More robust is to set this up as a data.frame or data.table and directly name the columns
tempOutput <- cbind(site, tString, 0, 0, 0, round(a, 1),
matrix(rep(0,dim(a)[1] * nBlankCols),
nrow = dim(a)[1], ncol = nBlankCols))
colnames(tempOutput) <- NVSPLhead
tempOutput[,'dbA'] <- dBAsum
tempOutput[,'dbZ'] <- zweight
tempOutput[,'PAMGuideVersion'] <- vers
# CB: I'll keep the GPSTimeAdjustment as the timezone since I'm not
# sure the purpose/assumptions of that column. This is how it appears to have been used in the past though.
tempOutput[,'GPSTimeAdjustment'] <- timezone
tempOutput[,'timezone'] <- timezone
# CATHLEEN note to self:
# Want to redo this so that column contents are assigned by name rather than by number,
# because I don't know that index-based assignments land in the right place.
# Just do colnames(tempOutput) <- NVSPLhead and then assign by name (as above).
# Legacy index-based assignments kept for reference:
# tempOutput[,36] = dBAsum # CB: this should be dBAsum, not just dBA, but the column name will be dbA
# tempOutput[,37] = dBAsum # CB: this is actually the dbC column, not dBAsum. We are not applying a C-weighting; this column can be eliminated
# tempOutput[,38] = zweight # CB: column name was dbF; really the nomenclature should be dbZ
# tempOutput[,48] = vers
# tempOutput[,52] = timezone # CB: so apparently the GPSTimeAdjustment column header holds the timezone?
# colnames(tempOutput) <- NVSPLhead
## separate tempOutput by unique day hours
tempOutput <- cbind(unqHrs, tempOutput) #add a column to sort by
unqHrData <- split(tempOutput, tempOutput[,1]) #find where to split the data
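# (How this works: split() treats the character matrix as a plain vector in column-major order and
# recycles the unqHrs factor down each column, so each list element holds all values for one hour's
# rows, still in column-major order; matrix(..., ncol = dim(tempOutput)[2]) in the loop below
# reassembles those values into rows and columns, and [,-1] drops the helper unqHrs column.)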
## write out data to separate files, breaks into hours
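# Output files are named NVSPL_<site>_<YYYY_MM_DD_HH>.txt, one per unique site-hour; if a file for
# that hour already exists, rows are appended to it.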
for(hr in 1:length(unqHrData))
{
dataToWrite <- matrix(unqHrData[[hr]],ncol=dim(tempOutput)[2])[,-1]
colnames(dataToWrite) <- NVSPLhead
outFileName <- paste(NVSPLdir,"\\", "NVSPL_", site, "_",
gsub(" ","_",gsub("-","_",names(unqHrData[hr]))),
".txt", sep="")
if (file.exists(outFileName)) {
write.table(dataToWrite, file = outFileName, append = TRUE, na = "", quote = FALSE, row.names = FALSE, col.names = FALSE, sep = ",")
} else {
write.table(dataToWrite, file = outFileName, na = "", quote = FALSE, row.names = FALSE, sep = ",")
}
}
} ## END DAY LOOP (d)
} ## END DIRECTORY LOOP (ff)
} # end if test.file == FALSE
} # end Wave_To_NVSPL function