MkUsgsTimeSlice: Make timeslices from USGS discharge data files gathered using...


View source: R/usgs_time_slice.R

Description

Make timeslices from USGS discharge data files gathered using dataRetrieval.

Usage

MkUsgsTimeSlice(realTimeFiles, outPath, nearestMin = 5,
  oldestTime = NULL, qcFunction, varianceFunction)

Arguments

realTimeFiles

Character vector of active RData format files to be processed.

outPath

Character, the directory path where the NetCDF timeslice files are to be written.

nearestMin

Numeric, the time resolution (in minutes) to which observation times are rounded and at which the NetCDF timeslice files are written. Must evenly divide 60.
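To make the rounding concrete, here is a minimal sketch in base R of snapping observation times to the nearest multiple of nearestMin minutes. The helper name RoundToNearestMin is hypothetical and not part of rwrfhydro, and the way MkUsgsTimeSlice rounds internally is not shown on this page, so treat this as an illustration of the idea only.

```r
# Hypothetical helper (not part of rwrfhydro): round POSIXct times to the
# nearest multiple of `nearestMin` minutes.
RoundToNearestMin <- function(times, nearestMin = 5) {
  # nearestMin must evenly divide 60 so the slices line up with the hour.
  stopifnot(60 %% nearestMin == 0)
  stepSec <- nearestMin * 60
  # Work in seconds since the epoch, round to the nearest step, convert back.
  rounded <- round(as.numeric(times) / stepSec) * stepSec
  as.POSIXct(rounded, origin = "1970-01-01", tz = "UTC")
}

# Made-up observation times for illustration.
obsTimes <- as.POSIXct(c("2015-04-15 00:02:00",
                         "2015-04-15 00:03:00",
                         "2015-04-15 00:58:00"), tz = "UTC")
roundedTimes <- RoundToNearestMin(obsTimes, nearestMin = 5)
print(roundedTimes)
```

With nearestMin = 5, observations at 00:02 and 00:03 fall into different slices (00:00 and 00:05), while 00:58 is assigned to the 01:00 slice.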

oldestTime

POSIXct, the date BEFORE which data will be ignored.

qcFunction

Function, used to apply quality control procedures to the timeslice in metric units. Note that this QC can only use information from the current time. QC procedures involving the temporal domain will be applied elsewhere.
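As a rough sketch of what a single-time QC function might look like: the interface is not specified on this page, so the example below assumes the function receives the timeslice as a data frame with a discharge.cms column and returns the rows that pass. The function name, the maxDischargeCms threshold, and the choice to drop failing rows are all hypothetical.

```r
# Hypothetical QC function. The data-frame-in, data-frame-out interface and
# the threshold are assumptions for illustration, not the package's own QC.
ExampleQcFunction <- function(sliceDf, maxDischargeCms = 1e5) {
  # Each row is judged on its own value only: no information from other
  # times is used, in keeping with the note above.
  bad <- is.na(sliceDf$discharge.cms) |
         sliceDf$discharge.cms < 0 |
         sliceDf$discharge.cms > maxDischargeCms
  sliceDf[!bad, , drop = FALSE]
}

# Made-up timeslice with placeholder site numbers.
sliceDf <- data.frame(site_no = c("00000001", "00000002", "00000003"),
                      discharge.cms = c(12.5, -1, NA),
                      stringsAsFactors = FALSE)
qcDf <- ExampleQcFunction(sliceDf)
print(qcDf)
```

Only the first row survives here, since the second has a negative discharge and the third is missing.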

varianceFunction

Function, used to derive the observation variance from a data frame with the following columns: site_no, dateTime, code, queryTime, and discharge.cms. The function accepts the data frame and returns it with the new variance column added.
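A minimal sketch of a function with this shape follows. The name of the new column (variance here) and the error model (a standard deviation equal to 10% of the discharge) are assumptions made for illustration; the page does not state what the package expects or uses.

```r
# Hypothetical variance function. The column name `variance` and the
# relative-error model are assumptions, not taken from rwrfhydro.
ExampleVarianceFunction <- function(df, relError = 0.1) {
  # Variance is the square of an assumed standard deviation that scales
  # with the observed discharge.
  df$variance <- (relError * df$discharge.cms)^2
  df
}

# Made-up input with the columns listed above; all values are placeholders.
obs <- data.frame(
  site_no       = c("00000001", "00000002"),
  dateTime      = as.POSIXct(c("2015-04-15 00:00:00",
                               "2015-04-15 00:05:00"), tz = "UTC"),
  code          = c("P", "P"),
  queryTime     = as.POSIXct("2015-04-15 01:00:00", tz = "UTC"),
  discharge.cms = c(10, 20),
  stringsAsFactors = FALSE
)
obsWithVar <- ExampleVarianceFunction(obs)
print(obsWithVar)
```

The input columns are returned unchanged, with one extra column appended.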

Value

A data frame with two columns, POSIXct and filename, which give the time of each timeslice and the corresponding file name with full path.

Examples

## Not run: 
realTimeFiles <- list.files(pattern='huc.*.RData', 
                            path='~/usgsStreamData/realTimeData', 
                            full.names=TRUE)
outPath = '~/usgsStreamData/timeSliceData/'
library(doMC)
registerDoMC(4)

## A first test
ret1 <- MkUsgsTimeSlice( realTimeFiles[1:21], outPath=outPath, 
                         oldestTime=as.POSIXct('2015-04-15 00:00:00', tz='UTC') )
nrow(ret1)

## delete the files and see how many more are created without the oldestTime set
unlink(ret1$file)
ret1 <- MkUsgsTimeSlice( realTimeFiles[1:21], outPath=outPath )
nrow(ret1)  ## quite a few more files. 
ncdump(ret1$file[230])  ## 27 stations
ret2 <- MkUsgsTimeSlice( realTimeFiles[22:42], outPath=outPath )
ncdump(ret1$file[230])  ## 58 stations

## new experiment
unlink(unique(c(ret1$file, ret2$file)))
ret1 <- MkUsgsTimeSlice( realTimeFiles, outPath=outPath, nearestMin=60,
                         oldestTime=as.POSIXct('2015-04-15 00:00:00', tz='UTC'))
nStn <- 
 plyr::ldply(NamedList(ret1$file), 
      function(ff) { nc <- ncdump(ff, quiet=TRUE)
                     data.frame(nStn=nc$dim$stationId$len,
                                time=as.POSIXct('1970-01-01 00:00:00',tz='UTC') + 
                                     nc$dim$time$vals,
                                nUniqueStn = length(unique(nc$dim$stationId$vals)) )},
             .parallel=TRUE)
library(ggplot2)
ggplot(nStn, aes(x=time,y=nStn)) + geom_point(color='red')


###############################
## process on hydro-c1
realTimeFiles <- list.files(pattern='huc.*.RData', 
                            path='~/usgsStreamData/realTimeData', 
                            full.names=TRUE)
outPath = '~/usgsStreamData/timeSliceData/'
library(doMC)
registerDoMC(12)

## I'm worried about using too much memory, when I run this on all 
## previously collected data, so break up the problem
chunkSize <- 1000
## subtract 1 so an exact multiple of chunkSize does not add an empty trailing chunk
chunkDf <- data.frame( ind = 0:((length(realTimeFiles) - 1) %/% chunkSize) )
chunkDf <- within(chunkDf, { start = (ind)*chunkSize+1
                             end   = pmin( (ind+1)*chunkSize, length(realTimeFiles)) } )

for (ii in seq_len(nrow(chunkDf)) ) {
  ret1 <- MkUsgsTimeSlice( realTimeFiles[chunkDf$start[ii]:chunkDf$end[ii]], 
                           outPath=outPath, nearestMin=60,
                           oldestTime=as.POSIXct('2015-04-15 00:00:00', tz='UTC')
                         )
}

## end dontrun 
## End(Not run)  

NCAR/rwrfhydro documentation built on Feb. 28, 2021, 12:47 p.m.