R/handle_cimis.R

Defines functions handle_cimis

Documented in handle_cimis

#' List, download or convert to chillR format data from the CIMIS database
#' 
#' This function can do three things related to the California Irrigation
#' Management Information System ("CIMIS") database: 1. it can list stations
#' that are close to a specified position (geographic coordinates) 2. it can
#' retrieve weather data for a named weather station 3. it can 'clean'
#' downloaded data, so that they can easily be used in chillR Which of these
#' functions is carried out depends on the action argument.
#' 
#' This function can run independently, but it is also called by the
#' get_weather and weather2chillR functions, which some users might find a bit
#' easier to handle.
#' 
#' The CIMIS dataset is described here: http://www.cimis.water.ca.gov/
#' 
#' Under the 'list_stations' mode, several formats are possible for specifying
#' the location vector, which can consist of either two or three coordinates
#' (it can include elevation). Possible formats include c(1,2,3), c(1,2),
#' c(x=1,y=2,z=3), c(lat=2,long=1,elev=3). If elements of the vector are not
#' names, they are interpreted as c(Longitude, Latitude, Elevation).
#' 
#' The 'chillRCode' is generated by this function, when it is run with
#' geographic coordinates as location inputs. In the list of nearby stations
#' that is returned then, the chillRCode is provided and can then be used as
#' input for running the function in 'downloading' mode. For downloading the
#' data, use the same call as before but replace the location argument with the
#' chillRCode.
#' 
#' @param action if this is the character string "list_stations", the function
#' will return a list of the weather stations from the database that are
#' closest to the geographic coordinates specified by location.  if this is the
#' character string "download_weather", the function will attempt to download
#' weather data from the database for the station named by the location
#' argument, which should then be a character string corresponding to the
#' chillRcode of the station (which you can get by running this function in
#' 'list_stations mode) if this is a downloaded weather file (downloaded by
#' running this function in 'download weather' mode), the function cleans the
#' file and makes it ready for use in chillR. If the input is just a dataframe
#' (not a list, as produced with this function), you have to specify the
#' database name with the database argument
#' @param location either a vector of geographic coordinates (for the
#' 'list_stations' mode), or the 'chillRcode' of a weather station in the
#' specified database (for the 'download_weather' mode. When running this
#' function for data cleaning only, this is not needed.
#' @param time_interval numeric vector with two elements, specifying the start
#' and end date of the period of interest. Only required when running in
#' 'list_stations' or 'download weather' mode
#' @param station_list if the list of weather stations has already been
#' downloaded, the list can be passed to the function through this argument.
#' This can save a bit of time, since it can take a bit of time to download the
#' list, which can have several MB.
#' @param stations_to_choose_from if the location is specified by geographic
#' coordinates, this argument determines the number of nearby stations in the
#' list that is returned.
#' @param drop_most boolean variable indicating if most columns should be
#' dropped from the file. If set to TRUE (default), only essential columns for
#' running chillR functions are retained.
#' @param end_at_present boolean variable indicating whether the interval of
#' interest should end on the present day, rather than extending until the end
#' of the year specified under time_interval[2] (if time_interval[2] is the
#' current year).
#' @return The output depends on the action argument. If it is 'list_stations',
#' the function returns a list of station_to_choose_from weather stations that
#' are close to the specified location. This list also contains information
#' about how far away these stations are (in km), how much the elevation
#' difference is (if elevation is specified; in m) and how much overlap there
#' is between the data contained in the database and the time period specified
#' by time_interval. If action is 'download_weather' the output is a list of
#' two elements: 1. database="CIMIS" 2. the downloaded weather record, extended
#' to the full duration of the specified time interval. If action is a weather
#' data.frame or a weather record downloaded with this function (in
#' 'download_weather' mode), the output is the same data in a format that is
#' easy to use in chillR. If drop_most was set to TRUE, most columns are
#' dropped.
#' @note Many databases have data quality flags, which may sometimes indicate
#' that data aren't reliable. These are not considered by this function!
#' 
#' Past CIMIS data is provided to the public as compressed data files of annual
#' data, which contain data for all stations for the respective years. The same
#' strategy was followed for monthly data of the past year. This means that in
#' order to get to the records for one given station, it is necessary to
#' download data for all stations first, before extracting weather for the
#' station of interest. This means that downloads take a lot longer than one
#' might expect, and the downloaded data volume is a multiple of what is really
#' of interest.
#' 
#' @importFrom readxl read_excel
#' @importFrom RCurl getURL
#' @author Eike Luedeling
#' @references The chillR package:
#' 
#' Luedeling E, Kunz A and Blanke M, 2013. Identification of chilling and heat
#' requirements of cherry trees - a statistical approach. International Journal
#' of Biometeorology 57,679-689.
#' @keywords utilities
#' @examples
#' 
#' # the example is #d out, since the download request sometimes times out, and that
#' # causes problems with CRAN approval of the package
#' 
#' # handle_cimis(action = "list_stations",
#' #              location = c(x = -122, y = 38.5),
#' #              time_interval = c(2012, 2012))
#' 
#' # stat_list <- data.frame("Station Number" = c("119", "139", "6"),
#' #                         Latitude = c(38.49500, 38.50126, 38.53569),
#' #                         Longitude = c(-122.0040, -121.9785, -121.7764),
#' #                         Start_date =c("1993-08-21 UTC", "1998-06-15 UTC", "1982-07-17 UTC"),
#' #                         End_date = c("1995-01-25", "2016-03-06", "2016-03-06"))
#' 
#' # gw <- handle_cimis(action = "download_weather",
#' #                    location = "6",
#' #                    time_interval = c(1982, 1982), 
#' #                    station_list = stat_list)
#'  
#' # weather <- handle_cimis(gw)
#' 
#' # make_chill_plot(tempResponse(stack_hourly_temps(fix_weather(weather)),
#' #                 Start_JDay = 300, End_JDay = 50), 
#' #                 "Chill_Portions", start_year = 2010, end_year = 2012,
#' #                 metriclabel = "Chill Portions", misstolerance = 50)
#' 
#' @export handle_cimis
handle_cimis<-function(action,location=NA,time_interval=NA,station_list=NULL,stations_to_choose_from=25,drop_most=TRUE,
                       end_at_present=TRUE)
{
 #station list section
  remove_missing<-function(tab,column,missing)
  {tab[which(tab[,column]==missing),column]<-NA
  return(tab)}

  get_station_list<-function(long=NA,lat=NA,time_interval=NA,
                             stations_to_choose_from=25)
    {suppressWarnings(dir.create("chillRtempdirectory"))
    dir<-as.character(unlist(sapply(unlist(strsplit(getURL("ftp://ftpcimis.water.ca.gov/pub2/",
                                                           ftp.use.epsv = FALSE,
                                                           dirlistonly = TRUE),"\n")),function(x)strsplit(x,"\r"))))
    summary_file<-dir[grep("CIMIS Stations List",dir)]
    download.file(paste("ftp://ftpcimis.water.ca.gov/pub2/",summary_file,sep=""),
                  "chillRtempdirectory/a.xlsx", mode="wb", method="libcurl")
    cimis<-suppressWarnings(read_excel("chillRtempdirectory/a.xlsx"))
    file.remove("chillRtempdirectory/a.xlsx")
    unlink("chillRtempdirectory",recursive=TRUE)
    
    cimis<-as.data.frame(cimis[which(!is.na(cimis[,1])),])
    cimis[,"end"]<-NA
    cimis[which((!cimis$Disconnect=="Active")),"end"]<-as.numeric(cimis[which((!cimis$Disconnect=="Active")),"Disconnect"])
    
    cimis$end<-as.Date(cimis$end,origin="1899/12/30")
    cimis$end[which(is.na(cimis$end))]<-Sys.time()
    cimis<-cimis[,c("Station Number","Name", "County", "Latitude",
                    "Longitude", "ELEV","Connect","end")]
    colnames(cimis)<-c("Stat_num","Name","County","Latitude","Longitude","Elevation","Start_date","End_date")
    cimis$Elevation<-cimis$Elevation*0.3048
    
    if(is.na(long)) return(cimis)

    lat_rad <- lat*pi/180
    lon_rad <- long*pi/180
    lat_rad_stat <- cimis$Lat*pi/180
    lon_rad_stat <- cimis$Long*pi/180
    
    cimis[,"distance"]<-
      round(6378.388 * acos(sin(lat_rad) * sin(lat_rad_stat) +
                              cos(lat_rad) * cos(lat_rad_stat) *
                              cos(lon_rad_stat - lon_rad)), 
            2)
    sorted_list<-cimis[order(cimis$distance),]
    if(!is.na(elev)) sorted_list[,"elevation_diff"]<-elev-sorted_list$Elevation
    sorted_list[,"chillR_code"]<-as.character(sorted_list$Stat_num)
    if(!is.na(time_interval[1]))
    {interval_end<-YEARMODA2Date(time_interval[2]*10000+1231)
    if(end_at_present) interval_end<-min(interval_end,ISOdate(format(Sys.Date(),"%Y"),format(Sys.Date(),"%m"),
                                                              format(Sys.Date(),"%d")))
    interval_start<-YEARMODA2Date(time_interval[1]*10000+0101)
    sorted_list<-sorted_list[1:min(nrow(sorted_list),max(stations_to_choose_from,500)),]
    overlap_days<-apply(sorted_list,1,function (x) (as.numeric(difftime(
      sort(c(x["End_date"],format(interval_end,"%Y-%m-%d")))[1],
      sort(c(x["Start_date"],format(interval_start,"%Y-%m-%d")))[2])+1)))
    sorted_list[,"Overlap_years"]<-round(
      apply(sorted_list,1,function (x) (as.numeric(difftime(
        sort(c(x["End_date"],format(interval_end,"%Y-%m-%d")))[1],
        sort(c(x["Start_date"],format(interval_start,"%Y-%m-%d")))[2])+1)/(365+length(which(sapply(time_interval[1]:time_interval[2],leap_year)))/(time_interval[2]-time_interval[1]+1)))),2)
    sorted_list[which(sorted_list[,"Overlap_years"]<0),"Overlap_years"]<-0
    sorted_list[,"Perc_interval_covered"]<-round(overlap_days/as.numeric(interval_end-interval_start+1)*100,2)
    if(!is.na(elev))  sorted_list<-sorted_list[,c("chillR_code","Name","County","Latitude","Longitude","Elevation","Start_date","End_date",
                                                  "distance","elevation_diff","Overlap_years","Perc_interval_covered")] else
                                                    sorted_list<-sorted_list[,c("chillR_code","Name","County","Latitude","Longitude","Start_date","End_date",
                                                                                "distance","Overlap_years","Perc_interval_covered")]} else
                                                                                  if(!is.na(elev))  sorted_list<-sorted_list[,c("chillR_code","Name","County","Latitude","Longitude","Elevation","Start_date","End_date",
                                                                                                                                "distance","elevation_diff")] else
                                                                                                                                  sorted_list<-sorted_list[,c("chillR_code","Name","County","Latitude","Longitude","Start_date","End_date",
                                                                                                                                                              "distance")]
    return(sorted_list[1:stations_to_choose_from,])}
  

  if(is.character(action))  if(action=="list_stations")
        {
        if(!is.null(names(location)))
        {lat<-unlist(sapply(names(location),function(x) max(c(length(grep("lat", x, ignore.case = TRUE)),length(grep("y", x, ignore.case = TRUE))))))
        if(sum(lat)==1) lat<-as.numeric(location[which(lat==1)])
        long<-unlist(sapply(names(location),function(x) max(c(length(grep("lon", x, ignore.case = TRUE)),length(grep("x", x, ignore.case = TRUE))))))
        if(sum(long)==1) long<-as.numeric(location[which(long==1)])
        elev<-unlist(sapply(names(location),function(x) max(c(length(grep("ele", x, ignore.case = TRUE)),length(grep("alt", x, ignore.case = TRUE)),
                                                              length(grep("z", x, ignore.case = TRUE))))))
        if(sum(elev)==1) elev<-as.numeric(location[which(elev==1)]) else elev<-NA
        } else {long<-location[1]
        lat<-location[2]
        if(length(location)==3) elev<-location[3] else elev<-NA}

    return(get_station_list(long,lat,time_interval,stations_to_choose_from))
    
        }


  #weather download section

if(is.character(action)) if(action=="download_weather")
  {

  #first define a download handling function to make piecing the record together easier
  download_and_process<-function(URL,y,numm,sixdays=NA)
  {suppressWarnings(dir.create("chillRtempdirectory"))
    if(!is.data.frame(sixdays))
     {
      dest<-"chillRtempdirectory/weather.zip"
      ff<-suppressWarnings(try(download.file(URL, dest, method="libcurl"),silent = TRUE))
      if(ff==0)
      {gz <- gzfile(dest, open = "rt")
      ziplist<-unzip(dest,list=TRUE)
      close(gz)
      if(numm %in% sapply(ziplist[,1],function(x) substr(x,nchar(x)-6,nchar(x)-4)))
      {if(!y==2016)
        {unzip(dest,files=paste(y,"daily",numm,".csv",sep=""),exdir="chillRtempdirectory")
         yweath<-read.csv(paste("chillRtempdirectory/",y,"daily",numm,".csv",sep=""))
         file.remove(paste("chillRtempdirectory/",y,"daily",numm,".csv",sep=""))}
        if(y==2016)
        {unzip(dest,files=paste("dlymet",numm,".csv",sep=""),exdir="chillRtempdirectory")
          yweath<-read.csv(paste("chillRtempdirectory/","dlymet",numm,".csv",sep=""))
          file.remove(paste("chillRtempdirectory/","dlymet",numm,".csv",sep=""))}
        file.remove(dest)
      } else yweath<-NA
      } else yweath<-NA
     } else yweath<-sixdays

      if(!is.numeric(y)) y<-format(Sys.time(), "%Y")
      if(is.data.frame(yweath))
      {if(y>2013) colnames(yweath)<-c("Station Id","Date","Julian Date","Reference ETo","QC for Reference ETo","Precipitation",
                                       "QC for Precipitation","Solar Radiation Average","QC for Solar Radiation Average","Average Vapor Pressure",
                                       "QC for Average Vapor Pressure","Maximum Air Temperature","QC for Maximum Air Temperature","Minimum Air Temperature",
                                       "QC for Minimum Air Temperature","Average Air Temperature","QC for Average Air Temperature","Maximum Relative Humidity",
                                       "QC for Maximum Relative Humidity","Minimum Relative Humidity","QC for Minimum Relative Humidity","Average Relative Humidity",
                                       "QC for Average Relative Humidity","Dew Point","QC for Dew Point","Average Wind Speed","QC for Average Wind Speed",
                                       "Wind Run","QC for Wind Run","Average Soil Temperature","QC for Average Soil Temperature")

       if(y<2014) colnames(yweath)<-c("Station Id","Date","Julian Date","QC for Solar Radiation Average","Solar Radiation Average","QC for Average Soil Temperature",
                                       "Average Soil Temperature","QC for Maximum Air Temperature","Maximum Air Temperature","QC for Minimum Air Temperature",
                                       "Minimum Air Temperature","QC for Average Air Temperature","Average Air Temperature","QC for Average Vapor Pressure",
                                       "Average Vapor Pressure","QC for Average Wind Speed","Average Wind Speed","QC for Precipitation","Precipitation",
                                       "QC for Maximum Relative Humidity","Maximum Relative Humidity","QC for Minimum Relative Humidity","Minimum Relative Humidity",
                                       "QC for Reference ETo","Reference ETo","QC for Average Relative Humidity","Average Relative Humidity","QC for Dew Point",
                                       "Dew Point","QC for Wind Run","Wind Run")
       yweath[,c("Reference ETo","Precipitation","Solar Radiation Average","Average Vapor Pressure",
                  "Maximum Air Temperature","Minimum Air Temperature",
                  "Average Air Temperature","Maximum Relative Humidity",
                  "Minimum Relative Humidity","Average Relative Humidity",
                  "Dew Point","Average Wind Speed",
                  "Wind Run","Average Soil Temperature")]<-suppressWarnings(apply(yweath[,c("Reference ETo","Precipitation","Solar Radiation Average","Average Vapor Pressure",
                                                                                            "Maximum Air Temperature","Minimum Air Temperature",
                                                                                            "Average Air Temperature","Maximum Relative Humidity",
                                                                                            "Minimum Relative Humidity","Average Relative Humidity",
                                                                                            "Dew Point","Average Wind Speed",
                                                                                            "Wind Run","Average Soil Temperature")],2,function(x) as.numeric(as.character(x))))
      }
      unlink("chillRtempdirectory",recursive=TRUE)
      return(yweath)
  } # end of function definition, now on to processing


   if(is.null(station_list)) station_list<-get_station_list()

   station_list<-station_list[which(!is.na(station_list[,1])),]
   #station_list[,"chillR_code"]<-as.character(station_list$"Station Number")
   station_list[,"chillR_code"]<-as.character(station_list[,grep("Stat",colnames(station_list))])
   if(location %in% station_list$chillR_code)
      { num<-as.numeric(location)
        if(num>99) numm<-as.character(num) else
          if(num>9) numm<-paste("0",num,sep="") else
            numm<-paste("00",num,sep="")

           startlisting<-TRUE
           for(y in time_interval[1]:time_interval[2]) #download annual data
             {if(!y==format(Sys.time(), "%Y"))  #the current year is only available as monthly files, see below
               {URL<-paste("ftp://ftpcimis.water.ca.gov/pub2/annualMetric/dailyStns",y,".zip",sep="")
                yweath<-download_and_process(URL,y,numm)
                 if (is.data.frame(yweath))
                   if(startlisting) {record<-yweath
                                     startlisting<-FALSE} else record<-rbind(record,yweath)}

               if(y==format(Sys.time(), "%Y")) #here the current year is processed
                 {current_month<-as.numeric(format(Sys.time(), "%m"))
                  for(m in 1:(current_month-1))
                  {mmm<-c("jan","feb","mar","apr","may","jun","jul","aug","sep","oct","nov","dec")[m]
                    URL<-paste("ftp://ftpcimis.water.ca.gov/pub2/monthlyMetric/dailyStns",mmm,".zip",sep="")
                    mweath<-download_and_process(URL,mmm,numm)
                    if (is.data.frame(mweath))
                      if(as.numeric(strsplit(as.character(mweath[1,2]),"/")[[1]][3])==format(Sys.time(), "%Y"))
                       {if(startlisting) {record<-mweath
                        startlisting<-FALSE} else {
                          mweath<-mweath[which(!mweath[,2] %in% record[,2]),]
                          record<-rbind(record,mweath)}}}

                    #NOW ONLY THE LAST 6 DAYS

                     sixdays<-read.table(paste("ftp://ftpcimis.water.ca.gov/pub2/dailyMetric/dlymet",numm,".csv",sep=""),sep=",")
                     sixdays<-download_and_process(NA,format(Sys.time(), "%Y"),NA,sixdays)
                     if(startlisting) {record<-sixdays
                     startlisting<-FALSE} else {
                       sixdays<-sixdays[which(!sixdays[,2] %in% record[,2]),]
                       record<-rbind(record,sixdays)}}}

           if (startlisting) {record<-NA
                              warning("No weather data found for the time interval of interest.")}

      } else {record<-NA
              warning("No weather data found for this station.")}

   if(is.data.frame(record))
     {record[,"Year"]<-as.numeric(sapply(strsplit(as.character(record$Date),"/"),function(x) x[3]))
      record[,"Month"]<-as.numeric(sapply(strsplit(as.character(record$Date),"/"),function(x) x[1]))
      record[,"Day"]<-as.numeric(sapply(strsplit(as.character(record$Date),"/"),function(x) x[2]))

      first_line<-record[1,]
      if(!((as.numeric(first_line["Year"])*10000+as.numeric(first_line["Month"])*100+as.numeric(first_line["Day"]))==time_interval[1]*10000+101))
        {first_line[,]<-NA
         first_line[,c("Station Id","Year","Month","Day")]<-c(numm,time_interval[1],1,1)
         record<-rbind(first_line,record)}

      last_line<-record[nrow(record),]
      if(!((as.numeric(last_line["Year"])*10000+as.numeric(last_line["Month"])*100+as.numeric(last_line["Day"]))==time_interval[2]*10000+1231))
        {last_line[,]<-NA
        last_line[,c("Station Id","Year","Month","Day")]<-c(numm,time_interval[2],12,31)
        record<-rbind(record,last_line)}
      record$Year<-as.numeric(record$Year)
      record$Month<-as.numeric(record$Month)
      record$Day<-as.numeric(record$Day)

      record<-make_all_day_table(record,no_variable_check=TRUE)}

 return(list(database="CIMIS",weather=record))}

  #weather cleaning section

  if(is.list(action)) if(names(action)[1]=="database") # then we assume that this is a downloaded file to be cleaned
        {
        dw<-action$weather
        if(length(dw)==1)
          if(is.na(dw))
          {warning("no usable weather downloaded")
           return(NA)}
        colnames(dw)[which(colnames(dw)=="Average Air Temperature")]<-"Tmean"
        colnames(dw)[which(colnames(dw)=="Minimum Air Temperature")]<-"Tmin"
        colnames(dw)[which(colnames(dw)=="Maximum Air Temperature")]<-"Tmax"
        colnames(dw)[which(colnames(dw)=="Precipitation")]<-"Prec"
        if(drop_most) dw<-dw[,c("Year","Month","Day","Tmin","Tmax","Tmean","Prec")]
        for (cc in c("Year","Month","Day","Tmin","Tmax","Tmean","Prec"))
          dw[,cc]<-as.numeric(dw[,cc])
        return(list(database="CIMIS",weather=dw))}
  if(is.data.frame(action)) # then we assume that this is a downloaded file to be cleaned
        {dw<-action
        if(length(dw)==1)
          if(is.na(dw))
          {warning("no usable weather downloaded")
            return(NA)}
        colnames(dw)[which(colnames(dw)=="Average Air Temperature")]<-"Tmean"
        colnames(dw)[which(colnames(dw)=="Minimum Air Temperature")]<-"Tmin"
        colnames(dw)[which(colnames(dw)=="Maximum Air Temperature")]<-"Tmax"
        colnames(dw)[which(colnames(dw)=="Precipitation")]<-"Prec"
        if(drop_most) dw<-dw[,c("Year","Month","Day","Tmin","Tmax","Tmean","Prec")]
        for (cc in c("Year","Month","Day","Tmin","Tmax","Tmean","Prec"))
          dw[,cc]<-as.numeric(dw[,cc])
        return(dw)}

}

Try the chillR package in your browser

Any scripts or data that you put into this service are public.

chillR documentation built on Nov. 28, 2023, 1:09 a.m.