R/handle_ucipm.R

Defines functions handle_ucipm

Documented in handle_ucipm

#' List, download or convert to chillR format data from the UCIPM database
#'
#' This function can do three things related to the University of California
#' Integrated Pest Management (UCIPM) database: 1. it can list stations that
#' are close to a specified position (geographic coordinates) 2. it can
#' retrieve weather data for a named weather station 3. it can 'clean'
#' downloaded data, so that they can easily be used in chillR Which of these
#' functions is carried out depends on the action argument.
#'
#' This function can run independently, but it is also called by the
#' get_weather and weather2chillR functions, which some users might find a bit
#' easier to handle.
#'
#' the UCIPM dataset is described here: http://ipm.ucdavis.edu/WEATHER/
#'
#' under the 'list_stations' mode, several formats are possible for specifying
#' the location vector, which can consist of either two or three coordinates
#' (it can include elevation). Possible formats include c(1,2,3), c(1,2),
#' c(x=1,y=2,z=3), c(lat=2,long=1,elev=3). If elements of the vector are not
#' names, they are interpreted as c(Longitude, Latitude, Elevation).
#'
#' The 'chillRCode' is generated by this function, when it is run with
#' geographic coordinates as location inputs. In the list of nearby stations
#' that is returned then, the chillRCode is provided and can then be used as
#' input for running the function in 'downloading' mode. For downloading the
#' data, use the same call as before but replace the location argument with the
#' chillRCode.
#'
#' @param action if this is the character string "list_stations", the function
#' will return a list of the weather stations from the database that are
#' closest to the geographic coordinates specified by location.  if this is the
#' character string "download_weather", the function will attempt to download
#' weather data from the database for the station named by the location
#' argument, which should then be a character string corresponding to the
#' chillRcode of the station (which you can get by running this function in
#' 'list_stations mode) if this is a downloaded weather file (downloaded by
#' running this function in 'download weather' mode), the function cleans the
#' file and makes it ready for use in chillR. If the input is just a dataframe
#' (not a list, as produced with this function), you have to specify the
#' database name with the database argument
#' @param location either a vector of geographic coordinates (for the
#' 'list_stations' mode), or the 'chillRcode' of a weather station in the
#' specified database (for the 'download_weather' mode. When running this
#' function for data cleaning only, this is not needed.
#' @param time_interval numeric vector with two elements, specifying the start
#' and end date of the period of interest. Only required when running in
#' 'list_stations' or 'download weather' mode
#' @param station_list if the list of weather stations has already been
#' downloaded, the list can be passed to the function through this argument.
#' This can save a bit of time, since it can take a bit of time to download the
#' list, which can have several MB.
#' @param stations_to_choose_from if the location is specified by geographic
#' coordinates, this argument determines the number of nearby stations in the
#' list that is returned.
#' @param drop_most boolean variable indicating if most columns should be
#' dropped from the file. If set to TRUE (default), only essential columns for
#' running chillR functions are retained.
#' @param end_at_present boolean variable indicating whether the interval of
#' interest should end on the present day, rather than extending until the end
#' of the year specified under time_interval[2] (if time_interval[2] is the
#' current year).
#' @return The output depends on the action argument. If it is 'list_stations',
#' the function returns a list of station_to_choose_from weather stations that
#' are close to the specified location. This list also contains information
#' about how far away these stations are (in km), how much the elevation
#' difference is (if elevation is specified; in m) and how much overlap there
#' is between the data contained in the database and the time period specified
#' by time_interval. If action is 'download_weather' the output is a list of
#' two elements: 1. database="CIMIS" 2. the downloaded weather record, extended
#' to the full duration of the specified time interval. If action is a weather
#' data.frame or a weather record downloaded with this function (in
#' 'download_weather' mode), the output is the same data in a format that is
#' easy to use in chillR. If drop_most was set to TRUE, most columns are
#' dropped.
#' @note Many databases have data quality flags, which may sometimes indicate
#' that data aren't reliable. These are not considered by this function!
#'
#' The station list provided by the UC IPM database doesn't contain geographic
#' positions of the stations, which can only be accessed by station-specific
#' websites. This function will access this information only if it was not
#' given on the website in early 2016. Station information based on a download
#' at that time is stored in the california_station dataset included in chillR.
#' This was done to reduce the run time for the handle_ucipm function. It will
#' probably be okay for the foreseeable future (stations don't change very
#' quickly). A new version of this table can be produces with the
#' make_california_UCIPM_station_list() function.
#' @author Eike Luedeling
#' @references The chillR package:
#'
#' Luedeling E, Kunz A and Blanke M, 2013. Identification of chilling and heat
#' requirements of cherry trees - a statistical approach. International Journal
#' of Biometeorology 57,679-689.
#' @keywords utilities
#' @examples
#'
#' # All examples are disabled, because the database is sometimes unavailable. This then generates
#' # an error when R runs its package functionality checks. To run the examples, remove the # mark,
#' # before running the code.
#' #
#' #handle_ucipm(action="list_stations",location=c(x=-122,y=38.5),time_interval=c(2012,2012))
#' #gw<-handle_ucipm(action="download_weather",location="WINTERS.A",time_interval=c(2012,2012))
#' #weather<-handle_ucipm(gw)$weather
#' #make_chill_plot(tempResponse(stack_hourly_temps(fix_weather(weather)),Start_JDay=300,End_JDay=50),
#' #                "Chill_Portions",start_year=2010,end_year=2012,metriclabel="Chill Portions",
#' #                misstolerance = 50)
#'
#' @export handle_ucipm
handle_ucipm<-function(action,location=NA,time_interval=NA,station_list=california_stations,stations_to_choose_from=25,drop_most=TRUE,end_at_present = TRUE)
{
 #station list section

  if(is.character(action))  if(action=="list_stations")
        {if(!is.null(names(location)))
        {lat<-unlist(sapply(names(location),function(x) max(c(length(grep("lat", x, ignore.case = TRUE)),length(grep("y", x, ignore.case = TRUE))))))
        if(sum(lat)==1) lat<-as.numeric(location[which(lat==1)])
        long<-unlist(sapply(names(location),function(x) max(c(length(grep("lon", x, ignore.case = TRUE)),length(grep("x", x, ignore.case = TRUE))))))
        if(sum(long)==1) long<-as.numeric(location[which(long==1)])
        elev<-unlist(sapply(names(location),function(x) max(c(length(grep("ele", x, ignore.case = TRUE)),length(grep("alt", x, ignore.case = TRUE)),
                                                              length(grep("z", x, ignore.case = TRUE))))))
        if(sum(elev)==1) elev<-as.numeric(location[which(elev==1)]) else elev<-NA
        } else {long<-location[1]
        lat<-location[2]
        if(length(location)==3) elev<-location[3] else elev<-NA}

        if(is.null(station_list)) station_list<-data.frame(Code=NA,Lat=NA,Long=NA,Elev=NA)
        if (!is.null(station_list)) if(is.na(sum(match(c("Code","Lat","Long","Elev"),colnames(station_list)))))
           station_list<-data.frame(Code=NA,Lat=NA,Long=NA,Elev=NA)

        docu<-htmlParse("http://ipm.ucdavis.edu/WEATHER/wxactstnames.html")
        els = getNodeSet(docu, "//body//table")[[2]]
        els = getNodeSet(els, "//table//tr")
        nores=TRUE
        for(i in 1:length(els))
          {x<-xmlToDataFrame(els[i])
           if(length(x)==3)
             {colnames(x)<-c("Name","Code","Interval")
              if(nores) {res<-x;nores=FALSE} else res<-rbind(res,x)}}


      for(l in 1:nrow(res))
        {if(res$Code[l] %in% station_list$Code)
        {res[l,"Latitude"]<-station_list[which(station_list$Code==as.character(res$Code[l])),"Lat"]
        res[l,"Longitude"]<-station_list[which(station_list$Code==as.character(res$Code[l])),"Long"]
        res[l,"Elevation"]<-station_list[which(station_list$Code==as.character(res$Code[l])),"Elev"]
        } else
        {docu<-htmlParse(paste("http://ipm.ucdavis.edu/calludt.cgi/WXSTATIONDATA?STN=",res$Code[l],sep=""))
        els = getNodeSet(docu, "//table")[[2]]
        els = getNodeSet(els, "//tr")[[6]]
        positionstring<-getChildrenStrings(els)[1]
        suppressWarnings(sp<-as.numeric(strsplit(positionstring," ")$td))
        sp<-sp[which(!is.na(sp))]
        res[l,"Latitude"]<-sp[1]+sp[2]/60
        res[l,"Longitude"]<-sp[3]+sp[4]/60
        if(length(grep("min W",positionstring))>0) res[l,"Longitude"]<-(-res[l,"Longitude"])
        if(length(grep("min S",positionstring))>0) res[l,"Latitude"]<-(-res[l,"Latitude"])
        res[l,"Elevation"]<-as.numeric(strsplit(as.character(getChildrenStrings(els)[3])," ")[[1]][2])*0.3048}}

        starts<-sapply(sapply(res$Interval,function(x) strsplit(as.character(x)," to")),function(x) x[[1]])
        starts<-sapply(starts,function(x) strsplit(x,"/"))
        starts<-unlist(sapply(starts,function(x) if(length(x)==3) x<-format(YEARMODA2Date(as.numeric(x[3])*10000+as.numeric(x[1])*100+as.numeric(x[2])),"%Y-%m-%d") else
                                          if(length(x)==1) x<-format(YEARMODA2Date(as.numeric(x)*10000+0101),"%Y-%m-%d")))
        res["Start_date"]<-starts
        ends<-sapply(sapply(res$Interval,function(x) strsplit(as.character(x)," to")),function(x) x[[length(x)]])
        ends[c(1:length(ends)) %in% grep("period",res$Interval)]<-"present"
        ends[!c(1:length(ends)) %in% grep("present",res$Interval)]<-format(YEARMODA2Date(as.numeric(ends[!c(1:length(ends)) %in% grep("present",ends)])*10000+1231),"%Y-%m-%d")
        ends[c(1:length(ends)) %in% grep("present",res$Interval)]<-format(Sys.time(),"%Y-%m-%d")

        res["End_date"]<-ends

        myPoint<-c(long,lat)
        res[,"distance"]<-round(spDistsN1(as.matrix(res[,c("Longitude","Latitude")]), myPoint, longlat=TRUE),2)
        sorted_list<-res[order(res$distance),]
        if(!is.na(elev)) sorted_list[,"elevation_diff"]<-elev-sorted_list$Elevation
        sorted_list[,"chillR_code"]<-as.character(sorted_list$Code)
        if(!is.na(time_interval[1]))
            {interval_end<-YEARMODA2Date(time_interval[2]*10000+1231)
            if(end_at_present) interval_end<-min(interval_end,ISOdate(format(Sys.Date(),"%Y"),format(Sys.Date(),"%m"),
                   format(Sys.Date(),"%d")))
            interval_start<-YEARMODA2Date(time_interval[1]*10000+0101)
            sorted_list<-sorted_list[1:min(nrow(sorted_list),max(stations_to_choose_from,500)),]
            overlap_days<-apply(sorted_list,1,function (x) (as.numeric(difftime(
              sort(c(x["End_date"],format(interval_end,"%Y-%m-%d")))[1],
              sort(c(x["Start_date"],format(interval_start,"%Y-%m-%d")))[2])+1)))
            sorted_list[,"Overlap_years"]<-round(
              apply(sorted_list,1,function (x) (as.numeric(difftime(
                sort(c(x["End_date"],format(interval_end,"%Y-%m-%d")))[1],
                sort(c(x["Start_date"],format(interval_start,"%Y-%m-%d")))[2])+1)/(365+length(which(sapply(time_interval[1]:time_interval[2],leap_year)))/(time_interval[2]-time_interval[1]+1)))),2)
            sorted_list[which(sorted_list[,"Overlap_years"]<0),"Overlap_years"]<-0
            sorted_list[,"Perc_interval_covered"]<-round(overlap_days/as.numeric(interval_end-interval_start+1)*100,2)
            if(!is.na(elev))  sorted_list<-sorted_list[,c("chillR_code","Name","Latitude","Longitude","Elevation","Start_date","End_date",
                                                          "distance","elevation_diff","Overlap_years","Perc_interval_covered")] else
                                                            sorted_list<-sorted_list[,c("chillR_code","Name","Latitude","Longitude","Start_date","End_date",
                                                                                        "distance","Overlap_years","Perc_interval_covered")]} else
            if(!is.na(elev))  sorted_list<-sorted_list[,c("chillR_code","Name","Latitude","Longitude","Elevation","Start_date","End_date",
                                                      "distance","elevation_diff")] else
                                           sorted_list<-sorted_list[,c("chillR_code","Name","Latitude","Longitude","Start_date","End_date",
                                                                      "distance")]
        return(sorted_list[1:stations_to_choose_from,])}


  #weather download section

if(is.character(action)) if(action=="download_weather")
  {

  if(is.na(time_interval[1])) time_interval<-c(1950,2050)
  docu<-htmlParse("http://ipm.ucdavis.edu/WEATHER/wxactstnames.html")
  els = getNodeSet(docu, "//body//table")[[2]]
  els = getNodeSet(els, "//table//tr")
  nores=TRUE
  for(i in 1:length(els))
  {x<-xmlToDataFrame(els[i])
  if(length(x)==3)
  {colnames(x)<-c("Name","Code","Interval")
  if(nores) {res<-x;nores=FALSE} else res<-rbind(res,x)}}

  if(!location %in% res$Code) {warning("No weather data found for this station.")} else
  {
    string<-paste("STN=",location,"&MAP=&FROMMONTH=1&FROMDAY=1&FROMYEAR=",
                  time_interval[1],"&THRUMONTH=12&THRUDAY=31&THRUYEAR=",time_interval[2],"&DT_PRECIP=1&PRECIP_BACKUP1=.&PRECIP_BACKUP2=.&",
                  "PRECIP_BACKUPAVG=.&DT_AIR=1&AIR_BACKUP1=.&AIR_BACKUP2=.&AIR_BACKUPAVG=.&DT_SOIL=1&SOIL_BACKUP1=.&SOIL_BACKUP2=.&SOIL_BACKUPAVG=.&",
                  "DT_WIND=1&WIND_BACKUP1=.&WIND_BACKUP2=.&WIND_BACKUPAVG=.&DT_RH=1&RH_BACKUP1=.&RH_BACKUP2=.&RH_BACKUPAVG=.&DT_ET=1&ET_BACKUP1=.&",
                  "ET_BACKUP2=.&ET_BACKUPAVG=.&DT_SOLAR=1&SOLAR_BACKUP1=.&SOLAR_BACKUP2=.&SOLAR_BACKUPAVG=.&UNITS=M&FFMT=T&ACTION=RETRIEVE+DATA",sep="")

    dat<-POST(url="http://ipm.ucanr.edu/calludt.cgi/WXDATAREPORT",body=string)
    weather<-content(dat,"text",encoding="latin1")
    if(!is.na(weather)) record<-read.csv(textConnection(weather),allowEscapes = FALSE,skip=66,stringsAsFactors =FALSE) else
      record<- data.frame(Station=location,Date=c(paste(time_interval[1],"0101",sep=""),paste(time_interval[2],"1231",sep="")),Time=NA,Precip=NA,type=NA,Air.max=NA,min=NA,obs=NA,Wx=NA,Wind.dir=NA,
                          speed=NA,Bulb.wet=NA,dry=NA,Soil.max=NA,min.1=NA,Evap=NA,Solar=NA,ETo=NA,RH.max=NA,min.2=NA)

    record<-record[which(!is.na(record[,2])),]

   if(is.data.frame(record))
     {record[,"Year"]<-as.numeric(sapply(as.character(record$Date),function(x) substr(x,1,4)))
      record[,"Month"]<-as.numeric(sapply(as.character(record$Date),function(x) substr(x,5,6)))
      record[,"Day"]<-as.numeric(sapply(as.character(record$Date),function(x) substr(x,7,8)))
      record<-make_all_day_table(record,no_variable_check=TRUE)}}

 return(list(database="UCIPM",weather=record))}

  #weather cleaning section

  if(is.list(action)) if(names(action)[1]=="database") # then we assume that this is a downloaded file to be cleaned
        {dw<-action$weather
        colnames(dw)[which(colnames(dw)=="min")]<-"Tmin"
        colnames(dw)[which(colnames(dw)=="Air.max")]<-"Tmax"
        colnames(dw)[which(colnames(dw)=="Precip")]<-"Prec"
        if(drop_most) dw<-dw[,c("Year","Month","Day","Tmin","Tmax","Prec")]
        for (cc in c("Year","Month","Day","Tmin","Tmax","Prec"))
          dw[,cc]<-as.numeric(dw[,cc])
        return(list(database="UCIPM",weather=dw))}
  if(is.data.frame(action)) # then we assume that this is a downloaded file to be cleaned
        {dw<-action
        colnames(dw)[which(colnames(dw)=="min")]<-"Tmin"
        colnames(dw)[which(colnames(dw)=="Air.max")]<-"Tmax"
        colnames(dw)[which(colnames(dw)=="Precip")]<-"Prec"
        if(drop_most) dw<-dw[,c("Year","Month","Day","Tmin","Tmax","Prec")]
        for (cc in c("Year","Month","Day","Tmin","Tmax","Prec"))
          dw[,cc]<-as.numeric(dw[,cc])
        return(dw)}

}

Try the chillR package in your browser

Any scripts or data that you put into this service are public.

chillR documentation built on Jan. 11, 2022, 5:07 p.m.