Nothing
#' Deprecated version of handle_gsod. List, download or convert to chillR format data from the Global Summary of
#' the Day database
#'
#' @description \emph{This function is deprecated, but it will be retained for a
#' few generations of updates.} Its functionality has been fully replaced by the
#' new version of the \code{\link{handle_gsod}} function, which does the same
#' job, but much faster. That's probably the function you really want to use.
#'
#' This function can do three things related to the Global Summary of the Day
#' ("GSOD") database from the National Climatic Data Centre (NCDC) of the
#' National Oceanic and Atmospheric Administration (NOAA): \itemize{
#'
#' \item{1. It can list stations that are close to a specified position (geographic coordinates).}
#'
#' \item{2. It can retrieve weather data for a named weather station (or a vector of multiple stations).
#' For the name, the chillRcode from the list returned by the \code{list_stations} operation
#' should be used.}
#'
#' \item{3. It can 'clean' downloaded data (for one or multiple stations), so that they can easily be used in chillR
#'
#' Which of these functions is carried out depends on the action argument.}}
#'
#' This function can run independently, but it is also called by the
#' \code{\link{get_weather}} and \code{\link{weather2chillR}} functions, which some users might find a bit
#' easier to handle.
#'
#' @details The GSOD database is described here:
#' \url{https://www.ncei.noaa.gov/access/metadata/landing-page/bin/iso?id=gov.noaa.ncdc:C00516}
#'
#' under the \code{'list_stations'} mode, several formats are possible for specifying
#' the location vector, which can consist of either two or three coordinates
#' (it can include elevation). Possible formats include \code{c(1, 2, 3)}, \code{c(1, 2)},
#' \code{c(x = 1, y = 2, z = 3)}, \code{c(lat = 2, long = 1, elev = 3)}. If elements of the vector are not
#' names, they are interpreted as c(Longitude, Latitude, Elevation).
#'
#' The 'chillRCode' is generated by this function, when it is run with
#' geographic coordinates as location inputs. In the list of nearby stations
#' that is returned then, the chillRCode is provided and can then be used as
#' input for running the function in 'downloading' mode. For downloading the
#' data, use the same call as before but replace the location argument with the
#' chillRCode.
#'
#' @param action accepts 3 types of inputs to decide on the mode of action for the function.\itemize{
#'
#' \item{if this is the character string \code{"list_stations"}, the function
#' will return a list of the weather stations from the database that are
#' closest to the geographic coordinates specified by location.}
#'
#' \item{if this is the character string \code{"download_weather"}, the function will attempt to download
#' weather data from the database for the station named by the location
#' argument, which should then be a character string corresponding to the
#' \code{chillRcode} of the station (which you can get by running this function in
#' \code{'list_stations'} mode).}
#'
#' \item{if this is a collection of outputs obtained by
#' running this function in the \code{'download weather'} mode), the function cleans the
#' weather files and make them ready for use in \code{chillR}. If the input is just a dataframe
#' (not a list, as produced with this function), you have to specify the
#' database name with the database argument.}}
#'
#' @param location either a vector of geographic coordinates (for the
#' \code{'list_stations'} mode), or the 'chillRcode' of a weather station in the
#' specified database (for the \code{'download_weather'} mode). When running this
#' function for data cleaning only, this is not needed. For the
#' \code{'download_weather'} mode, this can also be a vector of 'chillRcodes',
#' in which case records for all stations will be downloaded. The data cleaning
#' mode can also handle a list of downloaded weather datasets.
#'
#' @param time_interval numeric vector with two elements, specifying the start
#' and end date of the period of interest. Only required when running in
#' \code{'list_stations'} or \code{'download_weather'} mode.
#'
#' @param station_list if the list of weather stations has already been
#' downloaded, the list can be passed to the function through this argument.
#' This can save a bit of time, since it can take a bit of time to download the
#' list, which can have several MB.
#'
#' @param stations_to_choose_from if the location is specified by geographic
#' coordinates, this argument determines the number of nearby stations in the
#' list that is returned.
#'
#' @param drop_most boolean variable indicating if most columns should be
#' dropped from the file. If set to \code{TRUE} (default), only essential columns for
#' running chillR functions are retained.
#'
#' @param end_at_present boolean variable indicating whether the interval of
#' interest should end on the present day, rather than extending until the end
#' of the year specified under \code{time_interval[2]} (if \code{time_interval[2]} is the
#' current year).
#'
#' @param add.DATE is a boolean parameter to be passed to \code{\link{make_all_day_table}} if \code{action} is
#' a collection of outputs (in the form of list) from the function in the downloading format.
#'
#' @param quiet is a boolean parameter to be passed to \code{\link[utils:download.file]{download.file}} if
#' \code{action = "download_weather"}.
#'
#' @param add_station_name is a boolean parameter to include the name of the respective weather station in
#' the resulting data frame in case the function is used in the downloading or formatting mode.
#'
#' @return The output depends on the action argument. If it is \code{'list_stations'},
#' the function returns a list of \code{station_to_choose_from} weather stations that
#' are close to the specified location. This list also contains information
#' about how far away these stations are (in km), how much the elevation
#' difference is (if elevation is specified; in m) and how much overlap there
#' is between the data contained in the database and the time period specified
#' by \code{time_interval}. If action is \code{'download_weather'} the output is a list of
#' two elements: 1. \code{database="GSOD"} 2. the downloaded weather record, extended
#' to the full duration of the specified time interval. If the \code{location} input
#' was a vector of stations, the output will be a list of such objects.
#' If action is a weather \code{data.frame} or a weather record downloaded with
#' this function (in \code{'download_weather'} mode), the output is the same
#' data in a format that is easy to use in chillR. If drop_most was set to
#' \code{TRUE}, most columns are dropped. If the \code{location} input was a
#' list of weather datasets, all elements of the list will be processed.
#'
#' @note Many databases have data quality flags, which may sometimes indicate
#' that data aren't reliable. These are not considered by this function!
#'
#' For many places, the GSOD database is quite patchy, and the length of the
#' record indicated in the summary file isn't always very useful (e.g. there
#' could only be two records for the first and last date). Files are downloaded
#' by year, so if we specify a long interval, this may take a bit of time.
#'
#' @importFrom R.utils gunzip
#'
#' @author Eike Luedeling and Eduardo Fernandez
#' @references The chillR package:
#'
#' Luedeling E, Kunz A and Blanke M, 2013. Identification of chilling and heat
#' requirements of cherry trees - a statistical approach. International Journal
#' of Biometeorology 57,679-689.
#' @keywords utilities
#' @examples
#'
#' # List the near weather stations
#' # stat_list <- handle_gsod_old(action = "list_stations",
#' # location = c(x = -122, y = 38.5),
#' # time_interval = c(2002, 2002))
#'
#' # the line above takes longer to run than CRAN allows for examples.
#' # The line below therefore
#' # generates an abbreviated stat_list that allows running the code.
#'
#' # stat_list <- data.frame(chillR_code = c("724828_99999",
#' # "724828_93241",
#' # "720576_174"),
#' # STATION.NAME = c("NUT TREE",
#' # "NUT TREE AIRPORT",
#' # "UNIVERSITY AIRPORT"),
#' # Lat = c(38.383, 38.378, 38.533),
#' # Long = c(-121.967, -121.958, -121.783),
#' # BEGIN = c(20010811, 20060101, 20130101),
#' # END = c(20051231, 20160110, 20160109))
#'
#' # gw <- handle_gsod_old(action = "download_weather",
#' # location = "724828_93241",
#' # time_interval = c(2010, 2012),
#' # station_list = stat_list,
#' # quiet = TRUE)
#'
#' # weather <- handle_gsod_old(gw, add.DATE = FALSE)[[1]]$weather
#'
#' # make_chill_plot(tempResponse(stack_hourly_temps(fix_weather(weather)),
#' # Start_JDay = 300, End_JDay = 50),
#' # "Chill_Portions", start_year = 2010,
#' # end_year = 2012, metriclabel = "Chill Portions",
#' # misstolerance = 50)
#'
#' @export handle_gsod_old
handle_gsod_old <- function(action,
location = NA,
time_interval = NA,
station_list = NULL,
stations_to_choose_from = 25,
drop_most = TRUE,
end_at_present = TRUE,
add.DATE = TRUE,
quiet = FALSE,
add_station_name = FALSE){
# Internal function to remove missing values
remove_missing <- function(tab, column, missing){
tab[which(tab[, column] == missing), column] <- NA
return(tab)
}
if(is.character(action)) if(action=="list_stations")
{
if(!is.null(names(location)))
{lat<-unlist(sapply(names(location),function(x) max(c(length(grep("lat", x, ignore.case = TRUE)),length(grep("y", x, ignore.case = TRUE))))))
if(sum(lat)==1) lat<-as.numeric(location[which(lat==1)])
long<-unlist(sapply(names(location),function(x) max(c(length(grep("lon", x, ignore.case = TRUE)),length(grep("x", x, ignore.case = TRUE))))))
if(sum(long)==1) long<-as.numeric(location[which(long==1)])
elev<-unlist(sapply(names(location),function(x) max(c(length(grep("ele", x, ignore.case = TRUE)),length(grep("alt", x, ignore.case = TRUE)),
length(grep("z", x, ignore.case = TRUE))))))
if(sum(elev)==1) elev<-as.numeric(location[which(elev==1)]) else elev<-NA
} else {long<-location[1]
lat<-location[2]
if(length(location)==3) elev<-location[3] else elev<-NA}
if(is.null(station_list)) station_list<-read.csv("ftp://ftp.ncdc.noaa.gov/pub/data/noaa/isd-history.csv")
stat_list<-station_list
cn<-colnames(stat_list)
if (is.null(cn)|!((("USAF" %in% cn & "WBAN" %in% cn)|"chillR_code" %in% cn) &
"STATION.NAME" %in% cn & "CTRY" %in% cn & "BEGIN" %in% cn & "END" %in% cn &
("LON" %in% cn|"Long" %in% cn) & ("LAT" %in% cn|"Lat" %in% cn)))
{warning("Station list not according to expectations. Downloading list from NCDC.")
stat_list<-read.csv("ftp://ftp.ncdc.noaa.gov/pub/data/noaa/isd-history.csv")}
colnames(stat_list)[which(colnames(stat_list) %in% c("LON","Long"))]<-"Long"
colnames(stat_list)[which(colnames(stat_list) %in% c("LAT","Lat"))]<-"Lat"
colnames(stat_list)[which(colnames(stat_list) %in% c("ELEV.M.","Elev"))]<-"Elev"
stat_list<-stat_list[which(!is.na(stat_list$Lat)&(!is.na(stat_list$Long))),]
lat_rad <- lat*pi/180
lon_rad <- long*pi/180
lat_rad_stat <- stat_list$Lat*pi/180
lon_rad_stat <- stat_list$Long*pi/180
stat_list[,"distance"]<-
round(6378.388 * acos(sin(lat_rad) * sin(lat_rad_stat) +
cos(lat_rad) * cos(lat_rad_stat) *
cos(lon_rad_stat - lon_rad)),
2)
sorted_list<-stat_list[order(stat_list$distance),]
if(!is.na(elev)) sorted_list[,"elevation_diff"]<-elev-sorted_list$Elev
sorted_list<-sorted_list[1:max(stations_to_choose_from,500),]
if(!"chillR_code" %in% colnames(sorted_list)) sorted_list[,"chillR_code"]<-paste(sorted_list$USAF,"_",sorted_list$WBAN,sep="")
sorted_list[,"Start_date"]<-YEARMODA2Date(sorted_list$BEGIN)
sorted_list[,"End_date"]<-YEARMODA2Date(sorted_list$END)
if(!is.na(time_interval[1]))
{
interval_end<-YEARMODA2Date(time_interval[2]*10000+1231)
interval_start<-YEARMODA2Date(time_interval[1]*10000+0101)
if(end_at_present) interval_end<-min(interval_end,ISOdate(format(Sys.Date(),"%Y"),format(Sys.Date(),"%m"),
format(Sys.Date(),"%d")))
overlap_days<-apply(sorted_list,1,function (x) (as.numeric(difftime(
sort(c(x["End_date"],format(interval_end,"%Y-%m-%d")))[1],
sort(c(x["Start_date"],format(interval_start,"%Y-%m-%d")))[2])+1)))
sorted_list[,"Overlap_years"]<-round(
apply(sorted_list,1,function (x) (as.numeric(difftime(
sort(c(x["End_date"],format(interval_end,"%Y-%m-%d")))[1],
sort(c(x["Start_date"],format(interval_start,"%Y-%m-%d")))[2])+1)/(365+length(which(sapply(time_interval[1]:time_interval[2],leap_year)))/(time_interval[2]-time_interval[1]+1)))),2)
sorted_list[which(sorted_list[,"Overlap_years"]<0),"Overlap_years"]<-0
overlap_days[which(overlap_days<0)]<-0
sorted_list[,"Perc_interval_covered"]<-round(overlap_days/as.numeric(interval_end-interval_start+1)*100,2)
if(!is.na(elev)) sorted_list<-sorted_list[,c("chillR_code","STATION.NAME","CTRY","Lat","Long","Elev","BEGIN","END",
"distance","elevation_diff","Overlap_years","Perc_interval_covered")] else
sorted_list<-sorted_list[,c("chillR_code","STATION.NAME","CTRY","Lat","Long","BEGIN","END",
"distance","Overlap_years","Perc_interval_covered")]} else
if(!is.na(elev)) sorted_list<-sorted_list[,c("chillR_code","STATION.NAME","CTRY","Lat","Long","Elev","BEGIN","END",
"distance","elevation_diff")] else
sorted_list<-sorted_list[,c("chillR_code","STATION.NAME","CTRY","Lat","Long","BEGIN","END",
"distance")]
return(sorted_list[1:stations_to_choose_from,])}
if(is.character(action)) if(action=="download_weather")
{
if(is.null(station_list)) station_list<-read.csv("ftp://ftp.ncdc.noaa.gov/pub/data/noaa/isd-history.csv")
stat_list<-station_list
cn<-colnames(stat_list)
if (is.null(cn)|!((("USAF" %in% cn & "WBAN" %in% cn)|"chillR_code" %in% cn) &
"BEGIN" %in% cn & "END" %in% cn &
("LON" %in% cn|"Long" %in% cn) & ("LAT" %in% cn|"Lat" %in% cn)))
{warning("Station list not according to expectations. Downloading list from NCDC.")
stat_list<-read.csv("ftp://ftp.ncdc.noaa.gov/pub/data/noaa/isd-history.csv")}
colnames(stat_list)[which(colnames(stat_list) %in% c("LON","Long"))]<-"Long"
colnames(stat_list)[which(colnames(stat_list) %in% c("LAT","Lat"))]<-"Lat"
colnames(stat_list)[which(colnames(stat_list) %in% c("ELEV.M.","Elev"))]<-"Elev"
stat_list<-stat_list[which(!is.na(stat_list$Lat)&(!is.na(stat_list$Long))),]
if (!"chillR_code" %in% colnames(stat_list)) stat_list[,"chillR_code"]<-paste(stat_list$USAF,"_",stat_list$WBAN,sep="")
# Define the locations
Locations <- as.character(location)
# Collect the output in a list
outputs <- list()
# Vectorize the downloading
for (Location in Locations){
if(Location %in% stat_list$chillR_code)
{
#download data
STN<-strsplit(Location,"_")[[1]][1]
WBAN<-strsplit(Location,"_")[[1]][2]
startlisting<-TRUE
for(y in c(time_interval[1]:time_interval[2]))
{
URL<-paste("ftp://ftp.ncdc.noaa.gov/pub/data/gsod/",y,"/",STN,"-",WBAN,"-",y,".op.gz",sep="")
suppressWarnings(dir.create("chillRtempdirectory"))
dest<-"chillRtempdirectory/weather.op.gz"
ff<-suppressWarnings(try(download.file(URL, dest, quiet = quiet),silent = TRUE))
if(ff==0)
{gz <- R.utils::gunzip(dest) #gzfile(dest, open = "rt")
out<-suppressWarnings(read.fwf(gz, c(6,1,5,2,4,2,2,2,6,1,2,2,6,1,2,2,6,1,2,2,6,1,2,2,5,1,2,2,5,1,2,2,5,2,5,2,6,1,1,6,1,1,5,1,1,5,2,6), header = FALSE, sep = "\t",skip=1))
unlink(gz, recursive = TRUE)
colnames(out)<-c("STN---","space1","WBAN","space2","YEAR","MONTH","DAY","space3","TEMP","space4","Count1","space5","DEWP","space6","Count2",
"space7","SLP","space8","Count3","space9","STP","space10","Count4","space11","VISIB","space12","Count5","space13",
"WDSP","space14","Count6","space15","MXSPD","space16","GUST","space17","MAX","MaxFlag","space18",
"MIN","MinFlag","space19","PRCP","PrcpFlag","space20","SNDP","space21","FRSHTT")
out<-out[,c("STN---","WBAN","YEAR","MONTH","DAY","TEMP","Count1","DEWP","Count2","SLP","Count3","STP","Count4","VISIB","Count5",
"WDSP","Count6","MXSPD","GUST","MAX","MaxFlag","MIN","MinFlag","PRCP","PrcpFlag","SNDP","FRSHTT")]
out<-remove_missing(out,"DEWP",9999.9)
out<-remove_missing(out,"SLP",9999.9)
out<-remove_missing(out,"STP",9999.9)
out<-remove_missing(out,"VISIB",999.9)
out<-remove_missing(out,"WDSP",999.9)
out<-remove_missing(out,"MXSPD",999.9)
out<-remove_missing(out,"GUST",999.9)
out<-remove_missing(out,"MAX",9999.9)
out<-remove_missing(out,"MIN",9999.9)
out<-remove_missing(out,"PRCP",99.99)
out<-remove_missing(out,"SNDP",999.9)
out$TEMP<-(out$TEMP-32)/9*5
out$MIN<-(out$MIN-32)/9*5
out$MAX<-(out$MAX-32)/9*5
out$PRCP<-out$PRCP*25.4
if(startlisting) record<-out else record<-rbind(record,out)
startlisting<-FALSE}
}
if(startlisting) {record<-NA
outputs[[Location]] <- list(database = "GSOD", weather = record)
names(outputs)[which(names(outputs) == Location)] <- stat_list[stat_list$chillR_code == Location,
"STATION.NAME"]
warning(paste("No weather data found for the time interval of interest for",
stat_list[stat_list$chillR_code == Location, "STATION.NAME"]), call. = FALSE)
next}
record<-record[which(record$YEAR %in% c(time_interval[1]:time_interval[2])),]
first_line<-record[1,]
if(!((as.numeric(first_line["YEAR"])*10000+as.numeric(first_line["MONTH"])*100+as.numeric(first_line["DAY"]))==time_interval[1]*10000+101))
{first_line[,]<-NA
first_line[,c("STN---","WBAN","YEAR","MONTH","DAY")]<-c(STN,WBAN,time_interval[1],1,1)
record<-rbind(first_line,record)}
last_line<-record[nrow(record),]
if(!((as.numeric(first_line["YEAR"])*10000+as.numeric(first_line["MONTH"])*100+as.numeric(first_line["DAY"]))==time_interval[2]*10000+1231))
{last_line[,]<-NA
last_line[,c("STN---","WBAN","YEAR","MONTH","DAY")]<-c(STN,WBAN,time_interval[2],12,31)
record<-rbind(record,last_line)}
record[,"Year"]<-as.numeric(record[,"YEAR"])
record[,"Month"]<-as.numeric(record[,"MONTH"])
record[,"Day"]<-as.numeric(record[,"DAY"])
# add the station name
if (add_station_name)
record["STATION.NAME"] <- stat_list[stat_list$chillR_code == Location, "STATION.NAME"]
record<-make_all_day_table(record,no_variable_check=TRUE, add.DATE = add.DATE)
} else {warning("location does not match a record in the database. No records retrieved.",
call. = FALSE)}
outputs[[Location]] <- list(database = "GSOD", weather = record)
names(outputs)[which(names(outputs) == Location)] <- stat_list[stat_list$chillR_code == Location,
"STATION.NAME"]
}
return(outputs)
}
# Small chunk to clean a list of two elements (database and weather)
single_station <- FALSE
if(length(names(action)) == 2 & all(c("database", "weather") %in% names(action))){
single_station <- TRUE
action <- list(`station 1` = action)
if(add_station_name)
warning(paste("Your input list doesn't have a proper station name. The placeholder",
"'station 1' is used instead. Set 'add_station_name' to FALSE to remove it",
"or name your list using list('station_name' = your_input_list)."),
call. = FALSE)
}
if(is.list(action)) if(all(c("weather", "database") %in% unlist(lapply(action, names)))){ # then we assume that this is a downloaded file to be cleaned
# This step is to check if all elements of the list contain weather data
with_data <- which(unlist(lapply(action, function (x) is.data.frame(x[["weather"]]))))
action_rm_na <- action[with_data]
# Create a list to collect the outputs
outputs <- list()
for (i in 1 : length(action_rm_na)){
dw <- action_rm_na[[i]]$weather
colnames(dw)[which(colnames(dw)=="TEMP")]<-"Tmean"
colnames(dw)[which(colnames(dw)=="MIN")]<-"Tmin"
colnames(dw)[which(colnames(dw)=="MAX")]<-"Tmax"
colnames(dw)[which(colnames(dw)=="PRCP")]<-"Prec"
colnames(dw)[which(colnames(dw)=="YEAR")]<-"Year"
colnames(dw)[which(colnames(dw)=="MONTH")]<-"Month"
colnames(dw)[which(colnames(dw)=="DAY")]<-"Day"
if(drop_most) dw<-dw[,c("Year","Month","Day","Tmin","Tmax","Tmean","Prec")]
for (cc in c("Year","Month","Day","Tmin","Tmax","Tmean","Prec"))
dw[,cc]<-as.numeric(dw[,cc])
if ("DATE" %in% colnames(dw)) add.DATE <- FALSE
dw <- make_all_day_table(dw, add.DATE = add.DATE)
if (add_station_name) dw <- data.frame(Station_name = names(action_rm_na)[i],
dw)
outputs[[i]] <- list(database = "GSOD",
weather = dw)
names(outputs)[i] <- names(action_rm_na)[i]
}
if(single_station) outputs <- outputs[[1]]
return(outputs)}
if(is.data.frame(action)) # then we assume that this is a downloaded file to be cleaned
{dw<-action
colnames(dw)[which(colnames(dw)=="TEMP")]<-"Tmean"
colnames(dw)[which(colnames(dw)=="MIN")]<-"Tmin"
colnames(dw)[which(colnames(dw)=="MAX")]<-"Tmax"
colnames(dw)[which(colnames(dw)=="PRCP")]<-"Prec"
colnames(dw)[which(colnames(dw)=="YEAR")]<-"Year"
colnames(dw)[which(colnames(dw)=="MONTH")]<-"Month"
colnames(dw)[which(colnames(dw)=="DAY")]<-"Day"
if(drop_most) dw<-dw[,c("Year","Month","Day","Tmin","Tmax","Tmean","Prec")]
for (cc in c("Year","Month","Day","Tmin","Tmax","Tmean","Prec"))
dw[,cc]<-as.numeric(dw[,cc])
dw <- make_all_day_table(dw, no_variable_check = TRUE, add.DATE = add.DATE)
return(dw)}
}
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.