GSODR

Introduction

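The Global Surface Summary of the Day (GSOD) data provided by the US National Centers for Environmental Information (NCEI) are a valuable source of weather data with global coverage. However, the data files are cumbersome and difficult to work with. GSODR aims to make it easy to find, transfer and format the data you need for use in analysis, and provides four main functions for facilitating this:

- get_GSOD() queries and transfers files from the NCEI server, reformats them and returns them to the R session;
- reformat_GSOD() reformats GSOD files that are already on the local disk;
- nearest_stations() finds stations within a given distance of a point;
- get_inventory() retrieves the number of weather observations by station-year-month.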

When reformatting data with either get_GSOD() or reformat_GSOD(), all units are converted from the United States Customary System (USCS) to the International System of Units (SI), e.g., inches to millimetres and Fahrenheit to Celsius. The data returned to the R session contain one row per station per day and also include vapour pressure (EA, ES) and relative humidity (RH) elements calculated from the existing GSOD data.
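To make these conversions and derived elements concrete, the following base R sketch reimplements them. The helper names are hypothetical and not part of GSODR, and the vapour pressure formula is an assumption: the August-Roche-Magnus approximation, e(T) = 0.61094 exp(17.625 T / (T + 243.04)) kPa, applied to the dew point for actual vapour pressure (EA) and to the mean temperature for saturation vapour pressure (ES). With these constants it reproduces, to one decimal place, the EA, ES and RH values for the first two days of 2010 at Toowoomba Airport shown in the str(tbar) output below.

```r
# Hypothetical helpers illustrating the USCS to SI conversions (not GSODR's API)
fahrenheit_to_celsius <- function(f) (f - 32) * 5 / 9
inches_to_mm <- function(inches) inches * 25.4

# Assumed August-Roche-Magnus approximation: vapour pressure in kPa
# at a temperature t_c given in degrees Celsius
magnus_kpa <- function(t_c) 0.61094 * exp(17.625 * t_c / (t_c + 243.04))

# EA from the dew point, ES from the mean temperature, RH (%) as their ratio;
# each rounded to one decimal place to match the precision shown in str(tbar)
vapour_elements <- function(temp_c, dewp_c) {
  ea <- magnus_kpa(dewp_c)
  es <- magnus_kpa(temp_c)
  data.frame(EA = round(ea, 1), ES = round(es, 1), RH = round(100 * ea / es, 1))
}

fahrenheit_to_celsius(c(32, 212))  # 0 100
inches_to_mm(0.78)                 # 19.812
# TEMP and DEWP for 1 and 2 January 2010 at Toowoomba Airport
vapour_elements(temp_c = c(21.2, 23.2), dewp_c = c(17.9, 19.4))
# EA 2.0, 2.2; ES 2.5, 2.8; RH 81.5, 79.2
```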

For more information, see the description of the data provided by NCEI: https://www.ncei.noaa.gov/data/global-summary-of-the-day/doc/readme.txt.

Using get_GSOD()

Find Stations in or near Toowoomba, Queensland, Australia

GSODR includes a table of weather station metadata, isd_history, with station locations, elevations and periods of record (BEGIN and END), so it is easy to find all stations in Australia.

library("GSODR")

load(system.file("extdata", "isd_history.rda", package = "GSODR"))

# create data.frame for Australia only
Oz <- subset(isd_history, COUNTRY_NAME == "AUSTRALIA")

Oz
##              STNID                         NAME     LAT     LON ELEV(M) CTRY STATE    BEGIN      END
##    1: 695023-99999          HORN ISLAND   (HID) -10.583 142.300      NA   AS       19420804 20030816
##    2: 749430-99999           AIDELAIDE RIVER SE -13.300 131.133   131.0   AS       19430228 19440821
##    3: 749432-99999    BATCHELOR FIELD AUSTRALIA -13.049 131.066   107.0   AS       19421231 19430610
##    4: 749438-99999         IRON RANGE AUSTRALIA -12.700 143.300    18.0   AS       19420917 19440930
##    5: 749439-99999     MAREEBA AS/HOEVETT FIELD -17.050 145.400   443.0   AS       19420630 19440630
##   ---                                                                                               
## 1416: 959890-99999      BICHENO (COUNCIL DEPOT) -41.867 148.300    11.0   AS       19650101 20230816
## 1417: 959950-99999 LORD HOWE ISLAND WINDY POINT -31.533 159.067     4.0   AS       20120920 20230817
## 1418: 959970-99999    HEARD ISLAND (ATLAS COVE) -53.017  73.400     4.0   AS       19980301 20121220
## 1419: 996600-99999          ENVIRONM BUOY 55011 -40.800 144.300     0.0   AS       19930221 19970403
## 1420: 999999-82101               NORTHWEST CAPE -22.333 114.050    38.1   AS       19680305 19680430
##       COUNTRY_NAME ISO2C ISO3C
##    1:    AUSTRALIA    AU   AUS
##    2:    AUSTRALIA    AU   AUS
##    3:    AUSTRALIA    AU   AUS
##    4:    AUSTRALIA    AU   AUS
##    5:    AUSTRALIA    AU   AUS
##   ---                         
## 1416:    AUSTRALIA    AU   AUS
## 1417:    AUSTRALIA    AU   AUS
## 1418:    AUSTRALIA    AU   AUS
## 1419:    AUSTRALIA    AU   AUS
## 1420:    AUSTRALIA    AU   AUS
# Look for a specific town in Australia
subset(Oz, grepl("TOOWOOMBA", NAME))
##           STNID              NAME     LAT     LON ELEV(M) CTRY STATE    BEGIN      END COUNTRY_NAME
## 1: 945510-99999         TOOWOOMBA -27.583 151.933     676   AS       19561231 19971231    AUSTRALIA
## 2: 955510-99999 TOOWOOMBA AIRPORT -27.550 151.917     642   AS       19980301 20230817    AUSTRALIA
##    ISO2C ISO3C
## 1:    AU   AUS
## 2:    AU   AUS

Download a Single Station and Year Using get_GSOD()

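Now that we've seen where the reporting stations are located, we can download 2010 weather data for Toowoomba, Queensland, Australia by passing the station's STNID to the station parameter of get_GSOD(). Of the two Toowoomba stations listed above, only TOOWOOMBA AIRPORT (955510-99999) has a period of record that includes 2010.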

tbar <- get_GSOD(years = 2010, station = "955510-99999")
str(tbar)
## Classes 'data.table' and 'data.frame':   365 obs. of  47 variables:
##  $ STNID           : chr  "955510-99999" "955510-99999" "955510-99999" "955510-99999" ...
##  $ NAME            : chr  "TOOWOOMBA AIRPORT" "TOOWOOMBA AIRPORT" "TOOWOOMBA AIRPORT" "TOOWOOMBA AIRPORT" ...
##  $ CTRY            : chr  "AS" "AS" "AS" "AS" ...
##  $ COUNTRY_NAME    : chr  "AUSTRALIA" "AUSTRALIA" "AUSTRALIA" "AUSTRALIA" ...
##  $ ISO2C           : chr  "AU" "AU" "AU" "AU" ...
##  $ ISO3C           : chr  "AUS" "AUS" "AUS" "AUS" ...
##  $ STATE           : chr  "" "" "" "" ...
##  $ LATITUDE        : num  -27.6 -27.6 -27.6 -27.6 -27.6 ...
##  $ LONGITUDE       : num  152 152 152 152 152 ...
##  $ ELEVATION       : num  642 642 642 642 642 642 642 642 642 642 ...
##  $ BEGIN           : int  19980301 19980301 19980301 19980301 19980301 19980301 19980301 19980301 19980301 19980301 ...
##  $ END             : int  20230817 20230817 20230817 20230817 20230817 20230817 20230817 20230817 20230817 20230817 ...
##  $ YEARMODA        : Date, format: "2010-01-01" "2010-01-02" "2010-01-03" ...
##  $ YEAR            : int  2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 ...
##  $ MONTH           : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ DAY             : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ YDAY            : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ TEMP            : num  21.2 23.2 21.4 18.9 20.5 21.9 21.3 20.9 21.9 22.3 ...
##  $ TEMP_ATTRIBUTES : int  8 8 8 8 8 8 8 8 8 8 ...
##  $ DEWP            : num  17.9 19.4 18.9 16.4 16.4 18.7 17.4 17.1 16.2 14.9 ...
##  $ DEWP_ATTRIBUTES : int  8 8 8 8 8 8 8 8 8 8 ...
##  $ SLP             : num  1013 1010 1012 1016 1016 ...
##  $ SLP_ATTRIBUTES  : int  8 8 8 8 8 8 8 8 8 8 ...
##  $ STP             : num  942 939 941 944 944 ...
##  $ STP_ATTRIBUTES  : int  8 8 8 8 8 8 8 8 8 8 ...
##  $ VISIB           : num  NA NA 14.3 23.3 NA NA NA NA NA NA ...
##  $ VISIB_ATTRIBUTES: int  0 0 6 4 0 0 0 0 0 0 ...
##  $ WDSP            : num  4.3 3.7 7.6 8.7 7.5 6.3 7.8 7.5 6.8 6.3 ...
##  $ WDSP_ATTRIBUTES : int  8 8 8 8 8 8 8 8 8 8 ...
##  $ MXSPD           : num  6.7 5.1 10.3 10.3 10.8 7.7 8.7 8.7 8.2 7.2 ...
##  $ GUST            : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ MAX             : num  25.8 26.5 28.7 24.1 24.6 26.8 26.1 26.5 27.4 28.7 ...
##  $ MAX_ATTRIBUTES  : chr  NA NA NA NA ...
##  $ MIN             : num  17.8 19.1 19.3 16.9 16.7 17.5 19.1 18.5 17.8 17.7 ...
##  $ MIN_ATTRIBUTES  : chr  NA NA "*" "*" ...
##  $ PRCP            : num  1.52 0.25 19.81 1.02 0.25 ...
##  $ PRCP_ATTRIBUTES : chr  "G" "G" "G" "G" ...
##  $ SNDP            : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ I_FOG           : num  0 0 1 0 0 1 1 0 1 1 ...
##  $ I_RAIN_DRIZZLE  : num  0 0 1 0 0 0 0 0 0 0 ...
##  $ I_SNOW_ICE      : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ I_HAIL          : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ I_THUNDER       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ I_TORNADO_FUNNEL: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ EA              : num  2 2.2 2.2 1.9 1.9 2.2 2 1.9 1.8 1.7 ...
##  $ ES              : num  2.5 2.8 2.5 2.2 2.4 2.6 2.5 2.5 2.6 2.7 ...
##  $ RH              : num  81.5 79.2 85.7 85.4 77.3 82.1 78.5 78.9 70.1 62.9 ...
##  - attr(*, ".internal.selfref")=<externalptr>

Using nearest_stations() to Download Multiple Stations at Once

Using the nearest_stations() function, you can find the stations within a given distance (in kilometres) of a point specified by latitude and longitude in decimal degrees. The result can be passed to get_GSOD() to download data for all of those stations at once.

tbar_stations <- nearest_stations(LAT = -27.5598,
                                  LON = 151.9507,
                                  distance = 50)

tbar <- get_GSOD(years = 2010, station = tbar_stations)
## Warning: 
## This station, 945510-99999, only provides data for years 1956 to 1997.
## Please send a request that falls within these years.
## Warning: 
## This station, 949999-00170, only provides data for years 1971 to 1984.
## Please send a request that falls within these years.
## Warning: 
## This station, 949999-00183, only provides data for years 1983 to 1984.
## Please send a request that falls within these years.
str(tbar)
## Classes 'data.table' and 'data.frame':   1095 obs. of  47 variables:
##  $ STNID           : chr  "945520-99999" "945520-99999" "945520-99999" "945520-99999" ...
##  $ NAME            : chr  "OAKEY" "OAKEY" "OAKEY" "OAKEY" ...
##  $ CTRY            : chr  "AS" "AS" "AS" "AS" ...
##  $ COUNTRY_NAME    : chr  "AUSTRALIA" "AUSTRALIA" "AUSTRALIA" "AUSTRALIA" ...
##  $ ISO2C           : chr  "AU" "AU" "AU" "AU" ...
##  $ ISO3C           : chr  "AUS" "AUS" "AUS" "AUS" ...
##  $ STATE           : chr  "" "" "" "" ...
##  $ LATITUDE        : num  -27.4 -27.4 -27.4 -27.4 -27.4 ...
##  $ LONGITUDE       : num  152 152 152 152 152 ...
##  $ ELEVATION       : num  407 407 407 407 407 ...
##  $ BEGIN           : int  19730430 19730430 19730430 19730430 19730430 19730430 19730430 19730430 19730430 19730430 ...
##  $ END             : int  20230817 20230817 20230817 20230817 20230817 20230817 20230817 20230817 20230817 20230817 ...
##  $ YEARMODA        : Date, format: "2010-01-01" "2010-01-02" "2010-01-03" ...
##  $ YEAR            : int  2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 ...
##  $ MONTH           : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ DAY             : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ YDAY            : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ TEMP            : num  23.4 26.2 24.5 21.6 22.6 24.7 24 23.3 24.4 25.1 ...
##  $ TEMP_ATTRIBUTES : int  16 16 16 16 16 16 16 16 16 16 ...
##  $ DEWP            : num  18.4 19.4 19.4 16.8 16.9 18.7 17.1 17.1 15.7 13.6 ...
##  $ DEWP_ATTRIBUTES : int  16 16 16 16 16 16 16 16 16 16 ...
##  $ SLP             : num  1012 1009 1011 1015 1015 ...
##  $ SLP_ATTRIBUTES  : int  16 16 16 16 16 16 16 16 16 16 ...
##  $ STP             : num  967 964 966 969 969 ...
##  $ STP_ATTRIBUTES  : int  16 16 16 16 16 16 16 16 16 16 ...
##  $ VISIB           : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ VISIB_ATTRIBUTES: int  0 0 0 0 0 0 0 0 0 0 ...
##  $ WDSP            : num  4.3 4.1 6.1 7.5 4.4 4.3 5.8 6.2 5.6 4.5 ...
##  $ WDSP_ATTRIBUTES : int  16 16 16 16 16 16 16 16 16 16 ...
##  $ MXSPD           : num  7.2 6.2 8.7 9.8 7.7 6.2 8.2 9.3 7.7 7.2 ...
##  $ GUST            : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ MAX             : num  28.5 31.2 33.6 27.1 27.8 30.4 30 30.5 31.9 33.2 ...
##  $ MAX_ATTRIBUTES  : chr  NA NA NA NA ...
##  $ MIN             : num  19.5 20.5 21.3 18.8 18.4 18.6 20.6 18.6 17.2 16.2 ...
##  $ MIN_ATTRIBUTES  : chr  NA NA "*" "*" ...
##  $ PRCP            : num  0.51 0 3.3 0 0 0 0 0.25 0 0 ...
##  $ PRCP_ATTRIBUTES : chr  "G" "G" "G" "G" ...
##  $ SNDP            : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ I_FOG           : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ I_RAIN_DRIZZLE  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ I_SNOW_ICE      : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ I_HAIL          : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ I_THUNDER       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ I_TORNADO_FUNNEL: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ EA              : num  2.1 2.2 2.2 1.9 1.9 2.2 1.9 1.9 1.8 1.6 ...
##  $ ES              : num  2.9 3.4 3.1 2.6 2.7 3.1 3 2.9 3.1 3.2 ...
##  $ RH              : num  73.5 66.2 73.3 74.2 70.2 69.3 65.3 68.2 58.4 48.9 ...
##  - attr(*, ".internal.selfref")=<externalptr>
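The warnings above come from stations inside the 50 km radius whose period of record (BEGIN to END) does not include 2010. The sketch below illustrates one way such a radius-plus-period filter could work on isd_history-style metadata. It uses the haversine great-circle distance and hypothetical helper names, and is not necessarily how nearest_stations() or get_GSOD() are implemented.

```r
# Hypothetical helper (not GSODR's API): great-circle distance in kilometres
# using the haversine formula on a sphere with a mean Earth radius of 6371 km
haversine_km <- function(lat1, lon1, lat2, lon2, radius = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Return STNIDs within `distance` km of a point whose BEGIN to END period
# (YYYYMMDD integers, as in isd_history) overlaps the requested year
stations_for_year <- function(meta, lat, lon, distance, year) {
  d <- haversine_km(lat, lon, meta$LAT, meta$LON)
  overlaps <- meta$BEGIN <= year * 10000 + 1231 &
    meta$END >= year * 10000 + 101
  # which() drops rows where the distance or period is NA
  meta$STNID[which(d <= distance & overlaps)]
}

# Three rows copied from the isd_history output above
meta <- data.frame(
  STNID = c("945510-99999", "955510-99999", "695023-99999"),
  LAT   = c(-27.583, -27.550, -10.583),
  LON   = c(151.933, 151.917, 142.300),
  BEGIN = c(19561231L, 19980301L, 19420804L),
  END   = c(19971231L, 20230817L, 20030816L),
  stringsAsFactors = FALSE
)

stations_for_year(meta, lat = -27.5598, lon = 151.9507, distance = 50, year = 2010)
# "955510-99999"
```

If tbar_stations is a character vector of STNIDs, as its use as the station argument suggests, a comparable period check against the real metadata could be written as subset(isd_history, STNID %in% tbar_stations & BEGIN <= 20101231 & END >= 20100101)$STNID and passed to get_GSOD() to avoid the warnings.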

Plot Maximum and Minimum Temperature Values

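The tbar object now holds 2010 data for three stations near Toowoomba, so first select the single station downloaded earlier, 955510-99999, and then plot its temperatures for 2010.

library("ggplot2")
library("tidyr")

# Create a data.frame of just the date and temperature values that we want to
# plot for Toowoomba Airport (955510-99999)
tbar_temps <- subset(tbar, STNID == "955510-99999",
                     select = c(YEARMODA, TEMP, MAX, MIN))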

# Pivot the data from wide to long format
tbar_temps <-
  pivot_longer(tbar_temps, cols = TEMP:MIN, names_to = "Measurement")

ggplot(data = tbar_temps, aes(x = YEARMODA,
                              y = value,
                              colour = Measurement)) +
  geom_line() +
  scale_color_brewer(type = "qual", na.value = "black") +
  scale_y_continuous(name = "Temperature (°C)") +
  scale_x_date(name = "Date") +
  ggtitle(label = "Max, min and mean temperatures for Toowoomba, Qld, AU",
          subtitle = "Data: U.S. NCEI GSOD") +
  theme_classic()
Figure: Max, min and mean temperatures for Toowoomba, Qld, AU (Data: U.S. NCEI GSOD)

Using reformat_GSOD()

You may have already downloaded GSOD data, or you may wish to use your browser to download the files from the server to your local disk rather than use get_GSOD(). In that case, the reformat_GSOD() function is useful.

There are two ways to use it: you can either provide reformat_GSOD() with a list of station files (file_list) or supply it with a directory (dsn) containing all of the "STATION.csv" station files or "YEAR.zip" annual files that you wish to reformat.

Note: any .csv file provided to reformat_GSOD() will be imported, and if it is not a GSOD data file this will lead to an error. Make sure the directory and file lists contain only GSOD files.
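One way to keep a file list clean is to select files by name before calling reformat_GSOD(). The sketch below assumes, based on the 20049099999.csv example, that station file names are an 11-character station ID followed by ".csv"; the helper name is hypothetical and not part of GSODR.

```r
# Hypothetical helper (not part of GSODR): list only files named like GSOD
# station files, assumed to be an 11-character station ID plus ".csv",
# e.g. 20049099999.csv
list_gsod_csv <- function(dsn) {
  list.files(dsn, pattern = "^[0-9A-Z]{11}\\.csv$", full.names = TRUE)
}

# Demonstration in a temporary directory that also holds a stray CSV file
dsn <- file.path(tempdir(), "gsod_1960")
dir.create(dsn, showWarnings = FALSE)
invisible(file.create(file.path(dsn, c("20049099999.csv", "notes.csv"))))
basename(list_gsod_csv(dsn))  # "20049099999.csv"
```

The result could then be passed on as reformat_GSOD(file_list = list_gsod_csv("~/GSOD/gsod_1960")).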

Reformat a List of Local Files

In this example, two STATION.csv files in subdirectories of the user's home directory are passed to reformat_GSOD() as a character vector of file paths.

y <- c("~/GSOD/gsod_1960/20049099999.csv",
       "~/GSOD/gsod_1961/20049099999.csv")
x <- reformat_GSOD(file_list = y)

Reformat all Local Files Found in Directory

In this example all STATION.csv files in the sub-folder GSOD/gsod_1960 will be imported and reformatted.

x <- reformat_GSOD(dsn = "~/GSOD/gsod_1960")

Using update_station_list()

GSODR uses internal databases of station data from the NCEI to provide location and other metadata, such as elevation, station names and WMO codes, which makes querying for weather data faster. This database is created and packaged with GSODR for distribution and is updated with new releases. Users also have the option of updating these databases after installing GSODR. While this keeps the database up to date and gives GSODR's authors flexibility in maintaining it, it also means that reproducibility may be affected, since the same version of GSODR may have different databases on different machines. If reproducibility is necessary, make sure that the version of the databases is the same across machines.

You can locate the database file isd_history.rda on your local system with paste0(.libPaths(), "/GSODR/extdata")[1], assuming GSODR is installed in your first library path. If you installed GSODR in another library location, the file will still be in that library's GSODR/extdata folder; system.file("extdata", "isd_history.rda", package = "GSODR") returns its full path wherever GSODR is installed.
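Since the same GSODR version may carry different station databases after an update, one way to check reproducibility is to record the package version together with a checksum of isd_history.rda on each machine and compare them. This is a minimal sketch using base R's tools::md5sum(); the helper name is hypothetical.

```r
# Hypothetical helper (not part of GSODR): record the installed GSODR version
# and an MD5 checksum of a station database file for cross-machine comparison
db_fingerprint <- function(path, pkg = "GSODR") {
  version <- if (requireNamespace(pkg, quietly = TRUE)) {
    as.character(utils::packageVersion(pkg))
  } else {
    NA_character_
  }
  data.frame(package_version = version,
             md5 = unname(tools::md5sum(path)),
             stringsAsFactors = FALSE)
}

# On each machine, compare the output of:
# db_fingerprint(system.file("extdata", "isd_history.rda", package = "GSODR"))
```

Matching checksums indicate identical station databases; a mismatch after running update_station_list() on one machine is expected.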

To update GSODR's internal database of station locations simply use update_station_list(), which will update the internal station database according to the latest data available from the NCEI.

update_station_list()

Using get_inventory()

GSODR provides a function, get_inventory(), to retrieve an inventory of the number of weather observations by station-year-month from the beginning of record to the present.

The following example retrieves the inventory and checks the Toowoomba Airport station (955510-99999) used in the earlier examples.

inventory <- get_inventory()

inventory
##   *** FEDERAL CLIMATE COMPLEX INTEGRATED SURFACE DATA INVENTORY ***  
##    This inventory provides the number of weather observations by  
##    STATION-YEAR-MONTH for beginning of record through August 2023   
##                STNID                NAME    LAT    LON ELEV(M) CTRY STATE    BEGIN      END
##      1: 008415-99999                <NA>     NA     NA      NA <NA>  <NA>       NA       NA
##      2: 010010-99999 JAN MAYEN(NOR-NAVY) 70.933 -8.667       9   NO       19310101 20230817
##      3: 010010-99999 JAN MAYEN(NOR-NAVY) 70.933 -8.667       9   NO       19310101 20230817
##      4: 010010-99999 JAN MAYEN(NOR-NAVY) 70.933 -8.667       9   NO       19310101 20230817
##      5: 010010-99999 JAN MAYEN(NOR-NAVY) 70.933 -8.667       9   NO       19310101 20230817
##     ---                                                                                    
## 128547:   A51256-451                <NA>     NA     NA      NA <NA>  <NA>       NA       NA
## 128548:   A51256-451                <NA>     NA     NA      NA <NA>  <NA>       NA       NA
## 128549:   A51256-451                <NA>     NA     NA      NA <NA>  <NA>       NA       NA
## 128550:   A51256-451                <NA>     NA     NA      NA <NA>  <NA>       NA       NA
## 128551:   A51256-451                <NA>     NA     NA      NA <NA>  <NA>       NA       NA
##         COUNTRY_NAME ISO2C ISO3C YEAR  JAN  FEB  MAR  APR  MAY  JUN  JUL  AUG  SEP  OCT  NOV  DEC
##      1:         <NA>  <NA>  <NA> 2020    0    0   14    0    0    0    0    0    0    0    0    0
##      2:       NORWAY    NO   NOR 2020  736  695  744  717  744  718  743  742  718  694  708  740
##      3:       NORWAY    NO   NOR 2021  686  562  729  710  733  654  726  717  712  737  714  630
##      4:       NORWAY    NO   NOR 2022  549  513  292   98    0    0  137    0  292  709  708  724
##      5:       NORWAY    NO   NOR 2023  738  657  715  713  735  666  735  393    0    0    0    0
##     ---                                                                                          
## 128547:         <NA>  <NA>  <NA> 2019 2188 2000 2143 2105 2187 2124 2184 2138 2077 1872 1508 2159
## 128548:         <NA>  <NA>  <NA> 2020 2165 1455 2144 2125 2199 2123 2112 2192 2083 2079 2074 2187
## 128549:         <NA>  <NA>  <NA> 2021 2085 1992 2217 1975 2146 2092 2227 2170 2080 2163 2120 2168
## 128550:         <NA>  <NA>  <NA> 2022 2203 1937 2204 2144 2218 2119 2224 2209 2137 1743 2126 2201
## 128551:         <NA>  <NA>  <NA> 2023 2006 1988 2172 1993 2063 2088 2189 1200    0    0    0    0
subset(inventory, STNID %in% "955510-99999")
##   *** FEDERAL CLIMATE COMPLEX INTEGRATED SURFACE DATA INVENTORY ***  
##    This inventory provides the number of weather observations by  
##    STATION-YEAR-MONTH for beginning of record through August 2023   
##           STNID              NAME    LAT     LON ELEV(M) CTRY STATE    BEGIN      END COUNTRY_NAME
## 1: 955510-99999 TOOWOOMBA AIRPORT -27.55 151.917     642   AS       19980301 20230817    AUSTRALIA
## 2: 955510-99999 TOOWOOMBA AIRPORT -27.55 151.917     642   AS       19980301 20230817    AUSTRALIA
## 3: 955510-99999 TOOWOOMBA AIRPORT -27.55 151.917     642   AS       19980301 20230817    AUSTRALIA
## 4: 955510-99999 TOOWOOMBA AIRPORT -27.55 151.917     642   AS       19980301 20230817    AUSTRALIA
##    ISO2C ISO3C YEAR JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC
## 1:    AU   AUS 2020 246 232 248 238 248 348 493 492 480 496 475 496
## 2:    AU   AUS 2021 485 483 742 720 743 716 744 737 719 744 720 726
## 3:    AU   AUS 2022 743 672 739 716 739 716 728 742 716 726 713 726
## 4:    AU   AUS 2023 738 663 730 715 737 701 733 399   0   0   0   0
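Because the inventory has one column per month, summing across those columns gives a quick view of how complete each station-year is, which can help choose years worth requesting with get_GSOD(). This is a minimal sketch, assuming inventory behaves like a data.frame with the STNID, YEAR and JAN to DEC columns printed above; the helper name is hypothetical.

```r
# Hypothetical helper (not part of GSODR): total observations and number of
# months with any observations per station-year, from an inventory table with
# STNID, YEAR and one column per month named JAN ... DEC
annual_obs <- function(inv) {
  counts <- as.data.frame(inv)[, toupper(month.abb)]
  data.frame(STNID = inv$STNID,
             YEAR = inv$YEAR,
             TOTAL = rowSums(counts),
             MONTHS_WITH_DATA = rowSums(counts > 0),
             stringsAsFactors = FALSE)
}

# The 2020 and 2023 rows for 955510-99999 from the output above
inv <- data.frame(
  STNID = "955510-99999", YEAR = c(2020L, 2023L),
  JAN = c(246L, 738L), FEB = c(232L, 663L), MAR = c(248L, 730L),
  APR = c(238L, 715L), MAY = c(248L, 737L), JUN = c(348L, 701L),
  JUL = c(493L, 733L), AUG = c(492L, 399L), SEP = c(480L, 0L),
  OCT = c(496L, 0L), NOV = c(475L, 0L), DEC = c(496L, 0L),
  stringsAsFactors = FALSE
)
annual_obs(inv)
# TOTAL 4492 and 5416; MONTHS_WITH_DATA 12 and 8
```

Applied to the real inventory this would be annual_obs(subset(inventory, STNID %in% "955510-99999")). The 2023 total is partial because the inventory runs only through August 2023.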

Notes

WMO Resolution 40. NOAA Policy

The data summaries provided here are based on data exchanged under the World Meteorological Organization (WMO) World Weather Watch Program according to WMO Resolution 40 (Cg-XII). This allows WMO member countries to place restrictions on the use or re-export of their data for commercial purposes outside of the receiving country. Data for selected countries may, at times, not be available through this system. Those countries' data summaries and products which are available here are intended for free and unrestricted use in research, education, and other non-commercial activities. However, for non-U.S. locations' data, the data or any derived product shall not be provided to other users or be used for the re-export of commercial services.

Appendices

Appendix 1: GSODR Final Data Format, Contents and Units

GSODR formatted data include the following fields and units:

Appendix 2: Map of Current GSOD Station Locations


GSODR documentation built on Aug. 22, 2023, 9:10 a.m.