knitr::opts_chunk$set(echo = TRUE, rows.print = 3)
library("learnr")
library("tidyverse")
library("lubridate")
library("conflicted")
conflict_prefer(name = "filter", winner = "dplyr")

tutorial_options(exercise.cap = "Exercise")

temperature <- read_csv(system.file("extdata/Florida_2020-04-20_2020-04-23_1589018968.csv", package = "data.handling"))

temperature2 <- temperature %>% 
  mutate(
    date_time = paste(Dato, Tid), #combine
    date_time = ymd_hms(date_time)#convert
    ) 

Dates and times

In this tutorial, you will learn how to

Date and time formats

Converting strings to dates is tricky: dates can be written in many formats, and possibly in different languages.

When entering data into a spreadsheet or similar, I strongly recommend using the international standard format for dates and times (ISO 8601), which puts the year first, then the month, then the day, for example 2020-04-21.
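A small base R sketch of why the standard format is convenient. The three dates are made up for illustration. Dates written year-month-day can be converted without a format argument, and they sort correctly even while they are still text.

```r
# ISO 8601 dates (year-month-day) need no format argument in base R
iso_dates <- c("2020-04-21", "2019-12-31", "2020-01-05")
as.Date(iso_dates)

# Because the largest unit comes first, sorting the text gives the same
# order as sorting the converted dates
sorted_text <- sort(iso_dates)
sorted_dates <- sort(as.Date(iso_dates))
identical(sorted_text, format(sorted_dates))
```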

Converting dates and times

It is usually more convenient to have dates in a Date format and date/times in a POSIXct format rather than as character strings. This allows dates to be manipulated, sorted, and plotted properly.

This can be done using base R.

dat <- "26 October 2016 14:39:10 CEST" #date with time-zone
as.POSIXct(dat, format = "%d %B %Y %H:%M:%S") # might return NA if run on your computer - see locales below

Here the %d represents the day, %B represents the month written as the full word (%b would work if the month were abbreviated), %Y represents the 4-digit year, and %H:%M:%S represents the hours, minutes and seconds. There is a complete alphabet of codes you can use. See ?strptime for the complete list.
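The same codes work in the other direction with format(), which can be a handy way to check what a code means. A minimal sketch, using a date-time I have fixed in UTC so the output does not depend on your time zone (the variable names are just for illustration):

```r
x <- as.POSIXct("2016-10-26 14:39:10", tz = "UTC")

numeric_date <- format(x, "%d/%m/%Y") # day/month/4-digit year
short_year   <- format(x, "%y")       # 2-digit year
clock        <- format(x, "%H:%M")    # hours and minutes (24-hour clock)
day_of_year  <- format(x, "%j")       # day of the year

numeric_date
short_year
clock
day_of_year

format(x, "%d %B %Y") # the month name depends on your locale
```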

Your turn

Use as.Date (for the date) and as.POSIXct (for the datetime) to convert the following to dates or datetimes. You will need to use the format argument of each function and read the strptime help. If you get an NA, the format argument is not correct.

date1 <- "28/02/1999"

date2 <- "July 1 2001 2:14"
date1 <- "28/02/1999"
as.Date(date1, format = "%d/%m/%Y")

date2 <- "July 1 2001 2:14"
as.POSIXct(date2, format = "%B %d %Y %H:%M")

Converting dates and times with lubridate

You probably found that last exercise difficult. Fortunately, the lubridate package makes converting dates much easier.

No more "%d %B %y"

With lubridate, you don't need to remember the alphabet of codes and get the format exactly right. You just need to select the function with the day, month and year in the correct order.

library("lubridate")
dat <- "26 October 2016 14:39:10"
dmy_hms(dat)

dmy_hms expects the order to be day-month-year followed by hours:minutes:seconds. Provided the elements are in this order, a wide range of separators and month formats can be used.

dmy_hms("26-10-16 14.39.10")
dmy_hms("26th Oct 2016 14 39 10")
dmy_hms("261016143910")

lubridate can even cope when the format is mixed, provided the elements are in the same order.

dmy_hms(c("26-10-16 14.39.10", "26th Oct 2016 14 39 10", "261016143910"))

Different formats

There are many other functions in the lubridate package for coping with dates with elements in different orders.

For example

dmy_hm("26th Oct 2016 14.39") # no seconds
ydm_hms("2016 26th Oct 14 39 10") # year first then day
mdy_hms("Oct 26th 2016 14 39 10") # month first (US standard)

Look at the help file for more functions and arguments.
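One of those other functions is parse_date_time, which takes several candidate orders at once. This is a sketch for the case where the order of the elements differs within one vector. The strings are illustrative, and the orders you need will depend on your own data.

```r
library(lubridate)

# Year-first and day-first dates mixed in one vector
mixed <- c("2009-01-01", "02022010", "02-02-2010")

# Try day-month-year (4-digit year) and year-month-day for each element
parsed_mixed <- parse_date_time(mixed, orders = c("dmY", "ymd"))
parsed_mixed
```

Check the result carefully when you do this, because a date such as 02-03-2010 is ambiguous if you do not know which order was intended.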

If you have an invalid format or an impossible date, the result will be NA and you will get a warning.

dmy("28-F-2020") # Invalid format - use February, Feb or 2
dmy("31-Feb-2020") # Impossible date
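With a long vector of dates, it helps to find out which entries failed. A minimal sketch with made-up dates (numeric months, so it does not depend on your locale):

```r
library(lubridate)

raw <- c("28-02-2020", "31-02-2020", "29-02-2020")

# Failures become NA; suppressWarnings() hides the warning while we inspect them
parsed <- suppressWarnings(dmy(raw))

# The original strings that could not be converted
failed <- raw[is.na(parsed)]
failed
```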

Your turn

Use the lubridate package to convert the following to date or datetime format.

date1 <- "28/02/1999"

date2 <- "July 1 2001 2:14"
date1 <- "28/02/1999"
dmy(date1)

date2 <- "July 1 2001 2:14"
mdy_hm(date2)

Time zones

The default time zone for the lubridate functions is UTC (approximately Greenwich Mean Time). This is normally fine unless you are dealing with local times in multiple time zones or need to allow for daylight saving time.

mdy_hm("July 1 2001 2:14", tz = "Europe/Oslo")

The function OlsonNames() returns a vector of valid time zones. See the help file for details.

OlsonNames()[1:10]
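Once a date-time has a time zone, lubridate offers two different ways to change it: with_tz and force_tz. The sketch below uses the same example time as above (the variable names are mine):

```r
library(lubridate)

oslo <- ymd_hm("2001-07-01 02:14", tz = "Europe/Oslo")

# with_tz: the same instant, shown on a different clock
same_instant <- with_tz(oslo, tzone = "UTC")

# force_tz: the same clock reading, but a different instant
same_clock <- force_tz(oslo, tzone = "UTC")

same_instant
same_clock
```

Use with_tz to see what a recorded time corresponds to elsewhere, and force_tz to fix data that were imported with the wrong time zone.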

Date arithmetic

Dates and times can be used in calculations

ymd("2020-5-18") - 5 #base unit is days for dates
ymd("2020-5-18") - period(5, units = "months")
ymd("2020-5-18") - ymd("2020-5-13")
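Two details of date arithmetic are worth knowing. The difference between two dates is a difftime object rather than a plain number, and adding months can land on a date that does not exist. A short sketch with example dates I have chosen to show this:

```r
library(lubridate)

# Convert a difftime to a plain number of days
gap <- ymd("2020-05-18") - ymd("2020-05-13")
gap_days <- as.numeric(gap, units = "days")
gap_days

# 31 February does not exist, so this gives NA
end_of_jan <- ymd("2020-01-31")
naive <- end_of_jan + months(1)
naive

# %m+% rolls back to the last day of the month instead
rolled <- end_of_jan %m+% months(1)
rolled

# Adding a period of days is always safe
later <- ymd("2020-05-18") + days(30)
later
```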

Your turn

Find the number of days until Christmas.

#How many days until Christmas 2020
ymd("2020-12-25") - today()

Extracting elements of the date

Given a date or time, we can extract different elements, for example the month, day or hour.

month(today())
yday(today()) # day of year
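The results above change every day, so here is the same idea with a date-time I have fixed for illustration, along with a few more of the extractor functions:

```r
library(lubridate)

x <- ymd_hms("2020-04-21 14:39:10")

year(x)
month(x)
day(x)
hour(x)
minute(x)
yday(x)                 # day of year
wday(x, week_start = 1) # day of week, counting Monday as 1
```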

Your turn

Find the current hour.

#Find the current datetime with Sys.time() or now()
hour(now())

Languages

R uses locales to know what language to expect the date to be written in. See ?locales. Locales are a bit of a pain.

You can find your current locale with

Sys.getlocale(category = "LC_TIME")

Unfortunately, the locales available depend on your operating system.

You can change the locale to Bokmål for the whole session with

Sys.setlocale(category = "LC_TIME", locale = "nb_NO.utf8") #on linux/Mac, "nn_NO.utf8" for nynorsk
#Sys.setlocale(category = "LC_TIME", locale = "Norwegian Bokmål_Norway.1252") # should work on windows
dmy("1 januar 2020")

Or use the locale argument of dmy.

To see the available locales on your computer, run

system("locale -a", intern = TRUE) #on linux/Mac

On Windows, you need to go to the Region setting in the Control Panel.

Sys.setlocale("LC_TIME", "en_DK.utf8")

Dates and dplyr

As an ecologist, I spend a lot of time processing climate data for comparison with ecological data. The data are typically in R in a data.frame, so dplyr is useful for manipulating them. This section shows how to use lubridate together with dplyr.

The dataset temperature includes a few days of air temperature from the GFI weather station, Bergen.

The dataset has three columns, Dato, Tid and Lufttemperatur. I have imported the data with readr::read_csv so the first two columns have automatically been converted to date and time formats.

temperature

Using mutate to make a new column

I want to make a single date-time column. To do this I need to combine the date and time columns with paste, and then use a lubridate function for the conversion.

temperature2 <- temperature %>% 
  mutate(
    date_time = paste(Dato, Tid), #combine
    date_time = ymd_hms(date_time)#convert
    ) 

temperature2

The lubridate functions make_date and make_datetime are useful if the year, month, day, etc. are in different columns.

tibble(year = 2020, month = 10, day = 3) %>% 
  mutate(date = make_date(year = year, month = month, day = day))
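make_datetime works the same way but also takes the hour, minute and second. A minimal sketch with made-up values, used on plain vectors here but equally usable inside mutate:

```r
library(lubridate)

# Each argument can be a vector (or a column of a data frame)
stamps <- make_datetime(
  year  = c(2020, 2020),
  month = c(10, 11),
  day   = c(3, 4),
  hour  = c(14, 9),
  min   = c(30, 0)
)
stamps # the time zone defaults to UTC
```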

Using filter to select rows

I often need to filter data to remove bad data or restrict the dates and times in the dataset.

For example, to calculate the mean temperature of the afternoon of the 21st April, we first need to filter the data. between is a useful helper function here to avoid having to write a more complicated logical condition.

temperature2 %>% 
  filter(between(date_time, 
                 left = ymd_hm("2020-04-21 12:00"),
                 right = ymd_hm("2020-04-21 17:00"))) %>% 
  summarise(mean_temp = mean(Lufttemperatur))
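To see what between is doing, here is a small sketch on a vector of made-up times (not the temperature data). Both ends of the interval are included, so it is equivalent to combining >= and <=.

```r
library(lubridate)

times <- ymd_hm(c("2020-04-21 11:59", "2020-04-21 12:00",
                  "2020-04-21 17:00", "2020-04-21 17:01"))
start <- ymd_hm("2020-04-21 12:00")
end   <- ymd_hm("2020-04-21 17:00")

# between() includes both end points
in_window <- dplyr::between(times, left = start, right = end)
in_window

# The longer logical condition that between() saves us from writing
by_hand <- times >= start & times <= end
all(in_window == by_hand)
```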

Your turn

Filter temperature2 so that it just includes the mornings.

temperature2 %>% 
  filter(hour(date_time) < 12)

Using group_by and summarise

I often want to summarise climate data to calculate mean monthly temperatures. We can do an analogous task with the temperature dataset to calculate mean hourly temperatures.

temperature2 %>% 
  mutate(hour = hour(date_time)) %>% 
  group_by(hour) %>% 
  summarise(mean_temperature = mean(Lufttemperatur))
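Grouping by hour(date_time) pools the same clock hour across all days. If you instead want one value per hour of each day, lubridate's floor_date is one option, as it rounds each time down to the start of its hour. A minimal sketch with made-up times:

```r
library(lubridate)

times <- ymd_hms(c("2020-04-21 14:05:00", "2020-04-21 14:55:00",
                   "2020-04-22 14:30:00"))

# hour() gives the same value for both days
hours_only <- hour(times)
hours_only

# floor_date() keeps the date, so the two days stay separate
hour_start <- floor_date(times, unit = "hour")
hour_start
```

The result of floor_date can be used in mutate and group_by in the same way as hour.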

Your turn

Find the mean temperature of each morning.

temperature2 %>% 
  filter(hour(date_time) < 12) %>% 
  group_by(Dato) %>%
  summarise(morning_mean = mean(Lufttemperatur))

Plotting dates

We can plot the temperature data with ggplot.

ggplot(temperature2, aes(x = date_time, y = Lufttemperatur)) + 
  geom_line()

ggplot recognises that the x-axis variable is a date-time and plots the axis with appropriate labels. The language will depend on the locale (see Sys.getlocale()). I don't like the way the dates have been formatted so I can use scale_x_datetime to alter the format. The codes from ?strptime that you learnt at the start of this tutorial finally become useful!

ggplot(temperature2, aes(x = date_time, y = Lufttemperatur)) + 
  geom_line() +
  scale_x_datetime(date_labels = "%A %d %B") #see also date_breaks to change label positions
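The date_breaks argument mentioned in the comment controls where the labels are placed. The sketch below uses an invented data frame (demo, with a made-up temp column) rather than the temperature data, so it can be run on its own:

```r
library(ggplot2)

# Invented data: one value every six hours over three days
demo <- data.frame(
  date_time = seq(as.POSIXct("2020-04-20 00:00", tz = "UTC"),
                  by = "6 hours", length.out = 13)
)
demo$temp <- 8 + 3 * sin(seq_len(nrow(demo))) # made-up values

p <- ggplot(demo, aes(x = date_time, y = temp)) +
  geom_line() +
  scale_x_datetime(date_breaks = "1 day", date_labels = "%d %b") # one label per day
p
```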

We can also plot times.

ggplot(temperature2, aes(x = Tid, y = Lufttemperatur, group = Dato)) + 
  geom_line()

Here the aesthetic group = Dato makes ggplot draw a different line for each day.

I don't think the breaks in this plot were chosen very well so I can use scale_x_time to set the breaks and the format for the times.

ggplot(temperature2, aes(x = Tid, y = Lufttemperatur, group = Dato)) + 
  geom_line() + 
  scale_x_time(breaks = seq(0, 24, 6) * 3600, labels = scales::time_format("%H:%M"))


Bio302-UiB/data-handling documentation built on Dec. 6, 2020, 12:15 p.m.