knitr::opts_chunk$set(echo = TRUE, rows.print = 3)
library("learnr")
library("tidyverse")
library("lubridate")
library("conflicted")
conflict_prefer(name = "filter", winner = "dplyr")
tutorial_options(exercise.cap = "Exercise")

temperature <- read_csv(system.file("extdata/Florida_2020-04-20_2020-04-23_1589018968.csv", package = "data.handling"))

temperature2 <- temperature %>%
  mutate(
    date_time = paste(Dato, Tid), # combine date and time
    date_time = ymd_hms(date_time) # convert to date-time
  )
In this tutorial, you will learn how to convert strings to dates and date-times, and how to work with them using dplyr and ggplot2.
Converting strings to dates is tricky.
When entering data into a spreadsheet or similar, I strongly recommend using the international standard format for dates and times (ISO 8601, for example 2020-04-21 14:30:00).
It is usually more convenient to have dates in Date format and date-times in POSIXct format rather than as character strings.
This allows dates to be manipulated, sorted, and plotted properly.
This can be done using base R.
dat <- "26 October 2016 14:39:10 CEST" # date with time zone
as.POSIXct(dat, format = "%d %B %Y %H:%M:%S") # may return NA on your computer (see locales below)
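Here %d represents the day, %B the month written as a full word (%b would work if the month were abbreviated), %Y the four-digit year, and %H:%M:%S the hours, minutes and seconds.
Trailing text such as CEST is ignored, so the time is interpreted in your computer's current time zone.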
There is a complete alphabet of codes you can use; see ?strptime for the full list.
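The same codes work in both directions: format() turns a date-time into text, and the format argument of as.POSIXct reads text into a date-time. The sketch below uses an arbitrary example date-time and sets tz = "UTC" so the output does not depend on your computer's time zone. Only numeric codes are used, so the result does not depend on your locale either.

```r
# Fix the time zone so the output is the same on every computer
x <- as.POSIXct("2016-10-26 14:39:10", tz = "UTC")

# Formatting: turn a date-time into text using strptime codes
format(x, "%Y/%m/%d %H.%M")  # "2016/10/26 14.39"
format(x, "%j")              # "300", the day of the year
format(x, "%y")              # "16", the two-digit year

# Parsing: the same codes describe the layout of a string to read
y <- as.POSIXct("2016/10/26 14.39", format = "%Y/%m/%d %H.%M", tz = "UTC")
y
```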
Use as.Date (for the date) and as.POSIXct (for the date-time) to convert the following strings.
You will need to use the format argument of each function and read the strptime help.
If you get NA, the format argument is not correct.
date1 <- "28/02/1999"
date2 <- "July 1 2001 2:14"
date1 <- "28/02/1999"
as.Date(date1, format = "%d/%m/%Y")
date2 <- "July 1 2001 2:14"
as.POSIXct(date2, format = "%B %d %Y %H:%M")
lubridate
You probably found that last exercise difficult.
Fortunately, the lubridate package makes converting dates much easier.
With lubridate, you don't need to remember the alphabet of codes or get the format exactly right; you just need to choose the function whose name gives the day, month and year in the same order as in your data.
library("lubridate")
dat <- "26 October 2016 14:39:10"
dmy_hms(dat)
dmy_hms expects the elements in the order day-month-year followed by hours:minutes:seconds.
Provided the elements are in this order, most formats can be parsed, with or without separators.
dmy_hms("26-10-16 14.39.10")
dmy_hms("26th Oct 2016 14 39 10")
dmy_hms("261016143910")
lubridate can even cope when the format is mixed, provided the elements are in the same order.
dmy_hms(c("26-10-16 14.39.10", "26th Oct 2016 14 39 10", "261016143910"))
There are many other functions in the lubridate package for coping with dates whose elements are in different orders, for example:
dmy_hm("26th Oct 2016 14.39") # no seconds
ydm_hms("2016 26th Oct 14 39 10") # year first, then day
mdy_hms("Oct 26th 2016 14 39 10") # month first (US standard)
Look at the help file for more functions and arguments.
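If a single vector mixes different orders, lubridate's parse_date_time() accepts several orders at once and tries them in turn. The sketch below uses a made-up vector with one day-first and one month-first date. Only numbers are used, so the result does not depend on your locale.

```r
library(lubridate)

# Hypothetical vector: the first date is day-first, the second month-first
x <- c("26-10-2016", "10/27/2016")

# Try each order; each string is parsed with an order that fits it
parsed <- parse_date_time(x, orders = c("dmy", "mdy"))
parsed
```

parse_date_time() cannot know which order was intended for an ambiguous string such as "01-02-2016". It is safest when each string can match only one of the orders.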
If you have an invalid format or an impossible date, you will get NA and a warning.
dmy("28-F-2020") # invalid format: use February, Feb or 2
dmy("31-Feb-2020") # impossible date
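With real data, you usually need to find which strings failed to parse, because the warning only reports how many did. A minimal sketch, using a hypothetical vector x, silences the warning and looks for NA in the result.

```r
library(lubridate)

# Hypothetical raw dates; 2020 is a leap year, so only 31 February is impossible
x <- c("28-02-2020", "31-02-2020", "29-02-2020")

# The warning only says how many strings failed, so silence it and
# use is.na() to find which ones
parsed <- suppressWarnings(dmy(x))
x[is.na(parsed)]  # "31-02-2020"
```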
Use the lubridate package to convert the following to date or date-time format.
date1 <- "28/02/1999"
date2 <- "July 1 2001 2:14"
date1 <- "28/02/1999"
dmy(date1)
date2 <- "July 1 2001 2:14"
mdy_hm(date2)
The default time zone for the lubridate functions is UTC (approximately Greenwich Mean Time).
This is normally fine unless you are dealing with local times in several time zones, or need to allow for daylight saving time.
Use the tz argument to set a different time zone:
mdy_hm("July 1 2001 2:14", tz = "Europe/Oslo")
The function OlsonNames() returns a vector of valid time zones; see its help file for details.
OlsonNames()[1:10]
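To see why the time zone matters, the sketch below reads the same clock time in UTC and in Europe/Oslo and measures the gap with difftime(). The dates are arbitrary. The gap differs between summer and winter because of daylight saving time.

```r
library(lubridate)

# The same clock time read in two time zones
oslo <- ymd_hm("2001-07-01 02:14", tz = "Europe/Oslo")
utc  <- ymd_hm("2001-07-01 02:14")    # default tz is "UTC"
difftime(utc, oslo, units = "hours")  # 2 hours: Oslo is on summer time (CEST)

# In winter Oslo is only one hour ahead of UTC (CET)
oslo_jan <- ymd_hm("2001-01-01 02:14", tz = "Europe/Oslo")
utc_jan  <- ymd_hm("2001-01-01 02:14")
difftime(utc_jan, oslo_jan, units = "hours")  # 1 hour
```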
Dates and times can be used in calculations:
ymd("2020-5-18") - 5 # the base unit for dates is days
ymd("2020-5-18") - period(5, units = "months")
ymd("2020-5-18") - ymd("2020-5-13")
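Subtracting one date from another gives a difftime object, which as.numeric() can convert to a plain number. Subtracting months with a period can also produce impossible dates. lubridate's %m-% (and %m+%) operators handle this by rolling back to the last valid day of the month. The dates below are arbitrary examples.

```r
library(lubridate)

# The difference between two dates is a difftime object
d <- ymd("2020-5-18") - ymd("2020-5-13")
class(d)                       # "difftime"
as.numeric(d, units = "days")  # 5

# Subtracting a month from 31 March gives 31 February, which does not exist
ymd("2020-03-31") - months(1)     # NA
# %m-% rolls back to the last valid day of the month instead
ymd("2020-03-31") %m-% months(1)  # "2020-02-29"
```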
Find the number of days until Christmas.
#How many days until Christmas 2020
ymd("2020-12-25") - today()
Given a date or time, we can extract different elements, for example the month, day or hour.
month(today())
yday(today()) # day of year
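Using a fixed date-time instead of today() makes the results reproducible. The sketch below extracts several elements from an arbitrary date-time. The setting week_start = 1 assumes Monday is the first day of the week; by default lubridate counts from Sunday unless the lubridate.week.start option is set.

```r
library(lubridate)

x <- ymd_hms("2020-04-21 14:39:10")  # an arbitrary date-time (a Tuesday)

year(x)                  # 2020
month(x)                 # 4
mday(x)                  # 21, the day of the month
yday(x)                  # 112, the day of the year
wday(x, week_start = 1)  # 2, Tuesday when Monday counts as day 1
hour(x)                  # 14
minute(x)                # 39
```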
Find the current hour.
#Find the current datetime with Sys.time() or now()
hour(now())
R uses locales to know which language dates are written in (for example, the names of the months).
See ?locales.
Locales are a bit of a pain.
You can find your current locale with
Sys.getlocale(category = "LC_TIME")
Unfortunately, the locales available depend on your operating system.
You can change the locale to Bokmål for the whole session with
Sys.setlocale(category = "LC_TIME", locale = "nb_NO.utf8") # on Linux/Mac; use "nn_NO.utf8" for Nynorsk
# Sys.setlocale(category = "LC_TIME", locale = "Norwegian Bokmål_Norway.1252") # should work on Windows
dmy("1 januar 2020")
Or use the locale argument of dmy.
To see the available locales on your computer, run
system("locale -a", intern = TRUE) #on linux/Mac
On Windows, you need to go to the Region settings in the Control Panel.
Sys.setlocale("LC_TIME", "en_DK.utf8") # set the time locale to English (Danish conventions) for the rest of the tutorial
dplyr
As an ecologist, I spend a lot of time processing climate data for comparison with ecological data.
The data are typically stored in R in a data frame, so dplyr is useful for manipulating them.
This section shows how to use lubridate together with dplyr.
The dataset temperature includes a few days of air temperature from the GFI weather station, Bergen.
The dataset has three columns: Dato (date), Tid (time) and Lufttemperatur (air temperature).
I imported the data with readr::read_csv, so the first two columns have automatically been converted to date and time formats.
temperature
mutate to make a new column
I want to make a single date-time column.
To do this, I need to combine the date and time columns with paste and then use a lubridate function for the conversion.
temperature2 <- temperature %>%
  mutate(
    date_time = paste(Dato, Tid), # combine date and time
    date_time = ymd_hms(date_time) # convert to date-time
  )
temperature2
The lubridate functions make_date and make_datetime are useful if the year, month, day, etc. are in different columns.
tibble(year = 2020, month = 10, day = 3) %>%
  mutate(date = make_date(year = year, month = month, day = day))
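make_datetime works in the same way but also takes hour, min and sec arguments. Here is a minimal sketch with a hypothetical tibble of observation times:

```r
library(dplyr)
library(lubridate)

# Hypothetical observation times stored in separate columns
obs <- tibble(year = 2020, month = 4, day = 21,
              hour = c(6, 12), minute = c(0, 30))

obs <- obs %>%
  mutate(date_time = make_datetime(year = year, month = month, day = day,
                                   hour = hour, min = minute))
obs
```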
filter to select rows
I often need to filter data to remove bad data or to restrict the dates and times in the dataset.
For example, to calculate the mean temperature on the afternoon of 21 April, we first need to filter the data.
between is a useful helper function here, as it avoids having to write a more complicated logical condition.
temperature2 %>%
  filter(between(date_time,
                 left = ymd_hm("2020-04-21 12:00"),
                 right = ymd_hm("2020-04-21 17:00"))) %>%
  summarise(mean_temp = mean(Lufttemperatur))
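between() includes both end points, so it is equivalent to a pair of >= and <= conditions. The sketch below checks this on a small made-up dataset. It assumes a hypothetical tibble obs whose hourly values stand in for temperature2.

```r
library(dplyr)
library(lubridate)

# Hypothetical hourly readings from 10:00 to 19:00, standing in for temperature2
obs <- tibble(
  date_time = ymd_hm("2020-04-21 10:00") + hours(0:9),
  Lufttemperatur = c(5, 6, 7, 8, 9, 10, 9, 8, 7, 6)
)
afternoon_start <- ymd_hm("2020-04-21 12:00")
afternoon_end <- ymd_hm("2020-04-21 17:00")

# between() keeps both end points...
with_between <- obs %>%
  filter(between(date_time, left = afternoon_start, right = afternoon_end))
# ...so it gives the same rows as this explicit condition
with_logic <- obs %>%
  filter(date_time >= afternoon_start, date_time <= afternoon_end)

nrow(with_between)  # 6 readings, 12:00 to 17:00 inclusive
```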
Filter temperature2 so that it just includes the mornings.
temperature2 %>%
temperature2 %>% filter(hour(date_time) < 12)
group_by and summarise
I often want to summarise climate data to calculate mean monthly temperatures.
We can do an analogous task with temperature2 to calculate mean hourly temperatures.
temperature2 %>%
  mutate(hour = hour(date_time)) %>%
  group_by(hour) %>%
  summarise(mean_temperature = mean(Lufttemperatur))
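For monthly means, grouping by month(date) alone would pool the same month from different years. One option is lubridate's floor_date(), which rounds each date down to the first day of its month and so keeps the year. The sketch below uses hypothetical daily data.

```r
library(dplyr)
library(lubridate)

# Hypothetical daily temperatures spanning the end of March and start of April
daily <- tibble(
  date = ymd("2020-03-30") + days(0:3),
  temp = c(2, 4, 6, 8)
)

monthly <- daily %>%
  mutate(month = floor_date(date, unit = "month")) %>%  # first day of each month
  group_by(month) %>%
  summarise(mean_temp = mean(temp))
monthly  # 2020-03-01: 3, 2020-04-01: 7
```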
Find the mean temperature of each morning.
temperature2 %>%
temperature2 %>%
  filter(hour(date_time) < 12) %>%
  group_by(Dato) %>%
  summarise(morning_mean = mean(Lufttemperatur))
We can plot the temperature data with ggplot.
ggplot(temperature2, aes(x = date_time, y = Lufttemperatur)) + geom_line()
ggplot recognises that the x-axis variable is a date-time and labels the axis appropriately.
The language of the labels will depend on the locale (see Sys.getlocale()).
I don't like the way the dates have been formatted, so I use scale_x_datetime to change the format.
The codes from ?strptime that you learnt at the start of this tutorial finally become useful!
ggplot(temperature2, aes(x = date_time, y = Lufttemperatur)) +
  geom_line() +
  scale_x_datetime(date_labels = "%A %d %B") # see also date_breaks to change label positions
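Here is a sketch that combines date_breaks with date_labels. It uses synthetic hourly data (made-up values, not the GFI station data) and numeric codes, so the labels do not depend on the locale.

```r
library(ggplot2)
library(lubridate)

# Synthetic hourly data over three days (made-up values, not station data)
obs <- data.frame(
  date_time = ymd_hm("2020-04-20 00:00") + hours(0:71),
  Lufttemperatur = 5 + 3 * sin(seq(0, 6 * pi, length.out = 72))
)

p <- ggplot(obs, aes(x = date_time, y = Lufttemperatur)) +
  geom_line() +
  # one break every 12 hours, labelled with numeric strptime codes
  scale_x_datetime(date_breaks = "12 hours", date_labels = "%d/%m %H:%M")
p
```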
We can also plot times.
ggplot(temperature2, aes(x = Tid, y = Lufttemperatur, group = Dato)) + geom_line()
Here the aesthetic group = Dato makes ggplot draw a separate line for each day.
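I don't think the breaks in this plot were chosen very well, so I use scale_x_time to set the breaks and the format of the times.
Times are stored as seconds since midnight, so the breaks are given in seconds (hours multiplied by 3600).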
ggplot(temperature2, aes(x = Tid, y = Lufttemperatur, group = Dato)) +
  geom_line() +
  scale_x_time(breaks = seq(0, 24, 6) * 3600, # a break every 6 hours, in seconds
               labels = scales::time_format("%H:%M"))