extract_date: Extract and parse dates within strings such as file names


View source: R/extract_date.R

Description

Extract and parse dates within strings such as file names. Currently handles the formats "yyyy?mm?dd" and "dd?mm?yyyy", where "?" represents an optional space or punctuation character.

Includes options for dealing with multiple dates within a string (return either the first or last), and handling dates that can be parsed in both ymd and dmy formats (return ymd, dmy, or whichever ends up being closer to the present day).
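To illustrate what a "yyyy?mm?dd" match could look like, here is a minimal sketch in base R. The regular expression and the variable names (pattern_ymd, hit, digits, parsed) are assumptions made for illustration; they are not taken from the package source, and the sketch ignores the year range and the multiple-match and conflict options.

```r
# Hypothetical pattern (not the package's own): four-digit year, then
# two-digit month and two-digit day, each preceded by an optional
# space or punctuation character.
pattern_ymd <- "[0-9]{4}[[:punct:][:space:]]?[0-9]{2}[[:punct:][:space:]]?[0-9]{2}"

x <- "export-2015-03-05_1352.xls"

# Pull out the first substring that matches the pattern
hit <- regmatches(x, regexpr(pattern_ymd, x))

# Drop the separators so a single date format can be used for parsing
digits <- gsub("[^0-9]", "", hit)
parsed <- as.Date(digits, format = "%Y%m%d")
parsed
```

A pattern this loose will also match digit runs that are not dates, which is the kind of false positive the year_min and year_max arguments are meant to reduce.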

Usage

extract_date(
  x,
  remove_path = TRUE,
  year_min = 2000,
  year_max = as.integer(format(Sys.Date(), "%Y")),
  if_multiple = "use_last",
  if_conflict = "use_latest"
)

Arguments

x

Vector of strings such as file names

remove_path

Logical indicating whether to strip file paths (everything up to and including the last path separator) from x using basename() prior to searching for date values. Defaults to TRUE.

year_min

Minimum year value expected. Targeting an expected year range helps avoid false positive matches to number sequences that are not dates. Defaults to 2000.

year_max

Maximum year value expected. Defaults to the current year based on Sys.Date().

if_multiple

If multiple dates of a given type (ymd or dmy) are found within a string, proceed with only the first ("use_first") or only the last ("use_last"). Defaults to "use_last".

if_conflict

If a given string matches both ymd and dmy formats (e.g. "2020_06_10_2020", which reads as 2020-06-10 in ymd or as 06-10-2020 in dmy), return the ymd format ("use_ymd"), the dmy format ("use_dmy"), or whichever of the two is closer to the current date ("use_latest"). Defaults to "use_latest".
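The "closer to the current date" rule for if_conflict can be sketched as follows. The helper pick_closest() and its reference argument are hypothetical names introduced here for illustration, not part of the package; the reference date is passed explicitly so the result does not depend on the day the code is run.

```r
# Hypothetical helper (not part of the package): return the candidate
# date with the smallest absolute distance in days from a reference date.
pick_closest <- function(candidates, reference = Sys.Date()) {
  distance <- abs(as.numeric(reference - candidates))
  candidates[which.min(distance)]
}

# The two readings of "2020_06_10_2020"
ymd <- as.Date("2020_06_10", format = "%Y_%m_%d")  # 10 June 2020
dmy <- as.Date("06_10_2020", format = "%d_%m_%Y")  # 6 October 2020

# With a reference date in November 2020, the dmy reading is closer
chosen <- pick_closest(c(ymd, dmy), reference = as.Date("2020-11-09"))
chosen
```

With an earlier reference date, such as 1 July 2020, the same helper would pick the ymd reading instead, so the outcome of this rule depends on when the function is called.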

Value

Vector of dates. Returns NA for any string in which no date is found.

Examples

files <- c(
  "~/Documents/2020-05-01/myfile-2020-04-30-1230.csv",
  "2020_06_30_052051_Database_Complete_v1230.csv",
  "2020_06_10_2020_Database_Complete_v1230.csv",
  "~/Desktop/data__cleaned1.xlsx",
  "~/Desktop/data__cleaned__2020-07-01_16-25.xlsx",
  "22062020-covid19-cases.xlsm",
  "/Documents/2015/PhD.Data.20091205_1247.Final.xls",
  "/exports/Cleaning 2016-03-05/export-2015-03-05_1352.xls",
  "COVID19_28072020.xlsb"
)

extract_date(files)

# prefer date within first portion of file path, if matched
extract_date(files, remove_path = FALSE, if_multiple = "use_first")
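The effect of remove_path can be previewed with base R's basename(), which the argument description says is used to strip paths. This is only a sketch of that preprocessing step, with path and stripped as illustrative variable names; it does not call extract_date() itself.

```r
path <- "~/Documents/2020-05-01/myfile-2020-04-30-1230.csv"

# basename() drops everything up to and including the last path separator,
# so the date in the directory name is no longer part of the string.
stripped <- basename(path)
stripped

# The directory date is gone; only the date in the file name remains
grepl("2020-05-01", stripped, fixed = TRUE)
grepl("2020-04-30", stripped, fixed = TRUE)
```

With remove_path = FALSE the full string is searched, so the directory date remains a candidate, and if_multiple = "use_first" is what makes it the preferred match.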

epicentre-msf/llutils documentation built on Nov. 9, 2020, 8:24 p.m.