fix_date_df {datefixR}    R Documentation
Tidies a data frame or tibble whose date columns were entered through a
free-text interface and therefore use non-standardized formats. Supports a
range of separators, including /, -, . and spaces. Months may be given as
numbers, abbreviations, or full names in languages including English, French,
German, Spanish, Portuguese, Russian, Czech, Slovak, and Indonesian. By
default, a missing day is imputed as 1 and a missing month as 7 (July); these
values can be changed with the day.impute and month.impute arguments.
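As a rough illustration of the separator handling and month-name matching described above, the base-R sketch below splits entries on the listed separators and looks up English and Spanish month names. The helpers split_date_tokens() and month_number() and the lookup table are hypothetical and are not part of datefixR, which covers more languages and edge cases than this sketch.

```r
# Hypothetical helpers for illustration only; not datefixR internals.

# Split free-text dates on the separators "/", ".", "-" and spaces.
split_date_tokens <- function(x) {
  strsplit(x, "[/. -]+")
}

# Small month-name lookup: English full names and abbreviations
# (built-in month.name / month.abb) plus Spanish full names.
spanish_months <- c("enero", "febrero", "marzo", "abril", "mayo", "junio",
                    "julio", "agosto", "septiembre", "octubre",
                    "noviembre", "diciembre")
month_lookup <- c(
  stats::setNames(1:12, tolower(month.name)),
  stats::setNames(1:12, tolower(month.abb)),
  stats::setNames(1:12, spanish_months)
)

# Return the month number for each token, or NA if it is not a known name.
month_number <- function(token) {
  unname(month_lookup[tolower(token)])
}

tokens <- split_date_tokens(c("2020-may-01", "1996.05.01", "October 2022"))
month_number(tokens[[1]][2])  # "may" -> 5
```

A real parser would also need to decide the order of day, month and year (the format argument) and handle two-digit years; the sketch only shows the tokenizing step.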
fix_date_df(
  df,
  col.names,
  day.impute = 1,
  month.impute = 7,
  id = NULL,
  format = "dmy",
  excel = FALSE,
  roman.numeral = FALSE,
  cores = getOption("Ncpus", 1)
)
This function processes messy date data by:
- supporting a mix of date formats within the same column
- recognizing month names in multiple languages and, when
  roman.numeral = TRUE, months written as Roman numerals
- interpreting Excel serial date numbers when excel = TRUE
- imputing missing days and months (controlled by day.impute and
  month.impute) and issuing warnings when it does so
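The Excel and Roman-numeral conversions listed above have simple base-R counterparts, sketched below for intuition only. The sketch assumes Excel's default 1900 date system, whose serial numbers are conventionally converted in R with the origin 1899-12-30; the helper names excel_serial_to_date() and roman_to_month() are hypothetical, and this is not how datefixR itself is implemented.

```r
# Hypothetical helpers for illustration only; not part of datefixR.

# Excel (1900 date system) serial numbers -> Date.
# The 1899-12-30 origin compensates for Excel treating 1900 as a leap year,
# so it is valid for serials from 61 (1900-03-01) onward.
excel_serial_to_date <- function(serial) {
  as.Date(as.numeric(serial), origin = "1899-12-30")
}

# Roman-numeral month tokens (e.g. "XII") -> month numbers.
roman_to_month <- function(token) {
  as.integer(utils::as.roman(token))
}

excel_serial_to_date(c("44197", "44927"))  # "2021-01-01" "2023-01-01"
roman_to_month(c("I", "XII"))               # 1 12
```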
For further details and advanced usage, refer to the vignette via
browseVignettes("datefixR") or visit the online documentation at
https://docs.ropensci.org/datefixR/.
Returns an object of the same type as df (data frame or tibble) in which the
columns named in col.names are converted to class Date, displayed in
yyyy-mm-dd format.
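The default imputation and the Date output can be illustrated with the hypothetical helper impute_partial_date() below. It assumes the year, month and day have already been extracted from the text, fills in a missing month or day using defaults that mirror fix_date_df() (month.impute = 7, day.impute = 1), and returns a Date that prints as yyyy-mm-dd. It is a simplified sketch, not datefixR's own code, and omits the package's parsing and warnings.

```r
# Hypothetical helper for illustration only; not part of datefixR.
# Fills in missing months and/or days, then builds a Date (vectorized).
impute_partial_date <- function(year, month = NA, day = NA,
                                day.impute = 1, month.impute = 7) {
  month <- ifelse(is.na(month), month.impute, month)
  day <- ifelse(is.na(day), day.impute, day)
  as.Date(sprintf("%04d-%02d-%02d", as.integer(year),
                  as.integer(month), as.integer(day)))
}

impute_partial_date(1992)                               # "1992-07-01"
impute_partial_date(1990, month = 4, day.impute = 15)   # "1990-04-15"
impute_partial_date(1992, day.impute = 15, month.impute = 12)  # "1992-12-15"
```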
See fix_date_char() for similar functionality on character vectors.
For a worked walkthrough with comprehensive examples, see the package article
at https://docs.ropensci.org/datefixR/articles/datefixR.html.
# Basic cleanup
data(exampledates)
fix_date_df(exampledates, c("some.dates", "some.more.dates"))
# Custom imputation of missing days and months
messy_dates_df <- data.frame(
  id = seq(1, 3),
  dates = c("1992", "April 1990", "Mar 19")
)
fix_date_df(messy_dates_df, "dates", day.impute = 15, month.impute = 12)
# Diverse format normalization
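df_formats <- data.frame(
  mixed.dates = c("02/05/92", "2020-may-01", "1996.05.01", "October 2022"),
  # Four entries so the column matches mixed.dates instead of being recycled
  european.dates = c("22.07.1977", "05.06.2023", "31.12.1999", "01.01.2000")
)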
fix_date_df(df_formats, c("mixed.dates", "european.dates"))
# Excel serial examples
serial_df <- data.frame(serial.dates = c("44197", "44927"))
fix_date_df(serial_df, "serial.dates", excel = TRUE)
# Handling Roman numerals
roman_df <- data.frame(roman.dates = c("15.I.2023", "03.XII.2019"))
fix_date_df(roman_df, "roman.dates", roman.numeral = TRUE)
# Parallel processing (requires 'future' and 'future.apply' packages)
## Not run:
large_df <- data.frame(
  dates1 = c("01/02/2020", "15/03/2021", "22/12/2019"),
  dates2 = c("2020-01-01", "March 2021", "Dec 2019"),
  dates3 = c("01.01.20", "15.03.21", "22.12.19")
)
# Use 4 cores for parallel processing
fix_date_df(large_df, c("dates1", "dates2", "dates3"), cores = 4)
# Alternatively, set the Ncpus option; the default cores = getOption("Ncpus", 1) picks it up
options(Ncpus = parallel::detectCores())
fix_date_df(large_df, c("dates1", "dates2", "dates3"))
## End(Not run)