knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
# Set up a temp directory and set it as the root directory for all chunks temp_dir <- tempdir() knitr::opts_knit$set(root.dir = temp_dir)
The basepenguins package provides tools to convert R scripts and R Markdown/Quarto documents (or other specified file types) that use the palmerpenguins package to use the versions of penguins
and penguins_raw
from datasets (R ≥ 4.5.0).
With R ≥ 4.5.0, the popular Palmer Penguins datasets are now directly available without loading the palmerpenguins package. This makes them more accessible, especially for new R users and for teaching purposes. However, there are some differences between the variable names in the palmerpenguins package and those in R's datasets package:
| palmerpenguins | datasets | |-------------------|--------------| | bill_length_mm | bill_len | | bill_depth_mm | bill_dep | | flipper_length_mm | flipper_len | | body_mass_g | body_mass |
These shorter variable names in the base R version were chosen for more compact code and data display. It does mean, however, that for those wanting to use R’s version of penguins
, it isn’t simply a case of removing the call to library(palmerpenguins)
or replacing palmerpenguins
with datasets
in data("penguins", package = "palmerpenguins")
and the script still running.
The basepenguins package takes care of converting files by removing the call to palmerpenguins and making the necessary conversions to variable names, ensuring that the resulting scripts still run using the datasets (R ≥ 4.5.0) versions of penguins
and penguins_raw
.
library(basepenguins)
The basepenguins package provides four functions to convert files:
convert_files()
: Convert specified files to new output locationsconvert_files_inplace()
: Convert files in-placeconvert_dir()
: Convert files in a specified directory and its subdirectories to a new output directory, preserving nesting structureconvert_dir_inplace()
: Convert files in a directory in-placeIf using convert_files_inplace()
or convert_dir_inplace()
,
we recommend doing so in conjunction with a version-control system such as git,
so that any changes can be easily checked.
Additionally, there are helper functions:
example_files()
and example_dir()
: Access example files included in the packageoutput_paths()
: Generate modified file pathsfiles_to_convert()
: List files in a directory with specified extensionsWhen a file is 'convertible', i.e. contains a call to library(palmerpenguins)
or data("penguins", package = "palmerpenguins")
and has one of the specified extensions (by default "R"
, "r"
, "qmd"
, "rmd"
, "Rmd"
), the conversion makes these changes:
library(palmerpenguins)
(or same with palmerpenguins
in quotes) with the empty string""
data("penguins", package = "palmerpenguins")
(with any style of quotes) with data("penguins", package = "datasets")
bill_length_mm
→ bill_len
bill_depth_mm
→ bill_dep
flipper_length_mm
→ flipper_len
body_mass_g
→ body_mass
ends_with("_mm")
with starts_with("flipper_"), starts_with("bill_")
The package includes an example directory with four example files to demonstrate how the conversion works.
These are accessible through example_files()
and example_dir()
.
# List all example files example_files()
These example files include:
penguins.R
: An R script using the palmerpenguins packageno_penguins.Rmd
: An Rmarkdown file that includes ends_with("_mm")
but not in the context of the palmerpenguins packagenested/penguins.qmd
: A Quarto document using the palmerpenguins packagenested/not_a_script.md
: Contains library(palmerpenguins)
, but is not a script type that is converted by defaultYou can examine the content of any of these files, e.g.:
penguins_script <- example_files("penguins.R") cat(readLines(penguins_script), sep = "\n")
The example_dir()
function returns the path to the directory containing all example files.
It also has a copy.dir
argument that allows you to copy all the example files to a new directory.
This is especially useful for testing the conversion functions that modify files in-place
without affecting the original example files distributed with the package:
# Copy all example files to a new subdirectory of the working directory example_dir("examples") # List the files in the copied directory list.files("examples", recursive = TRUE)
Note that for the purposes of this vignette (and to adhere to CRAN policies),
the working directory has been set to a tempdir
and all new directories and files
are written there, using relative paths.
The package offers two main approaches to converting files: creating new converted versions with convert_files()
or modifying files in place withconvert_files_inplace()
.
Let's start by converting a single file to see how it works:
# Convert a single file to a new output file convert_files(penguins_script, "converted_penguins.R")
# Look at the converted file cat(readLines("converted_penguins.R"), sep = "\n")
Notice how the function has:
library(palmerpenguins)
lineends_with("_mm")
to use starts_with()
patterns insteadBoth the input
and output
parameters of convert_files()
take a vector of file paths,
allowing you to convert multiple files at once.
If you want to overwrite the original files rather than creating new ones, you can use convert_files_inplace()
,
which works exactly the same as convert_files()
, except that it doesn't take an output
argument - it is simply a convenience wrapper around convert_files(input, input, extensions)
.
All the convert_*()
functions invisibly return a list with two components:
changed
: Files that were modifiednot_changed
: Files that were not modified (either they don't have the specified extensions or they don't use the palmerpenguins package)If the output
paths are different than the input
paths,
the values in the changed
and not_changed
vectors will be subsets of output
,
and they will be named with the corresponding input
paths.
If files are overwritten, then the values in changed
and not_changed
will be subsets of input
and the vectors will not be named.
This list is returned invisibly for two reasons:
The convert_*()
functions generate messages in the following circumstances:
ends_with("_mm")
substitutions are made,
a message with the output file paths and line numbers of those changesTo convert all convertible files in a directory (and its subdirectories), use convert_dir()
.
We'll use the "examples"
directory that we created above with the call to example_dir("examples")
.
result <- convert_dir("examples", "converted_examples") result
To convert all files in a directory in place, use convert_dir_inplace()
.
A useful call is convert_dir_inplace(".")
to overwrite all convertible files in the working directory,
though we don't run that here, demonstrating on a fresh copy of the example directory instead.
example_dir("in_place_dir") result <- convert_dir_inplace("in_place_dir") result
When working with large directories, the files_to_convert()
function helps you find files with specific extensions that might be candidates for conversion:
# List all files with convertible extensions in a directory potential_files <- files_to_convert("examples") potential_files
It's important to note that files_to_convert()
only filters files by their extensions and does not look for palmerpenguins
in their content.
By default, this function looks for files with extensions "R"
, "r"
, "qmd"
, "rmd"
, or "Rmd"
. You can specify different extensions if needed, or return absolute file paths. See files_to_convert()
for further details:
# Only look for R scripts files_to_convert("examples", extensions = "R")
# All extensions files_to_convert("examples", extensions = NULL)
When converting files to new locations, the output_paths()
function helps generate appropriate output paths, based on the input paths (which are preserved as names). These can then be passed to the output
argument in convert_files()
.
By default, output_paths()
adds a "_new"
suffix to the file name, but other suffixes, or prefixes, can be specified. Other output directories can also be given:
input_files <- files_to_convert("examples") # Default output_paths(input_files)
# Generate output paths with prefix instead, in new directory output_paths(input_files, prefix = "base_", suffix = "", dir = "~/output")
ends_with("_mm")
substitutionThe palmerpenguins Get started vignette
has examples of using ends_with("_mm")
within calls to dplyr::select()
, as a convenient way to select the
flipper_length_mm
, bill_length_mm
and bill_depth_mm
columns.
This pattern presents a design challenge for basepenguins.
We need a way to select the flipper_len
, bill_len
and bill_dep
columns.
The most obvious substition for ends_with("_mm")
is therefore flipper_len, starts_with("bill_")
,
which preserves the use of a tidyselect function.
However, suppose we have a previous call to dplyr::select()
, and have converted the file with the above.
Then following code will generate an error, because flipper_len
is no longer available to be selected:
penguins |> select(bill_len, bill_dep) |> select(flipper_len, starts_with("bill_"))
Although the above example is contrived, we don't want to break anyone's code,
so instead we replace ends_with("_mm")
with:
starts_with("flipper_"), starts_with("bill_")
This won't error, even if there are no column names starting with "flipper_"
or "bill_"
.
However, we shouldn't ever really need starts_with("flipper_")
as there is only one column in penguins
that meets that criteria,
so we suggest manually checking this substitution and either replacing starts_with("flipper_")
with flipper_len
if flipper_len
is still a column in the data frame, or removing starts_with("flipper_")
entirely if not.
To facilitate this, the convert_*()
functions all print a message indicating where these substitutions were made,
to help you manually review and potentially refine these changes if desired.
The use of the ends_with("_mm")
pattern with the penguins
dataset is also the reason why we only convert files if library(palmerpenguins)
or data("penguins", package = "palmerpenguins")
is found in the file.
It is possible to imagine different data frames for which this selector could be used,
and we don't want to inadvertently alter those. We provide an example file to demonstrate this:
no_penguins_file <- "examples/no_penguins.Rmd" cat(readLines(no_penguins_file), sep = "\n")
# Pass it to a convert function convert_files(no_penguins_file, "no_penguins_converted.Rmd") # The content doesn't change cat(readLines("no_penguins_converted.Rmd"), sep = "\n")
Even though this file contains ends_with("_mm")
, and is an R Markdown file,
it doesn't use the palmerpenguins package, so no substitutions are made.
Notice also that there were no messages generated when convert_files()
was called,
indicating that none of the input files changed.
The versions of penguins
and penguins_raw
in R ≥ 4.5.0's datasets package will always (just) have class data.frame
.
In contrast, the palmerpenguins versions will have classes tbl_df
, tbl
and data.frame
if the tibble package is installed on your computer (and just class data.frame
if not).
penguins_raw
The versions of penguins_raw
in palmerpenguins and datasets are identical, except potentially for their class, as described above.
No specific changes are made to penguins_raw
by the convert_*()
functions in basepenguins,
but by removing the call to library(palmerpenguins)
, the datasets version will be used in any scripts,
which is always a data.frame
(never a tbl_df
).
Note that the palmerpenguins package provides features that are not in R, such as vignettes and articles on the package website. The package also contains the data in two csv files and provides a function to access them. And, of course, Allison Horst's wonderful penguins artwork! The palmerpenguins package will remain on CRAN and keep its package website.
We are extremely grateful to the authors of palmerpenguins, Allison Horst, Alison Hill and Kristen Gorman, for their support for adding the Palmer Penguins data to datasets, and their enthusiasm about basepenguins.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.