
readxl Webinar

Notes made in preparation for a readxl RStudio webinar given on 2017-05-10. A recording of the webinar is available. Slides can be found on SpeakerDeck; slides and other materials are in the RStudio Webinar GitHub repo or in readxl's GitHub repo.

Description

Like it or not, spreadsheets are a common data source for many of us. We’ll review the overall landscape for importing spreadsheet data into R and then go into detail for the readxl package specifically. readxl is the Tidyverse solution for reading data stored in the legacy xls format or the more modern xlsx format. It has no tricky external dependencies, is quite speedy, and is easy to install and use across all operating systems.

Context: Tidyverse

http://tidyverse.org

readxl is the Excel-reading package whose interface is most consistent with the other Tidyverse data import packages, e.g., readr and haven. Why does this matter?

Key concept: once you get data into R, it's stored as a tibble, which is a special flavor of data frame.
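A minimal sketch of that key concept, assuming readxl (and its dependency tibble) is installed. It uses `readxl_example()`, which returns the path to an example workbook shipped with the package, so no file of your own is needed.

```r
library(readxl)

# Path to an example workbook bundled with readxl.
path <- readxl_example("datasets.xlsx")

# read_excel() reads the first sheet by default and returns a tibble.
iris_xl <- read_excel(path)

class(iris_xl)              # includes "tbl_df" and "data.frame"
tibble::is_tibble(iris_xl)  # TRUE
is.data.frame(iris_xl)      # TRUE: a tibble is still a data frame
```

Because a tibble is a data frame, code written for data frames generally works on the result unchanged.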

Context: R packages for Excel

There are many R packages that can read Excel spreadsheets besides readxl.

Why would you pick readxl?

Excel file formats

.xls = legacy Excel format, the default for Excel 97 through 2003

.xlsx = modern Excel format, XML-based, the default since Excel 2007

Pick a package that reads the format(s) you have.
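With readxl, both formats go through the same function. The sketch below assumes readxl 1.0.0 or later and uses the bundled example workbooks `datasets.xls` and `datasets.xlsx`. The object names are only illustrative.

```r
library(readxl)

xls_path  <- readxl_example("datasets.xls")
xlsx_path <- readxl_example("datasets.xlsx")

# read_excel() works out the format itself, so one call covers both.
from_xls  <- read_excel(xls_path)
from_xlsx <- read_excel(xlsx_path)

# If the format is known in advance, the format-specific readers
# read_xls() and read_xlsx() skip the guessing step.
from_xls2  <- read_xls(xls_path)
from_xlsx2 <- read_xlsx(xlsx_path)

names(from_xls)
names(from_xlsx)
```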

Aggravating external dependencies

All Excel-reading R packages rely on external libraries. The only question is whether the user feels that or not.

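readxl benefits from two such libraries: libxls, a C library for reading .xls, and RapidXML, a C++ library for parsing the XML inside .xlsx.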

readxl fully embeds these libraries. On Mac, Windows, and Linux, you should be able to just install readxl and go.

The alternative: get the user to install external dependencies.

readxl: reading rectangles

readxl targets rectangular data in spreadsheets.

Which rectangle?
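By default readxl discovers the data rectangle itself, but it can also be stated explicitly. A hedged sketch of the main options, using the bundled `datasets.xlsx` workbook; the variable names are illustrative only.

```r
library(readxl)
path <- readxl_example("datasets.xlsx")

# An Excel-style cell range: header row plus three data rows, columns C to E.
by_range <- read_excel(path, range = "C1:E4")

# A range that also names the sheet.
by_sheet_range <- read_excel(path, range = "mtcars!B1:D5")

# Constrain only the rows or only the columns.
by_rows <- read_excel(path, range = cell_rows(1:4))
by_cols <- read_excel(path, range = cell_cols("B:D"))

# Cap the number of data rows without naming a range.
first_three <- read_excel(path, n_max = 3)

dim(by_range)        # 3 rows, 3 columns
dim(by_sheet_range)  # 4 rows, 3 columns
dim(by_cols)         # 150 rows, 3 columns
```

The first row of each range is consumed as column names, which is why a four-row range yields three rows of data.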

Draw on these resources:

RStudio IDE support

Access the RStudio helper in one of two ways:

What's cool about it?

Bear in mind there are more arguments to read_excel() and more flexible ways to use these arguments if you write the code yourself.
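As an illustration of writing the call yourself, here is a small sketch combining several `read_excel()` arguments on the bundled `datasets.xlsx` workbook. The replacement column names are made up for the example.

```r
library(readxl)
path <- readxl_example("datasets.xlsx")

# List the worksheets before deciding what to read.
excel_sheets(path)

# Select a sheet by name or by position.
chickwts_xl <- read_excel(path, sheet = "chickwts")
quakes_xl   <- read_excel(path, sheet = 4)

# Treat a specific string as missing.
iris_na <- read_excel(path, na = "setosa")

# Skip the header row and supply column names instead
# (hypothetical names, chosen only for this sketch).
iris_renamed <- read_excel(
  path,
  skip = 1,
  col_names = c("sepal_l", "sepal_w", "petal_l", "petal_w", "species")
)

sum(is.na(iris_na$Species))
names(iris_renamed)
```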

Column types
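Column types are guessed by default and can be overridden through the `col_types` argument of `read_excel()`. A minimal sketch on the first sheet of the bundled `datasets.xlsx` (the iris data), assuming readxl 1.0.0 or later:

```r
library(readxl)
path <- readxl_example("datasets.xlsx")

# Default: readxl guesses each column's type from the cells.
guessed <- read_excel(path)

# One explicit type per column.
explicit <- read_excel(
  path,
  col_types = c("numeric", "numeric", "numeric", "numeric", "text")
)

# A single type is recycled across all columns.
all_text <- read_excel(path, col_types = "text")

# "skip" drops a column; "guess" leaves the decision to readxl.
partial <- read_excel(
  path,
  col_types = c("skip", "guess", "guess", "skip", "text")
)

sapply(all_text, class)
names(partial)
```

Reading everything as text first can be a pragmatic fallback when guessing goes wrong, at the cost of converting columns afterwards.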

Draw on these resources:

Workflows and iteration

Draw on http://readxl.tidyverse.org/articles/articles/readxl-workflows.html
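One common iteration pattern is reading every worksheet of a workbook into a named list. The sketch below uses base R's `lapply()` rather than purrr, purely to avoid an extra dependency; it is an illustration under that assumption, not a copy of the article's code. The CSV export step writes to a temporary directory.

```r
library(readxl)
path <- readxl_example("datasets.xlsx")

# Read every worksheet into a list named after the sheets.
sheets <- excel_sheets(path)
all_sheets <- lapply(
  setNames(sheets, sheets),
  function(s) read_excel(path, sheet = s)
)

# One quick summary per sheet.
sapply(all_sheets, nrow)

# Cache each sheet as a CSV file, here in a temporary directory.
csv_paths <- file.path(tempdir(), paste0(sheets, ".csv"))
for (i in seq_along(sheets)) {
  utils::write.csv(all_sheets[[i]], csv_paths[i], row.names = FALSE)
}
```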


