knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)
Use this space to explain what this report is about: what your objectives were, what you found, and how you did it.
Remember that you can change the project code and type, psychologies used, and testing design in the header. They will automatically show up here:
`r rmarkdown::metadata$project_code`
`r stringr::str_to_upper(rmarkdown::metadata$evaluation)` strategy for testing.
`r stringr::str_to_upper(rmarkdown::metadata$project_type)` project.
`r rmarkdown::metadata$psychologies`
Metadata is useful to the project team but also great for book-keeping. It collects valuable information on our practices that can be used to better inform our future work. Fill it out carefully, and remember to update it whenever it changes!
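To see how header fields turn into values, the sketch below writes a throwaway R Markdown file and reads its YAML header back with `rmarkdown::yaml_front_matter()`. The field names match the ones used above; the values (`"ABC-123"`, `"rct"`, and so on) are made up for illustration. Inside a knitted document the same fields are available through `rmarkdown::metadata`.

```r
library(rmarkdown)

# Write a throwaway .Rmd whose header carries the four metadata fields.
# All values here are hypothetical placeholders.
rmd_path <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  "title: \"Example report\"",
  "project_code: \"ABC-123\"",
  "project_type: \"pilot\"",
  "evaluation: \"rct\"",
  "psychologies: \"present bias, limited attention\"",
  "---",
  "",
  "Report body goes here."
), rmd_path)

# Parse only the YAML header; the result is a named list.
meta <- yaml_front_matter(rmd_path)

meta$project_code
toupper(meta$evaluation)
toupper(meta$project_type)
meta$psychologies
```

Because the header is parsed into a plain list, a misspelled field name simply returns `NULL`, so it is worth checking the rendered report after editing the header.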
Typically, you will receive a dataset that is most likely in an Excel (`.xlsx` or `.xls`) or `.csv` (comma-separated values) format, which you will have to load and clean before analyzing it. In this template we use the `penguins` sample dataset available in the `palmerpenguins` package, so we don't actually read in any data. Just remember that you can easily do that with the `read_csv` (for CSV files), `read_excel`, or `read_dta` (for Stata files) functions. If you want to learn more about reading in data, a tutorial is available on our knowledge-sharing platform.
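As a minimal sketch of what reading in data looks like, the example below creates a tiny CSV in a temporary location and loads it with `readr::read_csv()`. The file contents and the `survey` object name are invented for illustration; in a real project you would point `read_csv()` at the file you received.

```r
library(readr)

# Create a small CSV on disk so the example is self-contained.
# In a real project this would be the file you received.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "id,species,flipper_length_mm",
  "1,Adelie,181",
  "2,Gentoo,",
  "3,Chinstrap,195"
), csv_path)

# read_csv() guesses column types and treats empty fields as NA by default.
survey <- read_csv(csv_path, show_col_types = FALSE)

survey
```

`read_excel()` (from readxl) and `read_dta()` (from haven) follow the same pattern: pass a file path and get a data frame back.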
First, we load all the packages we'll need:
library(palmerpenguins)
library(visdat)
library(skimr)
library(tidyverse)
library(tools42)

# Using the penguins dataset
data(penguins)
A good way to start is by looking at what variables are available in our dataset, as well as the number of observations. The `skim` function in the `skimr` package is a quick and easy way to summarize this. Consider it a beefed-up replacement for the Stata `summarize` command. You'll see you also get tiny histograms for your continuous data. Visualizing your data is critical and you should always, always spend some time plotting it. You won't understand your data unless you plot it.
skim(penguins)
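If you'd rather check the ranges directly instead of reading them off the `skim` output, a minimal sketch along these lines computes the minimum and maximum of every numeric column. The `num_cols` and `ranges` names are just for illustration.

```r
library(palmerpenguins)
library(dplyr)

# Keep only the numeric columns of the penguins data.
num_cols <- penguins %>% select(where(is.numeric))

# range() returns c(min, max); missing values are ignored.
ranges <- t(sapply(num_cols, range, na.rm = TRUE))
colnames(ranges) <- c("min", "max")

ranges
```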
Great. Looks like most of our data is complete and falls within the expected ranges (that is, there are no crazy values for any of the variables). Let's get a sense of which variables are continuous and which are categorical (character or factor).
We can do that with the `vis_dat` function, which plots every observation and color-codes it by variable type. We could even make it look nicer by adding the i42 theme and color scheme, since this is a `ggplot` object.
vis_dat(penguins_raw)
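Because `vis_dat()` returns a ggplot object, ordinary ggplot2 layers can be added to it. The sketch below uses `labs()` and `theme()` from ggplot2 as stand-ins for the i42 theme and color scheme, whose exact behavior is not assumed here; the title text is made up.

```r
library(palmerpenguins)
library(visdat)
library(ggplot2)

# vis_dat() returns a ggplot object, so regular ggplot2 layers can be added.
p <- vis_dat(penguins_raw)

# Add a title and move the legend; any ggplot2 theme tweak works the same way.
p_styled <- p +
  labs(title = "Variable types in penguins_raw") +
  theme(legend.position = "bottom")

p_styled
```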
Isn't this cool? There's also a function to see the degree of missingness for each of our variables. It's called `vis_miss`:
vis_miss(penguins_raw)
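When you need the numbers behind the plot, for instance to quote them in a report, the share of missing values per column can be computed in base R. This is a small sketch that complements `vis_miss` rather than a replacement for it; `miss_share` is an illustrative name.

```r
library(palmerpenguins)

# is.na() on a data frame gives a logical matrix; column means are the
# share of missing values per variable. Sort from most to least missing.
miss_share <- sort(colMeans(is.na(penguins_raw)), decreasing = TRUE)

round(miss_share, 3)
```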
Ok, let's look at the relationships between variables.
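A minimal sketch of such a first look, using plain ggplot2 rather than the tools42 helpers (that substitution is an assumption made for this example), plots flipper length against bill length:

```r
library(palmerpenguins)
library(ggplot2)

# Drop rows with missing measurements up front to avoid a plotting warning.
plot_data <- subset(
  penguins,
  !is.na(flipper_length_mm) & !is.na(bill_length_mm)
)

scatter <- ggplot(plot_data, aes(x = flipper_length_mm, y = bill_length_mm)) +
  geom_point(alpha = 0.7) +
  labs(x = "Flipper length (mm)", y = "Bill length (mm)")

scatter
```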
This is interesting. I wonder if there are any differences by species!
penguins %>%
  viz_scatter(flipper_length_mm, bill_length_mm) +
  aes(color = species) +
  theme_42() +
  scale_color_manual(values = palette_42("i42_bright"))
Now, a histogram of flipper length:
penguins %>%
  viz_hist(flipper_length_mm) +
  aes(fill = species) +
  facet_wrap(~species) +
  theme_42() +
  scale_fill_manual(values = palette_42("i42_bright"))
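For readers without access to tools42, roughly the same faceted histogram can be sketched in plain ggplot2. The colors below are arbitrary named R colors chosen for this example, not the i42 palette, and `bins = 20` is likewise an assumption.

```r
library(palmerpenguins)
library(ggplot2)

# Drop rows with a missing flipper length to avoid a plotting warning.
hist_data <- subset(penguins, !is.na(flipper_length_mm))

# Hypothetical fill colors, one per species (not the i42 palette).
species_colors <- c(
  Adelie = "darkorange",
  Chinstrap = "purple",
  Gentoo = "cyan4"
)

hist_plot <- ggplot(hist_data, aes(x = flipper_length_mm, fill = species)) +
  geom_histogram(bins = 20, show.legend = FALSE) +
  facet_wrap(~species) +
  scale_fill_manual(values = species_colors) +
  labs(x = "Flipper length (mm)", y = "Count")

hist_plot
```

Faceting already labels each panel by species, which is why the legend is switched off here.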