library(learnr)
library(testwhat)

tutorial_options(
  exercise.timelimit = 60,
  exercise.checker = testwhat::testwhat_learnr
)
knitr::opts_chunk$set(comment = NA)

Required packages

| Package   | Install Command                              |
|-----------|----------------------------------------------|
| rmarkdown | install.packages("rmarkdown")                |
| remotes   | install.packages("remotes")                  |
| learnr    | install.packages("learnr")                   |
| teachr    | remotes::install_github("astamm/teachr")     |
| testwhat  | remotes::install_github("datacamp/testwhat") |
| tidyverse | install.packages("tidyverse")                |

Material for this course {data-progressive="TRUE"}

Software

To get the most out of this course, please bring your own laptop with the latest versions of R and RStudio Preview installed and ready to use. You will also need a reliable internet connection.

R (and RStudio) can be customized using external code bundled into so-called packages (we will come back to that in a few minutes).

This course is a gentle introduction to R. If you want to go deeper, you can take a look at the RStudio Education Website.

Material format

Overview

The material that we provide for this course is called a tutorial. A tutorial comes as a folder containing at least one file with extension .Rmd. In addition, it can contain the following subfolders:

| Directory | Description                                  |
|:----------|:---------------------------------------------|
| images/   | Image files (e.g. PNG, JPEG, etc.)           |
| css/      | CSS stylesheets                              |
| js/       | JavaScript scripts                           |
| www/      | Any other files (e.g. downloadable datasets) |
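To make this layout concrete, here is a minimal sketch in base R that creates an empty tutorial skeleton in a temporary directory. The folder name `my-tutorial` and the file name `tutorial.Rmd` are hypothetical. Only the subfolder names come from the table above.

```r
# Hypothetical tutorial folder, created in a temporary location
tutorial_dir <- file.path(tempdir(), "my-tutorial")

# Optional subfolders described in the table
subfolders <- c("images", "css", "js", "www")
for (sub in subfolders) {
  dir.create(file.path(tutorial_dir, sub), recursive = TRUE, showWarnings = FALSE)
}

# The only mandatory piece: at least one .Rmd file (left empty here)
rmd_path <- file.path(tutorial_dir, "tutorial.Rmd")
invisible(file.create(rmd_path))

list.files(tutorial_dir)
```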

All the tutorials for this course can be found on GitHub as part of the teachr package. These tutorials are accessible in three different ways:

Running tutorials manually from RStudio

  1. Navigate to File > Open File...; or,
  2. [Recommended] Go to the folder containing the .Rmd file, right-click on the file and choose to open it with RStudio. This is the recommended way because the file opens directly in RStudio and the working directory is automatically set to the tutorial directory.

When you open an .Rmd file in RStudio, it appears in a new tab of the source editor. You can then start the tutorial by clicking on the Run Document button. The tutorial should appear either in a new window or in the Viewer pane of the bottom-right panel. From there, you can click on the third icon above the tutorial to view it in your browser. For each tutorial in this course, we provide both the .Rmd file and the associated .html file.
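If you prefer the console to the Run Document button, the sketch below does roughly the same thing with `rmarkdown::run()`. That function renders an interactive document and serves it locally.

- The file name `tutorial.Rmd` is a placeholder for your own tutorial.
- `learnr::available_tutorials()` only lists tutorials bundled inside installed packages. Here we assume teachr is installed.
- Both parts only run in an interactive session.

```r
# Placeholder path: point this to the tutorial .Rmd you want to run
tutorial_file <- "tutorial.Rmd"

# Only launch in an interactive session and if the file actually exists
if (interactive() && file.exists(tutorial_file)) {
  rmarkdown::run(tutorial_file)  # similar to clicking "Run Document"
}

# List the tutorials shipped with an installed package (here teachr, if installed)
if (interactive() && requireNamespace("learnr", quietly = TRUE) &&
    requireNamespace("teachr", quietly = TRUE)) {
  print(learnr::available_tutorials(package = "teachr"))
}
```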

R Markdown

It is also worth looking into the structure of .Rmd files, the file extension of so-called R Markdown documents. You may know Jupyter notebooks as an efficient way to create reports that mix text, equations and Python code chunks. R Markdown documents are a powerful alternative for such reproducible reporting. We recommend the following resources for learning R Markdown:

**Our turn:**

You will need to repeat these steps for each tutorial. By now, you should have reached the page we are on. Let us now click together on Continue to proceed.

Exercises {data-allow-skip="TRUE"}

You will have the opportunity to carry out a number of exercises on your own whenever you see an editable code box that looks like this:


**Hint**: You just clicked on the Hint button. Come on, you could have given it some more thought before asking for help...

ex()

When writing code in the provided boxes, code completion is enabled, meaning that, as you type, R will propose a list of functions or other objects that match the sequence of letters you are typing.

Note that the tutorial format has been chosen so that you can work almost completely on your own. However, **we highly encourage you to try and replicate the things you learn in your own RStudio session**. Outside of this course there will be no tutorial environment, and you will be on your own. In addition, you cannot practice installing new packages from within a tutorial. The packages required to run a given tutorial are already installed, and you can neither uninstall them nor install new ones from the tutorial.

At the beginning of each tutorial, you will find the list of all packages required to run it. This is useful if you also want to practice on your own in RStudio once you are done with the tutorial.
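When you work outside the tutorials, a small script can check which required packages are missing and install only those. This is a sketch:

- The package lists simply copy the Required packages table at the top of this tutorial.
- Installation is only attempted in an interactive session.

```r
# CRAN packages from the "Required packages" table
cran_pkgs <- c("rmarkdown", "remotes", "learnr", "tidyverse")
# GitHub packages: names are package names, values are "user/repo"
github_pkgs <- c(teachr = "astamm/teachr", testwhat = "datacamp/testwhat")

# requireNamespace() quietly returns FALSE when a package is not installed
is_installed <- function(pkgs) {
  vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
}

missing_cran <- cran_pkgs[!is_installed(cran_pkgs)]
missing_github <- github_pkgs[!is_installed(names(github_pkgs))]

# Install only what is missing (CRAN first, so that remotes is available)
if (interactive()) {
  if (length(missing_cran) > 0) install.packages(missing_cran)
  for (repo in missing_github) remotes::install_github(repo)
}
```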

R and its package system

R as a modular software

R is a programming language tailored for statistics, distributed as installable software. R is made of:

Upon installation, R comes with the following pre-installed packages:

library(magrittr) # provides the %>% pipe used below (also re-exported by dplyr)
installed.packages() %>% 
  tibble::as_tibble() %>% 
  dplyr::filter(Priority == "base") %>% 
  dplyr::select(Package, Version)
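The pipeline above relies on tibble and dplyr. You can run the same query in base R alone, because `installed.packages()` can filter on priority directly through its `priority` argument. A minimal sketch:

```r
# Keep only packages whose Priority is "base", then show their name and version
base_pkgs <- installed.packages(priority = "base")[, c("Package", "Version"), drop = FALSE]
base_pkgs
```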

Here is a brief description of some fundamental pre-installed packages:

Customizing your R experience

Users can then customize their R environment by installing additional packages. To that end, the function install.packages() from the utils package is available.

Several of the fundamental packages listed in the previous section, including utils, are attached to your R session by default. All the functions exported by these packages are therefore readily available, with no additional code required on your part. For instance, the function install.packages() is available right away. The following line of code shows how to install, for example, the tidyverse package, which we will use extensively during this class:

install.packages("tidyverse")
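You can check this yourself in base R. The sketch below lists the packages attached to the current session and shows which package install.packages() comes from:

```r
# Packages attached to the session appear on the search path as "package:<name>"
search()

# Name of the namespace where install.packages() is defined: "utils"
environmentName(environment(install.packages))
```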

There are two ways to call a function from a package you have installed. The first way, which we do not recommend, is to attach the package to your R session so that all of its exported functions become available. This is achieved via the function library() from the base package. For example, we can use the function as_tibble() from the tibble package (installed as part of the tidyverse) to print the iris data set as a tibble. A tibble is an R object suited for storing data sets:

library(tibble)
as_tibble(iris)

The second, recommended way is to write the name of the package followed by two colons (::) before the name of the function. This tells R which package to look in for the required function:

tibble::as_tibble(iris)
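One reason `::` is safer is that two packages can export functions with the same name. For example, stats (attached by default) and dplyr both export a function called filter(). After library(dplyr), a bare filter() call refers to the dplyr version.

In the sketch below, the dplyr part only runs if dplyr is installed:

```r
# stats::filter() applies a linear filter to a series: here a centred 3-point moving average
moving_avg <- stats::filter(1:5, rep(1 / 3, 3))
as.numeric(moving_avg)

# dplyr::filter() keeps the rows of a data frame that satisfy a condition
if (requireNamespace("dplyr", quietly = TRUE)) {
  setosa <- dplyr::filter(iris, Species == "setosa")
  nrow(setosa)
}
```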

You can also install development versions of existing packages if these versions are hosted either on GitHub or on GitLab. To do so, first install the remotes package and then use remotes::install_github() or remotes::install_gitlab(). For example, we can install the development version of the testwhat package with the following line of code:

# First make sure remotes is installed
install.packages("remotes")
# Then install testwhat from GitHub using remotes::install_github()
remotes::install_github("datacamp/testwhat")

In general, the syntax for installing a package named github_package_name, hosted on GitHub by the user github_username, is remotes::install_github("github_username/github_package_name").
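As a sketch of that syntax, the snippet below builds the "user/repo" string from two placeholder names. Both github_username and github_package_name are hypothetical. Without running it, the snippet also shows how the `ref` argument of remotes::install_github() can pin a specific branch, tag or commit:

```r
# Hypothetical placeholders: replace them with a real GitHub user and repository
github_username <- "github_username"
github_package_name <- "github_package_name"

# Build the "user/repo" string expected by remotes::install_github()
repo_spec <- paste0(github_username, "/", github_package_name)
print(repo_spec)

# Not run: installing requires internet access and an existing repository
if (FALSE) {
  remotes::install_github(repo_spec)                # default branch
  remotes::install_github(repo_spec, ref = "main")  # a specific branch, tag or commit
}
```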

The RStudio API

The code boxes simply run R code, just as an R script would in RStudio. The aim of the tutorials is to give you the basics of programming in R inside RStudio so that you can analyze data. Outside of the tutorials, it is therefore important to master the RStudio interface and the resources that the RStudio website puts at our disposal.

Let us leave the tutorial for a moment and switch back to the main RStudio window. Below, for your convenience, we describe what we deem to be the most useful options available in the 5 main blocks of the RStudio interface.

The Menu Bar

The File Tab

RStudio projects. An RStudio project is a convenient way to organize your R code. Fundamentally, it is nothing but a folder on your computer (and, optionally, also on GitHub or GitLab). Inside this folder, you will find a file named after your project with extension .Rproj. When you open an RStudio project, RStudio automatically sets the working directory to the root folder where the .Rproj file lives. Relative paths therefore start from this location in the folder tree. We recommend structuring an RStudio project with the following subfolders:

When you close RStudio and later need to work on this project again, simply navigate to your project's folder and double-click on the .Rproj file. RStudio then opens with the correct working directory, ready for your next analyses.
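Because RStudio sets the working directory to the project root, paths inside a project can be written relative to that root. Here is a minimal sketch in base R. The file data/my_data.csv is hypothetical and only illustrates how a relative path is resolved:

```r
# Inside an RStudio project, this is the folder containing the .Rproj file
getwd()

# Build a path relative to the working directory (portable across operating systems)
data_path <- file.path("data", "my_data.csv")

# Absolute path R would use; mustWork = FALSE avoids an error if the file does not exist
normalizePath(data_path, mustWork = FALSE)
file.exists(data_path)
```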

Import datasets. Statistics is all about analyzing real-world data sets, so it is important to learn how to import external data into your RStudio session. You have several options depending on the format of your data. You can skip the From Text (base) option, which relies on older base R import functions such as read.csv(). This leaves us with:

The Code Tab

The Session Tab

In the Session tab, you can find useful shortcuts:

The Profile Tab

Using the functionalities available in the Profile tab requires the installation of the profvis package. Once installed, you can select a piece of code and profile it to understand which parts of the code are time-consuming so you can later optimize them. Once you have selected the code you want to profile, you can also use the keyboard shortcut CMD+OPTION+SHIFT+P (macOS) or CTRL+ALT+SHIFT+P (Windows, Linux) to start profiling it.
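The same profiling can also be launched from code. In the sketch below:

- slow_sum() is a hypothetical, deliberately slow function.
- system.time() (base R) times the whole call.
- profvis::profvis() opens the interactive line-by-line profile. It only runs in an interactive session with profvis installed.

```r
# Hypothetical example function: a deliberately slow loop summing 1..n
slow_sum <- function(n) {
  total <- 0
  for (i in seq_len(n)) total <- total + i
  total
}

# Base R: elapsed and CPU time of a single call
timing <- system.time(result <- slow_sum(1e6))
print(timing)

# profvis, the package behind the Profile tab: interactive line-by-line profile
if (interactive() && requireNamespace("profvis", quietly = TRUE)) {
  profvis::profvis({
    slow_sum(1e7)
    sum(as.numeric(seq_len(1e7)))  # vectorised version, for comparison
  })
}
```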

The Tools Tab

The Help Tab

There are some particularly helpful resources in the Help tab:

The Top-Left Panel

The top-left panel is dedicated to the source editor. This is where all source documents appear, each with its own action buttons. In RStudio, you can author:

R code

Of course, RStudio, as the name suggests, allows users to author R code through R scripts which are files with extension .R.

Python and C++ code

The use of Python code and scripts requires the additional reticulate package, while the use of C/C++ code requires the additional Rcpp package. Going deeper into these topics is beyond the scope of this class. It is worth mentioning, however, that RStudio is not limited to R and can in fact easily combine R with Python and C++.
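For a first taste, here is a hedged sketch:

- Rcpp::cppFunction() compiles a small C++ function and makes it callable from R. This needs Rcpp and a working C++ compiler.
- reticulate::py_run_string() runs Python code. This needs reticulate and a Python installation.
- The function name add_cpp() is hypothetical.
- Both parts only run in an interactive session.

```r
if (interactive() && requireNamespace("Rcpp", quietly = TRUE)) {
  # Compile a C++ function and expose it to R under the name add_cpp()
  Rcpp::cppFunction("int add_cpp(int x, int y) { return x + y; }")
  print(add_cpp(2L, 3L))
}

if (interactive() && requireNamespace("reticulate", quietly = TRUE) &&
    reticulate::py_available(initialize = TRUE)) {
  # Run Python code, then read the Python variable z back into R
  reticulate::py_run_string("z = 2 + 3")
  print(reticulate::py$z)
}
```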

SQL

In the era of big data, the data you need to analyze is often stored in databases that you can access via SQL queries. RStudio understands the SQL language and even allows you to author SQL code. This feature requires the additional RSQLite package.
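As a sketch of running SQL from R, the snippet below copies the iris data set into a temporary in-memory SQLite database and queries it. It assumes the DBI and RSQLite packages are installed and skips the example otherwise. DBI provides the generic database interface, and RSQLite is the SQLite driver.

```r
if (requireNamespace("DBI", quietly = TRUE) && requireNamespace("RSQLite", quietly = TRUE)) {
  # In-memory database: it disappears when the connection is closed
  con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
  DBI::dbWriteTable(con, "iris", iris)

  # Count the rows for each species with a plain SQL query
  counts <- DBI::dbGetQuery(
    con,
    "SELECT Species, COUNT(*) AS n FROM iris GROUP BY Species ORDER BY Species"
  )
  print(counts)

  DBI::dbDisconnect(con)
}
```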

STAN

The Stan C++ library is a state-of-the-art platform for statistical modeling and high-performance statistical computation. Users specify log density functions in Stan's probabilistic programming language and get:

  • full Bayesian statistical inference with MCMC sampling (NUTS, HMC)
  • approximate Bayesian inference with variational inference (ADVI)
  • penalized maximum likelihood estimation with optimization (L-BFGS)

RStudio understands the Stan language and allows you to author Stan programs, which Stan translates into C++. This feature requires the additional rstan package.
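Here is a minimal sketch of what a Stan program looks like: a model that estimates the mean and standard deviation of normally distributed data, fitted with rstan::stan(). The simulated data and the sampler settings are illustrative assumptions. Compiling the model requires rstan and a C++ toolchain and can take a minute, so the fitting step only runs interactively.

```r
# A Stan program: data block, parameters block and model block
stan_code <- "
data {
  int<lower=0> N;
  vector[N] y;
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  y ~ normal(mu, sigma);
}
"

# Simulated data (illustrative only)
set.seed(42)
stan_data <- list(N = 50, y = rnorm(50, mean = 1, sd = 2))

if (interactive() && requireNamespace("rstan", quietly = TRUE)) {
  # Full Bayesian inference with the default NUTS sampler
  fit <- rstan::stan(model_code = stan_code, data = stan_data, chains = 1, iter = 1000)
  print(fit, pars = c("mu", "sigma"))
}
```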

D3.js scripts:

D3.js is a JavaScript library for manipulating documents based on data. D3 helps you bring data to life using HTML, SVG, and CSS. D3's emphasis on web standards gives you the full capabilities of modern browsers without tying yourself to a proprietary framework, combining powerful visualization components and a data-driven approach to Document Object Model (DOM) manipulation.

RStudio also allows you to author and preview D3 scripts, which are written in JavaScript. This feature requires the additional r2d3 package.
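Here is a minimal bar chart sketch for r2d3. The JavaScript is written to a temporary file, and r2d3::r2d3() renders it with an R vector as data. Inside the script, r2d3 provides the variables svg, data, width and height. The values and styling are arbitrary, and rendering only runs interactively with r2d3 installed.

```r
# Write a small D3 script to a temporary .js file
bar_script <- tempfile(fileext = ".js")
writeLines(c(
  "// r2d3 provides svg (a D3 selection), data, width and height",
  "var barHeight = Math.floor(height / data.length);",
  "svg.selectAll('rect')",
  "  .data(data)",
  "  .enter().append('rect')",
  "    .attr('width', function(d) { return d * width; })",
  "    .attr('height', barHeight)",
  "    .attr('y', function(d, i) { return i * barHeight; })",
  "    .attr('fill', 'steelblue');"
), bar_script)

# Render the script with an R numeric vector as data (values between 0 and 1)
if (interactive() && requireNamespace("r2d3", quietly = TRUE)) {
  r2d3::r2d3(data = c(0.3, 0.6, 0.8, 0.95), script = bar_script)
}
```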

Interactive reports, presentations and web applications

RStudio allows you to generate reports and/or presentations in which you can mix text, LaTeX equations or other math symbols, and R, Python or C++ code chunks. These features require the additional rmarkdown package. A report or presentation is written in a single .Rmd file. That file can then be converted into HTML, PDF or Word (for reports) or PowerPoint (for presentations).
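The same conversion can be triggered from code with rmarkdown::render(). The sketch below writes a tiny .Rmd file to a temporary folder and renders it to HTML, provided rmarkdown and Pandoc are available. The file's title and content are arbitrary.

```r
rmd_file <- file.path(tempdir(), "minimal-report.Rmd")

# YAML header, a line of text with inline R code, and one R code chunk
writeLines(c(
  "---",
  "title: \"A minimal report\"",
  "output: html_document",
  "---",
  "",
  "The mean of 1 to 10 is `r mean(1:10)`.",
  "",
  "```{r}",
  "summary(cars)",
  "```"
), rmd_file)

if (requireNamespace("rmarkdown", quietly = TRUE) && rmarkdown::pandoc_available()) {
  # render() returns the path of the generated HTML file
  html_file <- rmarkdown::render(rmd_file, output_format = "html_document", quiet = TRUE)
  print(html_file)
}
```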

RStudio also allows you to author interactive Web applications, called Shiny apps. This feature requires the additional shiny package.
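Here is a minimal Shiny app sketch, assuming the shiny package is installed. A slider controls the number of bins in a histogram of the eruptions column of the built-in faithful data set. The names ui, server, bins and histogram are our own choices, and the app is only launched in an interactive session.

```r
if (requireNamespace("shiny", quietly = TRUE)) {
  # User interface: one input (slider) and one output (plot)
  ui <- shiny::fluidPage(
    shiny::sliderInput("bins", "Number of bins:", min = 5, max = 50, value = 20),
    shiny::plotOutput("histogram")
  )

  # Server logic: redraw the histogram whenever the slider changes
  server <- function(input, output, session) {
    output$histogram <- shiny::renderPlot({
      hist(faithful$eruptions, breaks = input$bins, main = "Eruption durations")
    })
  }

  app <- shiny::shinyApp(ui = ui, server = server)
  if (interactive()) shiny::runApp(app)
}
```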

The Bottom-Left Panel

The bottom-left panel is made of 4 tabs: console, terminal, R markdown and jobs:

The Top-Right Panel

It is primarily composed of 3 tabs: environment, history and connections:

Note that RStudio's capabilities are extended by a number of independent R packages. As a result, your RStudio experience depends on the packages you have installed. In particular, RStudio may automatically add other tabs to this panel when you install specific packages.

The Bottom-Right Panel

It is composed of 5 tabs: files, plots, packages, help and viewer:

Quiz

quiz(
  question(
    "How can I keep all my installed packages up to date?",
    answer(
      "Using `Update` in the *Packages Tab* of the bottom-right panel", 
      correct = TRUE
    ),
    answer(
      "Using `Install packages...` in the *Tools Tab* of the menu bar"
    ),
    answer(
      "Using `Check for package updates` in the *Connections Tab* of the top-right panel"
    ),
    answer(
      "Using `Check for package updates` in the *Tools Tab* of the menu bar", 
      correct = TRUE
    ),
    random_answer_order = TRUE
  ),
  question(
    "Which programming language can be used in RStudio?",
    answer("Only R"),
    answer("R, Python and C++", correct = TRUE),
    answer("R, Python"),
    answer("R, C++"), 
    random_answer_order = TRUE
  ),
  question(
    "Which of the following packages are tailored for data import?",
    answer("haven", correct = TRUE),
    answer("readr", correct = TRUE),
    answer("readxl", correct = TRUE),
    answer("tidyverse"), 
    random_answer_order = TRUE
  ),
  question(
    "Which package is required to use C++ in RStudio?", 
    answer("Rcpp", correct = TRUE),
    answer("CPP"),
    answer("cppRouting"),
    answer("No package is required"), 
    random_answer_order = TRUE
  ),
  question(
    "What is the keyboard shortcut for running a piece of code?",
    answer("`CMD+ENTER` / `CTRL+ENTER`", correct = TRUE),
    answer("`CMD+OPTION+ENTER` / `CTRL+ALT+ENTER`"),
    answer("`OPTION+ENTER` / `ALT+ENTER`"),
    answer("`CMD+SPACE` / `CTRL+SPACE`"), 
    random_answer_order = TRUE
  ),
  question(
    "How can we access package cheatsheets?",
    answer("Via the *Help Tab* in the menu bar", correct = TRUE),
    answer("Via RStudio website", correct = TRUE),
    answer("Via the *Help Tab* in the bottom-right panel"),
    answer("Via the function `load_cheatsheet()`"), 
    random_answer_order = TRUE
  )
)


astamm/teachr documentation built on Jan. 12, 2023, 7:21 a.m.