An Introduction to R {#cha-introduction-to-r}

#    IPSUR: Introduction to Probability and Statistics Using R
#    Copyright (C) 2018 G. Jay Kerns
#
#    Chapter: An Introduction to R
#
#    This file is part of IPSUR.
#
#    IPSUR is free software: you can redistribute it and/or modify
#    it under the terms of the GNU General Public License as published by
#    the Free Software Foundation, either version 3 of the License, or
#    (at your option) any later version.
#
#    IPSUR is distributed in the hope that it will be useful,
#    but WITHOUT ANY WARRANTY; without even the implied warranty of
#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#    GNU General Public License for more details.
#
#    You should have received a copy of the GNU General Public License
#    along with IPSUR.  If not, see <http://www.gnu.org/licenses/>.

Every R book I have ever seen has had a section/chapter that is an introduction to R, and so does this one. The goal of this chapter is for a person to get up and running, ready for the material that follows. See Section \@ref(sec-external-resources) for links to other material which the reader may find useful.

What do I want them to know?

Downloading and Installing R {#sec-download-install-r}

The instructions for obtaining R largely depend on the user's hardware and operating system. The R Project has written an R Installation and Administration manual with complete, precise instructions about what to do, together with all sorts of additional information. The following is just a primer to get a person started.

Installing R

Visit one of the links below to download the latest version of R for your operating system:

On Microsoft Windows, click the R-x.y.z.exe installer to start installation. When it asks for "Customized startup options", specify Yes. In the next window, be sure to select the SDI (single document interface) option; this is useful later when we discuss three-dimensional plots with the rgl package [@rgl].

Installing R on a USB drive (Windows)

With this option you can use R portably and without administrative privileges. There is an entry in the R for Windows FAQ about this. Here is the procedure I use:

  1. Download the Windows installer above and start installation as usual. When it asks where to install, navigate to the top-level directory of the USB drive instead of the default C drive.
  2. When it asks whether to modify the Windows registry, uncheck the box; we do NOT want to tamper with the registry.
  3. After installation, change the name of the folder from R-x.y.z to just plain R. (Even quicker: do this in step 1.)
  4. Download this shortcut and move it to the top-level directory of the USB drive, right beside the R folder, not inside the folder. Use the downloaded shortcut to run R.

Steps 3 and 4 are not required but save you the trouble of navigating to the R-x.y.z/bin directory to double-click Rgui.exe every time you want to run the program. It is useless to create your own shortcut to Rgui.exe. Windows does not allow shortcuts to have relative paths; they always have a drive letter associated with them. So if you make your own shortcut and plug your USB drive into some other machine that happens to assign your drive a different letter, then your shortcut will no longer be pointing to the right place.

Installing and Loading Add-on Packages {#sub-installing-loading-packages}

There are base packages (which come with R automatically), and contributed packages (which must be downloaded for installation). For example, on the version of R being used for this document the default base packages loaded at startup are

getOption("defaultPackages")

The base packages are maintained by a select group of volunteers, called R Core. In addition to the base packages, there are over ten thousand additional contributed packages written by individuals all over the world. These are stored worldwide on mirrors of the Comprehensive R Archive Network, or CRAN for short. Given an active Internet connection, anybody is free to download and install these packages and even inspect the source code.

To install a package named foo, open up R and type install.packages("foo") \index{install.packages@\texttt{install.packages}}. To install foo and additionally install all of the other packages on which foo depends, instead type install.packages("foo", dependencies = TRUE).

The general command install.packages() will (on most operating systems) open a window containing a huge list of available packages; simply choose one or more to install.

No matter how many packages are installed onto the system, each one must first be loaded for use with the library \index{library@\texttt{library}} function. For instance, the foreign package [@foreign] contains all sorts of functions needed to import data sets into R from other software such as SPSS, SAS, etc. But none of those functions will be available until the command library("foreign") is issued.

Type library() at the command prompt (described below) to see a list of all available packages in your library.
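As a rough sketch of the install-then-load workflow described above, the following checks whether a package is present before attaching it. The package foreign is simply the example from the text, and the variable names pkg and is_installed are illustrative. The install.packages call is left commented out because it needs an Internet connection.

```r
pkg <- "foreign"   # example package from the text; any CRAN package name works

# requireNamespace() returns TRUE if the package is installed, FALSE otherwise,
# without attaching it to the search path.
is_installed <- requireNamespace(pkg, quietly = TRUE)

if (is_installed) {
  library(pkg, character.only = TRUE)   # attach it for this session
} else {
  # Uncomment to download from CRAN (requires an Internet connection):
  # install.packages(pkg, dependencies = TRUE)
  message("Package '", pkg, "' is not installed yet.")
}

search()   # the packages currently attached
```

Installation is needed once per system, while library must be called in every new session that uses the package.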

For complete, precise information regarding installation of R and add-on packages, see the R Installation and Administration manual.

Communicating with R {#sec-communicating-with-r}

One line at a time

This is the most basic method and is the first one that beginners will use.

Multiple lines at a time

For longer programs (called scripts) there is too much code to write all at once at the command prompt. Furthermore, for longer scripts it is convenient to be able to modify only a certain piece of the script and run it again in R. Programs called script editors are specially designed to aid the communication and code writing process. They have all sorts of helpful features including R syntax highlighting, automatic code completion, delimiter matching, and dynamic help on the R functions as they are being written. Even more, they often have all of the text editing features of programs like Microsoft\(\circledR\) Word. Lastly, most script editors are fully customizable: the user can choose what colors to display, when to display them, and how to display them.
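Script editors usually send code to R for you, but a saved script can also be run from the command prompt with the base function source. The sketch below is only an illustration: it writes a two-line script to a temporary file (the name script_file and the script contents are made up for the example) and then runs it.

```r
# Write a tiny script to a temporary file (a stand-in for a file saved
# from a script editor; the file name is generated, not prescribed).
script_file <- tempfile(fileext = ".R")
writeLines(c("total <- sum(1:10)", "total"), script_file)

# Run every line of the script; echo = TRUE prints each command as it runs.
source(script_file, echo = TRUE)
```

After the call, objects created by the script (here total) are available in the workspace, just as if the lines had been typed one at a time.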

Graphical User Interfaces (GUIs)

By the word "GUI" I mean an interface in which the user communicates with R by way of points-and-clicks in a menu of some sort. Again, there are many, many options and I only mention one that I have used and enjoyed.

Basic R Operations and Concepts {#sec-basic-r-operations}

The R developers have written an introductory document entitled "An Introduction to R". There is a sample session included which shows what basic interaction with R looks like. I recommend that all new users of R read that document, but bear in mind that there are concepts mentioned which will be unfamiliar to the beginner.

Below are some of the most basic operations that can be done with R. Almost every book about R begins with a section like the one below; look around to see all sorts of things that can be done at this most basic level.

Arithmetic {#sub-arithmetic}

2 + 3       # add
4 * 5 / 6   # multiply and divide
7^8         # 7 to the 8th power

Notice the comment character #. Anything typed after a # symbol is ignored by R. We know that \(20/6\) is a repeating decimal, but the above example shows only 7 digits. We can change the number of digits displayed with options:

options(digits = 16)
10/3                 # see more digits
sqrt(2)              # square root
exp(1)               # Euler's number, e
pi
options(digits = 7)  # back to default

Note that it is possible to set digits up to 22, but setting them over 16 is not recommended (the extra significant digits are not necessarily reliable). Above notice the sqrt function for square roots and the exp \index{exp@\texttt{exp}} function for powers of \(\mathrm{e}\), Euler's number.
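It may help to see that the digits setting only controls how numbers are displayed, not how they are stored. The following is a minimal sketch; the variable names x and old are arbitrary.

```r
x <- 10 / 3

print(x, digits = 3)      # affects this one display only
format(x, digits = 10)    # returns a character string

old <- options(digits = 4)   # options() returns the previous settings
x                            # now displayed with 4 significant digits
options(old)                 # restore whatever was set before

x == 10 / 3               # TRUE: the stored value never changed
```

Saving the value returned by options and passing it back afterwards restores the earlier setting without having to remember what it was.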

Assignment, Object names, and Data types {#sub-assignment-object-names}

It is often convenient to assign numbers and values to variables (objects) to be used later. The proper way to assign values to a variable is with the <- operator (with a space on either side). The = symbol works too, but it is recommended by the R masters to reserve = for specifying arguments to functions (discussed later). In this book we will follow their advice and use <- for assignment. Once a variable is assigned, its value can be printed by simply entering the variable name by itself.

x <- 7*41/pi   # don't see the calculated value
x              # take a look

When choosing a variable name you can use letters, numbers, dots "\texttt{.}", or underscore "\texttt{_}" characters. You cannot use mathematical operators, and a leading dot may not be followed by a number. Examples of valid names are: x, x1, y.value, and y_hat. (More precisely, the set of allowable characters in object names depends on one's particular system and locale; see An Introduction to R for more discussion on this.)
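One hedged way to check a candidate name is to pass it through the base function make.names, which rewrites a string into a syntactically valid name. A string that comes back unchanged was already valid. The candidate strings below are chosen only for illustration.

```r
candidates <- c("x", "x1", "y.value", "y_hat", "2x", ".5a", "a+b")

# make.names() rewrites a string into a syntactically valid name, so a
# string that comes back unchanged was already valid.
is_valid <- make.names(candidates) == candidates
data.frame(name = candidates, valid = is_valid)
```

The first four names pass, while the last three fail: one starts with a number, one has a leading dot followed by a number, and one contains a mathematical operator.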

Objects can be of many types, modes, and classes. At this level, it is not necessary to investigate all of the intricacies of the respective types, but there are some with which you need to become familiar:

You can determine an object's type with the typeof \index{typeof@\texttt{typeof}} function. In addition to the above, there is the complex \index{complex@\texttt{complex}} \index{as.complex@\texttt{as.complex}} data type:

sqrt(-1)              # isn't defined
sqrt(-1+0i)           # is defined
sqrt(as.complex(-1))  # same thing
(0 + 1i)^2            # should be -1
typeof((0 + 1i)^2)

Note that you can just type (1i)^2 to get the same answer. The NaN \index{NaN@\texttt{NaN}} stands for "not a number"; it is represented internally as double \index{double}.
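As a small illustration, typeof can be applied to a few common kinds of values. The particular values are arbitrary examples.

```r
typeof(7L)       # "integer": the L suffix requests an integer
typeof(7)        # "double": plain numbers are stored as doubles
typeof("seven")  # "character"
typeof(7 > 3)    # "logical"
typeof(1i)       # "complex"
typeof(NaN)      # "double", as noted above
```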

Vectors {#sub-vectors}

All of this time we have been manipulating vectors of length 1. Now let us move to vectors with multiple entries.

Entering data vectors

The long way: \index{c@\texttt{c}} If you would like to enter the data 74,31,95,61,76,34,23,54,96 into R, you may create a data vector with the c function (which is short for concatenate).

x <- c(74, 31, 95, 61, 76, 34, 23, 54, 96)
x

The elements of a vector are usually coerced by R to the most general type of any of the elements, so if you do c(1, "2") then the result will be c("1", "2").
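The coercion can be seen step by step by mixing types in pairs. This is a short sketch with arbitrary example values.

```r
typeof(c(TRUE, 2L))     # "integer": TRUE becomes 1L
typeof(c(2L, 3.5))      # "double"
typeof(c(3.5, "a"))     # "character"
c(TRUE, 2L, 3.5, "a")   # everything ends up as a character string
```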

A shorter way: \index{scan@\texttt{scan}} The scan method is useful when the data are stored somewhere else. For instance, you may type x <- scan() at the command prompt and R will display 1: to indicate that it is waiting for the first data value. Type a value and press Enter, at which point R will display 2:, and so forth. Note that entering an empty line stops the scan. This method is especially handy when you have a column of values, say, stored in a text file or spreadsheet. You may copy and paste them all at the 1: prompt, and R will store all of the values instantly in the vector x.
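Because scan() waits for keyboard input, it is awkward to demonstrate in a script. As a stand-in, the sketch below uses the text argument of scan to supply the values that would otherwise be typed or pasted at the 1: prompt.

```r
# Non-interactive variant: the text argument supplies the input that would
# otherwise be typed or pasted at the 1: prompt.
x <- scan(text = "74 31 95 61 76 34 23 54 96", quiet = TRUE)
x
length(x)   # 9 values read
```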

Repeated data; regular patterns: the seq \index{seq@\texttt{seq}} function will generate all sorts of sequences of numbers. It has the arguments from, to, by, and length.out which can be set in concert with one another. We will do a couple of examples to show you how it works.

seq(from = 1, to = 5)
seq(from = 2, by = -0.1, length.out = 4)

Note that we can get the first line much quicker with the colon operator.

1:5
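A few more patterns, offered as a sketch. The first two lines use the seq arguments described above. The rep function is not discussed in the text and is included here as an assumption about what "repeated data" might look like in practice.

```r
seq(from = 0, to = 1, length.out = 5)   # 0.00 0.25 0.50 0.75 1.00
seq(from = 10, to = 1, by = -3)         # 10 7 4 1
rep(1:3, times = 2)                     # 1 2 3 1 2 3
rep(c("a", "b"), each = 2)              # "a" "a" "b" "b"
```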

The vector LETTERS \index{LETTERS@\texttt{LETTERS}} has the 26 letters of the English alphabet in uppercase and letters \index{letters@\texttt{letters}} has all of them in lowercase.

Indexing data vectors

Sometimes we do not want the whole vector, but just a piece of it. We can access the intermediate parts with the [] operator. Observe (with x defined above)

x[1]
x[2:4]
x[c(1,3,4,8)]
x[-c(1,3,4,8)]

Notice that we used the minus sign to specify those elements that we do not want.

LETTERS[1:5]
letters[-(6:24)]
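Besides positions, the [] operator also accepts a vector of TRUE/FALSE values, which selects elements by a condition. This goes slightly beyond the examples above and is offered only as a sketch, using the same data vector as before.

```r
x <- c(74, 31, 95, 61, 76, 34, 23, 54, 96)

x > 60           # one TRUE/FALSE per element
x[x > 60]        # 74 95 61 76 96
which(x > 60)    # positions: 1 3 4 5 9
```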

Functions and Expressions {#sub-functions-and-expressions}

A function takes arguments as input and returns an object as output. There are functions to do all sorts of things. We show some examples below.

x <- 1:5
sum(x)
length(x)
min(x)
mean(x)      # sample mean
sd(x)        # sample standard deviation

It will not be long before the user starts to wonder how a particular function is doing its job, and since R is open-source, anybody is free to look under the hood of a function to see how things are calculated. For detailed instructions see the article "Accessing the Sources" by Uwe Ligges [@Ligges2006]. In short:

Type the name of the function without any parentheses or arguments. If you are lucky then the code for the entire function will be printed, right there looking at you. For instance, suppose that we would like to see how the intersect \index{intersect@\texttt{intersect}} function works:

intersect

If instead it shows UseMethod(something) \index{UseMethod@\texttt{UseMethod}} then you will need to choose the class of the object to be inputted and next look at the method that will be dispatched to the object. For instance, typing rev \index{rev@\texttt{rev}} says

rev

The output is telling us that there are multiple methods associated with the rev function. To see what these are, type

methods(rev)

Now we learn that there are two different rev(x) functions, only one of which is chosen at each call, depending on what x is. There is one for dendrogram objects and a default method for everything else. Simply type the name to see what each method does. For example, the default method can be viewed with

rev.default

Some functions are hidden by a namespace (see An Introduction to R [@Venables2010]), and are not visible on the first try. For example, if we try to look at the code for wilcox.test \index{wilcox.test@\texttt{wilcox.test}} (see Chapter \@ref(cha-nonparametric-statistics)) we get the following:

wilcox.test
methods(wilcox.test)

If we were to try wilcox.test.default we would get a "not found" error, because it is hidden behind the namespace for the package stats [@stats] (shown in the last line when we tried wilcox.test). In cases like these we prefix the package name to the front of the function name with three colons; the command stats:::wilcox.test.default will show the source code, omitted here for brevity.
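As an alternative sketch to the three-colon prefix, the utils functions getS3method and getAnywhere can retrieve a hidden method without the user knowing which package holds it. The variable name f is arbitrary.

```r
# The method is usually not visible by its bare name in a default session...
exists("wilcox.test.default")

# ...but these utils functions can retrieve it from the namespace.
f <- getS3method("wilcox.test", "default")
is.function(f)
names(formals(f))[1:2]    # the first two argument names

getAnywhere("wilcox.test.default")$where   # where the object was found
```

Typing f by itself would print the full source code, which is omitted here for the same reason as above.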

If it shows .Internal(something) \index{.Internal@\texttt{.Internal}} or .Primitive(something) \index{.Primitive@\texttt{.Primitive}}, then it will be necessary to download the source code of R (which is not a binary version with an .exe extension) and search inside the code there. See Ligges [@Ligges2006] for more discussion on this. An example is exp:

exp

Be warned that most of the .Internal functions are written in other computer languages which the beginner may not understand, at least initially.
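To tell in advance whether a function has R code to inspect, the base function is.primitive can be used. The following is a brief sketch with functions mentioned earlier in this section.

```r
is.primitive(exp)          # TRUE: implemented in compiled code
is.primitive(sum)          # TRUE
is.primitive(rev.default)  # FALSE: ordinary R code

body(rev.default)          # R-level functions expose their body
body(exp)                  # NULL for a primitive
```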

Getting Help {#sec-getting-help}

When you are using R, it will not take long before you find yourself needing help. Fortunately, R has extensive help resources and you should immediately become familiar with them. Begin by clicking Help on RGui. The following options are available.

On the help pages for a function there are sometimes "Examples" listed at the bottom of the page, which will work if copy-pasted at the command line (unless marked otherwise). The example \index{example@\texttt{example}} function will run the code automatically, skipping the intermediate step. For instance, we may try example(mean) to see a few examples of how the mean function works.
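The same help resources can be reached from the command prompt. The sketch below assumes a default session. The calls that open help pages are wrapped in interactive() so that the block can also be run as a script.

```r
apropos("^mean")         # objects on the search path whose names start with "mean"
args(sd)                 # a quick reminder of a function's arguments

if (interactive()) {
  help("mean")           # same as ?mean: open the help page
  help.search("median")  # same as ??median: search installed help pages
  example(mean)          # run the Examples section of the help page
}
```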

R Help Mailing Lists

There are several mailing lists associated with R, and there is a huge community of people that read and answer questions related to R. See here for an idea of what is available. Particularly pay attention to the bottom of the page which lists several special interest groups (SIGs) related to R.

Bear in mind that R is free software, which means that it was written by volunteers, and the people that frequent the mailing lists are also volunteers who are not paid by customer support fees. Consequently, if you want to use the mailing lists for free advice then you must adhere to some basic etiquette, or else you may not get a reply, or even worse, you may receive a reply which is a bit less cordial than you are used to. Below are a few considerations:

  1. Read the FAQ. Note that there are different FAQs for different operating systems. You should read these now, even without a question at the moment, to learn a lot about the idiosyncrasies of R.
  2. Search the archives. Even if your question is not a FAQ, there is a very high likelihood that your question has been asked before on the mailing list. If you want to know about topic foo, then you can do RSiteSearch("foo") \index{RSiteSearch@\texttt{RSiteSearch}} to search the mailing list archives (and the online help) for it.
  3. Do a Google search and an \texttt{RSeek.org} search.

If your question is not a FAQ, has not been asked on R-help before, and does not yield to a Google (or alternative) search, then, and only then, should you even consider writing to R-help. Below are a few additional considerations.

External Resources {#sec-external-resources}

There is a mountain of information on the Internet about R. Below are a few of the important resources.

Other Tips

It is unnecessary to retype commands repeatedly, since R remembers what you have recently entered on the command line. On the Microsoft\(\circledR\) Windows RGui, to cycle through the previous commands just push the \(\uparrow\) (up arrow) key. On Emacs/ESS the command is M-p (which means hold down the Alt button and press "p"). More generally, the command history() \index{history@\texttt{history}} will show a whole list of recently entered commands.

\newpage{}

Exercises




IPSUR documentation built on May 2, 2019, 9:15 a.m.