Data structures

Overview

There are four fundamental data structures used in R that we will discuss here:

Vectors

The most important data structure in R, in my opinion, is the vector. A vector is an ordered series of values: numbers, integers, characters, or logicals. The key is that all of the members of a vector have to be of the same type.

# 3 vectors: x, y, and z
x <- c(1, 2, 3)
y <- c(4, 5, 6)
z <- c('a', 'b', 'c')

# We can use the print function to see the vectors in the console
print(x)
print(y)
print(z)
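
To see what "same type" means in practice, here is a minimal sketch of what happens when c() is given values of different types. The object name mixed is made up for this illustration. R does not raise an error; it quietly converts every value to one common type.

```r
# Hypothetical example: mix a number, a string, and a logical in one vector
mixed <- c(1, 'a', TRUE)

# All three values have been converted to character
class(mixed)
print(mixed)
```

This is worth keeping in mind when a column of numbers unexpectedly behaves like text: a single stray character value is enough to change the type of the whole vector.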


We can perform operations on vectors.

x + 1 


If we have two vectors of the same class, we can do vectorized math.

x + y
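
As a small sketch of what vectorized math does, the example below repeats the definitions of x and y so that it runs on its own. The operation is applied element by element: the first element of x is paired with the first element of y, and so on.

```r
# Same vectors as above, repeated so the example is self-contained
x <- c(1, 2, 3)
y <- c(4, 5, 6)

# Addition pairs up the elements: 1 + 4, 2 + 5, 3 + 6
x + y
# [1] 5 7 9

# Other arithmetic operators work the same way
x * y
# [1]  4 10 18
```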


Create your own vector and perform an operation on it (note: this will only work if both vectors are of the same type):


# Create a numeric vector and assign it to the object 'my_vec'
# Perform an operation on 'my_vec' (for example my_vec^2)
# For fun, create a character vector (call it 'my_chars') and add the values
# of 'my_vec' to 'my_chars'. This gives an error. Why?
# Just as all members of a vector must be of the same class, we can only
# perform operations on vectors of the same class.
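
One possible way to work through the exercise is sketched below. The values in my_vec and my_chars are arbitrary choices, and tryCatch() is used here only so that the error message is captured and printed instead of stopping the script.

```r
# Example values (any numeric and character vectors would do)
my_vec <- c(1, 2, 3)
my_chars <- c('a', 'b', 'c')

# An operation on a numeric vector works as expected
my_vec^2

# Adding a numeric vector to a character vector fails;
# tryCatch() returns the error message rather than halting
msg <- tryCatch(
  my_vec + my_chars,
  error = function(e) conditionMessage(e)
)
print(msg)
```

The message says that a non-numeric argument was passed to a binary operator: the + operator needs numbers on both sides.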


Matrices

A matrix is a 2d vector. You probably will not use them very often, so I will not say much about them.

# Matrices = 2d vectors
matrix(1:10, nrow = 5, ncol = 2)
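
If you do need to work with a matrix, the short sketch below shows how it is filled and indexed. The name m is chosen just for this example. Note that matrix() fills column by column by default, so the first column holds 1 to 5 and the second holds 6 to 10.

```r
# Store the matrix so we can inspect it
m <- matrix(1:10, nrow = 5, ncol = 2)

# Number of rows, then number of columns
dim(m)
# [1] 5 2

# Index with [row, column]: row 2 of column 2
m[2, 2]
# [1] 7

# Leave the row blank to get a whole column
m[, 1]
# [1] 1 2 3 4 5
```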


Dataframes

Dataframes are 2d tables with column and row names. IMPORTANT: each column in a dataframe is like a vector, and each column can be of a different type (factor, numeric, etc.). This is the closest equivalent to an Excel sheet. You will probably spend most of your time in R working with dataframes.

library(dplyr)  # select() comes from the dplyr package
mtcars <- select(mtcars, 1:4)  # keep only the first four columns
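
To make the "each column is a vector" idea concrete, here is a minimal sketch that builds a small dataframe by hand. The object df_example and its column names are invented for this illustration, and stringsAsFactors = FALSE is set explicitly so that the result is the same across R versions.

```r
# Hypothetical dataframe: three columns, each of a different type
df_example <- data.frame(
  word = c('a', 'b', 'c'),
  count = c(1, 2, 3),
  is_vowel = c(TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Check the class of every column
sapply(df_example, class)

# A single column is just a vector
df_example$count
```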


Let's look at a few dataframes. The head() function shows the first six rows of a dataframe and the tail() function shows the final six rows. Get the first six rows of mtcars.

# Get first 6 rows of mtcars
head(mtcars)


Now print the last 6 rows of USArrests.

# Get last 6 rows of USArrests
tail(USArrests)
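
Six rows is only the default. As a short sketch, both functions accept an n argument to control how many rows are returned. The names first_three and last_three are chosen just for this example.

```r
# First 3 rows instead of the default 6
first_three <- head(USArrests, n = 3)
print(first_three)

# Last 3 rows
last_three <- tail(USArrests, n = 3)
print(last_three)
```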


We can get more information about a dataframe using the str() and summary() functions. Try them below on the ToothGrowth dataset.


str(ToothGrowth)
summary(ToothGrowth)


jvcasillas/ds4ling documentation built on March 4, 2025, 11:18 p.m.