knitr::opts_chunk$set(echo = TRUE)

$\$

R Markdown

R Markdown documents allow one to use a simple formatting syntax for authoring HTML, PDF, and MS Word documents combined with R code. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document.

$\$

Download files needed for the next two classes

The code below uses the SDS230 package to get to download files needed for the next two classes. Please press the green play arrow once to download the files. Once they are downloaded you do not need to run this again.

On the homework, you will also need to start by downloading files also by pressing the green start arrow on an R chuck at the top of the document.

# you could also try downloading these files using the SDS230 package
library(SDS230)

download_data("daily_bike_totals.rda")
download_data("profiles_revised.csv")
download_image("sds.png")


# If you don't have the SDS230 package working yet, you can download the files using the following commands

#download.file("https://raw.githubusercontent.com/emeyers/SDS230/master/ClassMaterial/images/sds.png", "sds.png", mode = "wb")

#download.file("https://raw.githubusercontent.com/emeyers/SDS230/master/ClassMaterial/data/daily_bike_totals.rda", "daily_bike_totals.rda", mode = "wb")

#download.file("https://raw.githubusercontent.com/emeyers/SDS230/master/ClassMaterial/data/profiles_revised.csv", "profiles_revised.csv", mode = "wb")

$\$

R Markdown syntax - basics

In order to learn the syntax of R Markdown, there is a useful cheat sheet.

Adding proper R Markdown formatting to homework documents will be important!

$\$

R Markdown syntax - LaTeX

We can also LaTeX characters in the document, such as the Greek letter $\theta$, and we can put accents on characters too such as $\hat{p}$

Create:

$\hat{\theta}$ $\Sigma$

$\$

Running R code

We can run R code inside of R chunks

2 + 3

$\$

Back to learning R

Data frames

Let's now continue learning R by looking at data frames. Data frames are structed data and can also be thought of as collections of vectors.

Let's look at data from the website okcupid

# library(SDS230)
# download_data("profiles_revised.csv")

profiles <- read.csv("profiles_revised.csv")

#View(profiles)        # the View() function only works in R Studio!

# We can extract the columns of a data frame as vector objects using the $ symbol
the_ages <- profiles$age

# We can get the mean() age of OKCupid users
mean(the_ages)

$\$

We can extract rows from a data frame in a similar way as extracting values from a vector by using the square brackets

profiles[1, ]  # returns the first row of the data frame

head(profiles[, 1])  # returns the first column of the data 

# we are using the head() function here so that we don't print out too much stuff!

# Note: the first column of the profiles data frame is the variable age, so we can also get the first column using:
head(profiles$age)  # this is the same as profiles[, 1]  

$\$

We can also create vectors of numbers or Booleans specifying which rows we want to extract from a data frame

# create a vector with the numbers 1, 10, 20
my_vec <- c(1, 10, 20)

# use my_vec to get the 1st, 10th, and 20th profile
small_profiles <- profiles[my_vec, ]

$\$

Finally, we can also extract rows by creating a Boolean vector that is of the same length as the number of rows in the data frame. True values will be extracted from the data frame while false values will not

# create a vector of booleans
my_bools <- c(TRUE, FALSE, TRUE)

# use the Boolean vector to get the 1st and 3rd row 
my_bools <- c(TRUE, FALSE, TRUE)
small_profiles[my_bools, ]

# dim() gives us the the number of rows and columns in a data frame
dim(small_profiles)

dim(small_profiles[my_bools, ])

$\$

Examining categorical data

Categorical variables take on one of a fixed number of possible values

For categorical variables, we usually want to view:

Let's examine categorical data by looking at drinking behavior on OkCupid

# Get information about drinking behavior
drinking_vec <- profiles$drinks

# Create a table showing how often people drink
drinks_table <- table(drinking_vec)
drinks_table

$\$

We can create a relative frequency table using the function: prop.table(my_table)

Can you create a relative frequency table for the drinking behavior of the people in the okcupid data set in the R chunk below?

drinks_table <- table(profiles$drinks)
prop.table(drinks_table)

$\$

bar plots (no pun intended)

We can plot the number of items in each category using a bar plot: barplot(my_table)

Can you create a bar plot for the drinking behavior of the people in the okcupid data set?

drinks_table <- table(profiles$drinks)
barplot(drinks_table)

$\$

Is there a problem with using the bar plot function without any of the extra arguments?

XKCD illusterates the point

Can you figure out how to fix your plot?

We can also create pie charts using the pie function

pie(prop.table(table(profiles$sex, useNA = "always")))

Some pie charts are more informative than others

$\$

Our plots are dominated by social drinkers - let's remove them...

nonsocial_inds <- drinks_table < 10000
nonsocial_drinks_table <- drinks_table[nonsocial_inds]
pie(nonsocial_drinks_table)
barplot(nonsocial_drinks_table)

There are other websites/apps for dating as well

$\$

Reading for next class

Please read the article The Big Lies People Tell In Online Dating.

Once you have read the article, please share your thoughts on this article by filling out this quick survey. We will briefly discuss the article in groups at the start of next class.



emeyers/SDS230 documentation built on Jan. 18, 2024, 1:01 a.m.