knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
library(lehmansociology)
education_and_poverty<-create_educ_poverty_data()
library(ggplot2)
#library(plyr)
library(dplyr)

In SOC 345 we are using the graphing system known as ggplot2.

ggplot is a system in which you start a plot and then layer additional features by adding commands using a +.

Notice that as shown above you must load the ggplot2 package using the library function, library(ggplot2).

Here are some examples.

Some of these examples assume you have created the education_and_poverty data set as done above.

Dotplot

Creating a plot always starts with a ggplot() command. A dotplot has only one variable, the x. We are using the education_and_poverty dataset. Adding scale_y_continuous(breaks=NULL) removes the y-axis tick marks, which are not meaningful in a dotplot, so the plot looks cleaner.

Every ggplot has a geom_ that says what kind of graph we are making. In this case it is a dot plot.

# Dot plot
# This makes the basic plot
ggplot(education_and_poverty,
       aes(x=PCTPOVALL_2013)) +
  geom_dotplot(binwidth=1) +
  scale_y_continuous(breaks=NULL)

We can then add more features, such as labels and a title, using the +.

# Dot plot
ggplot(education_and_poverty,
       aes(x=PCTPOVALL_2013)) +
  geom_dotplot(binwidth=1) +
  geom_rug() +
  labs( x="Percent in Poverty") +
  scale_y_continuous(breaks=NULL) +
  ggtitle("Fig # :")
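Because each + returns a new plot object, a plot can also be saved under a name and extended later. The sketch below illustrates this with a small made-up data frame (`toy`, with a hypothetical column `pct_poverty`) rather than the education_and_poverty data.

```r
library(ggplot2)

# Hypothetical data, made up for illustration only
toy <- data.frame(pct_poverty = c(9.5, 11.2, 12.8, 14.1, 15.3, 17.6, 19.0, 21.4))

# Start the plot and save it under a name
base_plot <- ggplot(toy, aes(x = pct_poverty)) +
  geom_dotplot(binwidth = 1)

# Add more features to the saved plot with +
labelled_plot <- base_plot +
  scale_y_continuous(breaks = NULL) +
  labs(x = "Percent in Poverty") +
  ggtitle("Fig # :")

# Printing the object draws the plot
labelled_plot
```

Saving the basic plot first makes it easy to try different labels or titles without retyping the whole command.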

Histogram

Another basic plot type is the histogram. This also only has one variable and uses geom_histogram().

# Histogram

ggplot(education_and_poverty, aes(x=PCTPOVALL_2013)) +
  geom_histogram(binwidth=1) +
  labs(y = "Count",     x="Percent in Poverty") +
  ggtitle("Fig # :")

Stem and leaf plot

The one graph we look at that does not use ggplot is the stem and leaf plot. A stem and leaf plot splits each data value into a "stem" (the first digit or digits) and a "leaf" (usually the last digit). With a stem and leaf plot, be sure to use an appropriate value of the scale argument for your data! This is not a real plot in that it does not produce an image file. It just displays text, so you cannot use the ggplot options with it.

stem(education_and_poverty$PCTPOVALL_2013, scale = 2)
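To see what the scale argument does, it can help to run stem() twice on the same values. This is a minimal sketch using a made-up vector (`pct` is hypothetical, not taken from the education_and_poverty data).

```r
# Hypothetical poverty percentages, made up for illustration only
pct <- c(9.5, 11.2, 12.8, 14.1, 15.3, 17.6, 19.0, 21.4, 22.3, 24.0)

# Default display (scale = 1)
stem(pct)

# scale controls the length of the display; larger values ask for a longer plot
stem(pct, scale = 2)
```

Compare the two displays and pick the one that shows the shape of your data most clearly.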

Bar plots

Bar plots have two variables: the x variable defines the bars, and the y variable determines how tall the bars are. In bar charts the x variable is discrete, so notice that there is a space between the bars. Note that we use geom_bar().

ggplot(education_and_poverty,
       aes(x = Area_Name, y = PCTPOVALL_2013)) +
  geom_bar(stat="identity") +
  # Extras
  labs(y = "Percent in Poverty", x="State") +
  # This line rotates the state names by 90 degrees. You can try other values.
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
  ggtitle("Fig # :")

Here's a slightly different way that illustrates using advanced features, in this case sorting the values from low to high.

ggplot(education_and_poverty,
       aes(x = reorder(Area_Name, PCTPOVALL_2013), y = PCTPOVALL_2013)) +
  geom_bar(stat="identity") +
  labs(y = "Percent in Poverty", x="State") +
  # This line rotates the state names by 90 degrees. You can try other values.
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1)) +
  ggtitle("Fig # :")
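The sorting is done by reorder(), which changes the order of the categories rather than the data itself. Here is a small sketch of what it does, using made-up names and values (`state` and `poverty` are hypothetical).

```r
# Hypothetical states and poverty rates, made up for illustration only
state <- c("Alpha", "Beta", "Gamma")
poverty <- c(15, 9, 12)

# Without reorder(), the categories (and so the bars) are in alphabetical order
levels(factor(state))

# reorder() sorts the categories by the second argument, from low to high
levels(reorder(state, poverty))

# Put a minus sign in front of the values to sort from high to low instead
levels(reorder(state, -poverty))
```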

Scatterplots

Scatter plots are used when you have two interval variables. Notice that here we use geom_point() to indicate that we want a scatterplot, which is made up of points.

ggplot(education_and_poverty,
       aes(x=Percent.of.adults.with.less.than.a.high.school.diploma..2009.2013,
           y=PCTPOVALL_2013)) +
  geom_point() +
  ggtitle("Fig # : Poverty Rate and High School Completion (for States)") +
  labs(x = "Percent of adults with less than a high school diploma",
       y="Percent of population in poverty") 

Box Plots

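A box plot shows the distribution of a variable (its median, its quartiles, and any outliers) separately for each group.

The x variable represents the groups and the y variable is the one we are comparing across groups.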

# In R, the term factor is used for nominal variables.
ggplot(education_and_poverty, 
       aes(x=factor(region), y=PCTPOVALL_2013)) +
 geom_boxplot()
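The following sketch shows what factor() does and roughly which numbers a box plot summarizes. The data frame `toy`, its columns, and the region labels are all made up for illustration.

```r
# Hypothetical data: region stored as numeric codes, made up for illustration only
toy <- data.frame(
  region = c(1, 1, 1, 2, 2, 2),
  pct_poverty = c(10, 12, 17, 14, 18, 25)
)

# As stored, region is a number, not a set of categories
class(toy$region)

# factor() turns the codes into categories, one per group
toy$region_f <- factor(toy$region, labels = c("Northeast", "South"))
levels(toy$region_f)

# fivenum() gives the minimum, lower hinge, median, upper hinge and maximum
# for each group, which is close to what each box and its whiskers display
tapply(toy$pct_poverty, toy$region_f, fivenum)
```

Note that geom_boxplot() draws points far from the box as separate outliers, so its whiskers do not always reach the minimum and maximum.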

Additions

ggplot will let us add lots of features. For example, you can add vertical and horizontal lines to your plots to help make them more understandable. Adding the lines below to a plot with + will add vertical lines (vline) at the median and the mean and a horizontal line (hline) at 5.

geom_vline(aes ( xintercept = median(education_and_poverty$PCTPOVALL_2013)))

geom_vline(aes ( xintercept = mean(education_and_poverty$PCTPOVALL_2013)))

geom_hline(aes ( yintercept = 5))

If you want to add a simple linear fit line to a scatterplot use stat_smooth(method = lm, se=FALSE)

Use se=TRUE to add a ribbon showing the standard error.
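These additions only do something once they are attached to a plot with +. The sketch below puts them in context using a made-up data frame (`toy` and its column names are hypothetical). Here the intercepts are given directly as xintercept instead of inside aes(), which also works.

```r
library(ggplot2)

# Hypothetical data, made up for illustration only
toy <- data.frame(
  pct_no_diploma = c(8, 10, 11, 13, 15, 17, 19),
  pct_poverty = c(10, 11, 13, 14, 16, 19, 21)
)

# Histogram with a solid line at the mean and a dashed line at the median
hist_plot <- ggplot(toy, aes(x = pct_poverty)) +
  geom_histogram(binwidth = 1) +
  geom_vline(xintercept = mean(toy$pct_poverty)) +
  geom_vline(xintercept = median(toy$pct_poverty), linetype = "dashed")

# Scatterplot with a straight fit line and no standard error ribbon
scatter_plot <- ggplot(toy, aes(x = pct_no_diploma, y = pct_poverty)) +
  geom_point() +
  stat_smooth(method = lm, se = FALSE)

hist_plot
scatter_plot
```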

Maps

Here is the basic ggplot code for making a map.

This is a little more complex because you need to set up the data first.

# First we need to load an extra library
library(maps)
states_map <- map_data("state")

# We'll use this library if we want to play with colors in the maps
library("RColorBrewer")

If you look at the states_map data you will see that the state names are in lower case. Our Area_Name variable uses capital letters, so the names won't match. So first we have to deal with that, and then we can merge the map data and our education_and_poverty data together. We also drop the region variable from our data, because states_map has its own region column, which holds the state names.

education_and_poverty$state <- tolower(education_and_poverty$Area_Name)
education_and_poverty <- dplyr::select(education_and_poverty, -region)

education_and_poverty_map <- merge(states_map, education_and_poverty, 
                                 by.x = "region", by.y="state")
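To see why the tolower() step matters, here is a small sketch with two made-up data frames (`map_names` and `our_data` are hypothetical stand-ins for states_map and education_and_poverty, and the values are invented).

```r
# Hypothetical stand-ins, made up for illustration only
map_names <- data.frame(region = c("new york", "ohio"), id = 1:2)
our_data <- data.frame(Area_Name = c("New York", "Ohio"), pct_poverty = c(10, 20))

# Merging on the original names finds no matches because of the capital letters
unmatched <- merge(map_names, our_data, by.x = "region", by.y = "Area_Name")
nrow(unmatched)

# After tolower() the names match and both rows are kept
our_data$state <- tolower(our_data$Area_Name)
merged <- merge(map_names, our_data, by.x = "region", by.y = "state")
nrow(merged)
```

Checking the number of rows after a merge is a quick way to catch names that failed to match.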

This actually makes the map, and you can see that this part is very similar to every other ggplot.

# If you want to try colors uncomment the next line and change to the color you want.
# Also uncomment the scale_fill_gradientn line.
# You can change the color and the number of colors.

# colors<-brewer.pal(7,"Greens")
# ggplot(education_and_poverty_map, aes(x=long, y = lat, group = group, 
#                                       map_id = region, fill = PCTPOVALL_2013)) +
#   geom_map(map = education_and_poverty_map, color = "black" ) +
#   scale_fill_gradientn(colors=colors) +
#   coord_map("polyconic") +
#   ggtitle("Fig #: ")

Other Resources

Much excellent information on making graphs in R can be found at Winston Chang's website (http://www.cookbook-r.com/Graphs/), where he gives a lot of examples of how to make nice graphs using R. His book is practical and excellent.

The cheatsheet from RStudio (https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf) contains a huge amount of information in compressed form. You should push yourselves to make great-looking graphics, and reading some of those tips will go a long way toward that. There are also lots of other sources of help out there. For example, this site has some neat examples (http://zevross.com/blog/2014/08/04/beautiful-plotting-in-r-a-ggplot2-cheatsheet-3/).

This slide deck (http://sctyner.github.io/ggplot_tutorial.html) is fantastic at really explaining why and how things work, and I'd recommend it too.



lehmansociology/lehmansociology documentation built on May 21, 2022, 9:06 p.m.