Concepts and Basics of the SciencesPo Package

knitr::opts_chunk$set(cache=TRUE, echo=TRUE, warning = FALSE, message=FALSE, collapse = FALSE, prompt=FALSE,comment=NA, fig.align="center", fig.width=5, fig.height=4, dpi = 96, fig.show = "hold", fig.keep="last", sanitize=TRUE)

Overview

R is rapidly increasing popularity in scientific research; it has several advantages over other programming languages: it is free, it is fairly powerful, there is an excellent user community, and it was designed with the statistical analysis of data in mind. The purpose of this vignette is to introduce some of the function in the SciencesPo package, and how these can be used in data analysis workflows and reporting results.

The SciencesPo package is meant to provide primarily functions and algorithms for analyzing political behavior data, including measures of political fragmentation and seat allocation methods. In addition, it also equip the environment with some functions to do descriptive statistics, cross-tabulation, tests, and to render pre set publication-quality plots that will require only a minimum amount of fiddling details. Although this package is available for the general public, it meets my personal needs and tastes. Yours may be different.

The development of this package began in July of 2011. I found myself spending a great deal of time manipulating several datasets with a basket of standalone functions, but a very little time actually analyzing data. So, I thought this package could help researchers with interests alike. Since then, I have diligently labored to maintain this R library updated. The SciencesPo is currently hosted by the The R Foundation, and distributed over the internet via CRAN. You can find the package source code on Github, and you are welcome to contribute code via pull requests, or bug reports.

This document is organized as follows. In the next section, I addressed the basic aspects of setting up the package. Following, I discuss some of the basic data manipulation functions. Next, I show some of the basic functions for creating tables and cross-tabulations. Then, I present basic functions for computing distributions. Next, I address the basic functions for computing measures of political behavior. I end by introducing several plotting functions, including themes and color palettes available. Throughout the paper, I try to keep the commentary to a minimum so the user can easily breeze through this without having to digest my witty banter.

Learning by doing

This section introduces the basics of SciencesPo for conducting customary analysis; the package version I'm using is r packageVersion("SciencesPo").

SciencesPo loads packages as needed and assumes that they are installed. Thus, the recomended form of installing the package to ensure that all the needed packages are also installed is by using the following statement:

install.packages("SciencesPo", dependencies = c("Depends", "Suggests"))

Loading and unloading SciencesPo

library('SciencesPo') 

## Do things ... 

detach('package:SciencesPo', unload = TRUE)

If you load it via loadNamespace('SciencesPo'), you can call unloadNamespace('SciencesPo').

Whenever you load the package, it will setup its own environment, including plotting themes, summary table styles, etc. Thus, some objects may be printed or plotted differently as you would have seen before loading SciencesPo. Also, the package is also operator (%>%) friendly, so much of the functions will work as expected in a chain like.

Vignette and dataset lists

To see a list of existing vignettes, type:

library("SciencesPo")

vignette(package = "SciencesPo")

To see the collection of data included in the package, type:

data(package = "SciencesPo")

To search any topic within the package

Here are some examples that demonstrates the results of help.search(), or you can also use ?? to search for for all commands that have some strings related to the text within quotes. You can also restrict the search to a specific library as:

help.search("D'Hondt", package = 'SciencesPo')

The next example searches inside all R related sites for that "string", and open the results in a web browser.

RSiteSearch("D'Hondt") 

Exploratory data functions

In this section, a samll selection of functions for conducting exploratory data analysis (EDA) is presented.

Frequency tables and cross-tables

There is a specific document covering one-way, two-way, and multiway cross-tabulations with accompanying independent tests, so the following examples will only introduce this topic. To describe an entire data.frame, there is the Describe function, which will produce a summary description table with the following features: variable names, labels, factor levels, and frequencies or summary statistics. For small datasets, the output can be used as the "table one" descriptive statistics in a data analysis publication report.

data("titanic")
Describe(titanic) 
#' The function mcnemar.test can conduct McNemar’s test for matched pairs. For ex- ample, for Table 11.1,
ratings <- matrix(c(175, 16, 54, 188), ncol=2, byrow=TRUE,
                  dimnames = list("2004 Election" = c("Democrat", "Republican"), "2008 Election" = c("Democrat", "Republican")))
mcnemar.test(ratings, correct=FALSE)

To tabulate one variable responses, simply:

with(titanic, Crosstable(SURVIVED))

A more performant descriptive output can be obtained with the Freq command, which resembles the SPSS output.

Freq(titanic, SURVIVED) 

To add a second variable for a cross-tabulation:

with(titanic, Crosstable(SEX, CLASS))

To delete table entries that are less relevant, switch it to FALSE and to TRUE otherwise. For instance, to not show column proportions, switch column to FALSE.

Crosstable(titanic, SEX, CLASS, column = FALSE) 

To add a third variable:

Crosstable(titanic, SEX, CLASS, SURVIVED) 

The Crosstable() function can produce tables up to 3 variables with some of the most common statistical tests.

with(titanic, Crosstable(SEX, SURVIVED, fisher=TRUE) )
with(titanic, Crosstable(SEX, CLASS, SURVIVED, chisq = TRUE))

Sampling and Stratify Data

Computing Exploratory Statistics

To demonstrate the output of these functions, below are the US presidents and their main opponents' heights (cm). There is claim that height matters in American presidential elections.

require("SciencesPo")

data(stature)

Computing Method of Moments

These functions perform three types of skewness and kurtosis assessments in conformity with the popular e1071 package [@e1071]:

Skewness of Data

data(stature)

attach(stature)

# Type 1:
Skewness(winner.height, type = 1)
# Type 2 
Skewness(winner.height, type = 2)
# Type 3, the default
Skewness(winner.height)

Kurtosis of Data

data(stature)
attach(stature)

# Type 1:
Kurtosis(winner.height, type = 1)
# Type 1:
Kurtosis(winner.height, type = 2)
# Type 3, the default
Kurtosis(winner.height)

Most Repeating Observations (Mode)

x <- sample (10, replace = TRUE)

Mode(x)

Standard Error and Confidence Intervals

data(stature)
attach(stature)

CI(winner.height, level=.95) # confidence interval

CI(winner.height, level=.95)@mean # get only the mean 

CI(opponent.height, level=.95, na.rm = TRUE) # confidence interval

SE(winner.height) # std. error

Computing Winsorized Means

data(stature)
attach(stature)


Winsorize(winner.height)

# replacing 35 outlier elements we get same stature values: 
Winsorize(winner.height, k=35)

Winsorize(opponent.height, k=35)

Computing Average Absolute Deviation

data(stature)
attach(stature)

AAD(winner.height) 

Data Manipulation

Labeling

data(religiosity)

Label(religiosity$Religiosity) <- "Religiosity index"

Renaming columns


Recoding strings and numeric values

Used to recode values of a vector into new values.

require(SciencesPo)

table(iris$Species)

iris$Species <- Recode(iris$Species, "versicolor", 2)

table(iris$Species)

Safechars

By default, R converts character columns to factors. Instead of re-reading the data using \code{stringsAsFactors=FALSE}, the \code{\link{Safechars}} function identifies which columns are currently factors, and convert them all to characters parsing the levels as strings.

require(SciencesPo)
str(iris)

iris_2 = Safechars(iris)

str(iris_2)

Destring

This function converts factor variables to numeric, much like the way Stata does.

require(SciencesPo)

# Simulate some data (12 respondents x 2 items)
df <- data.frame(replicate(2, sample(1:5, 12, replace=TRUE)))

df <- data.frame(lapply(df, factor, ordered=TRUE, 
                          levels=1:5, 
                          labels=c("Strongly disagree","Disagree", "Neutral","Agree","Strongly Agree")))


print(df)

By Destring it, we should get a numeric result with the same (un)order:

Destring(df, "X2") 

Unity-based normalization

x <- sample(10)

# won't print normalized values by default 
(y = Normalize(x) )

# equals to: 
(x-min(x))/(max(x)-min(x))
# Smithson and Verkuilen approach
(y = Normalize(x, method="SV"))

Formatted

Simple but useful. The Formatted function rounds numbers to text and drops leading zeros in the process.

x <- as.double(c(0.1, 1, 10, 100, 1000, 10000))

Formatted(x) 

Formatted(x, "BRL")

p = c(0.25, 25, 50)

Formatted(p, "Perc", flag="+")

Formatted(p, "Perc", decimal.mark=",")

Anonymize data containing identifiable information

The following is an example data frame.

 dt <- data.frame(
 Z = sample(LETTERS,5),
 X = sample(1:5),
 Y = sample(c("yes", "no"), 5, replace = TRUE) )

dt;

Then, the anonymized result of the data frame above:

dt %>% Anonymize()

Computing Some Statistical Tests

Distributions

Here are some implementation of standard statistical distributions for pedagogic use. Most of these distributions are available within the base R, however, I remaned them with more intuitive names--at least in my view.

Normal Probability Density Function

normalpdf(x=1.2, mu=0, sigma=1)

Normal Cumulative Distribution

normalcdf(lower=-1.96, upper=1.96, mu=0, sigma=1)

Inverse Cumulative Standard Normal Distribution

invnormal(area=0.35, mu=0, sigma=1)

Dirichlet Distribution

The \code{rdirichlet} function will return a matrix with \code{n} rows, each containing a single random number according to the supplied alpha vector or matrix.

alphas <- cbind(1:4, 1, 4:1);
rdirichlet(4, alphas );

Consider the following example of usage: A Brazilian face-to-face poll conducted by Datafolha on Oct 03-04 (2014) with 18,116 insterviews, asked voters for their vote preference among presidential candidates.

# draw a sample from the posterior
set.seed(1234);
n <- 18116;
poll <- c(40,24,22,5,5,4) / 100 * n; # The data
mcmc <- 10000;
sims <- rdirichlet(mcmc, alpha = poll + 1);

Let's see a descriptive summary of the simulated data:

Describe(sims)

After obtained 10,000 simulated values, we look at the margins as of Aecio over Marina in the very last days of the campaign:

# compute the margins: Aecio minus Marina
margins <- sims[,2] - sims[,3];

# What is the mean of the margins
# posterior mean estimate:
mean(margins); 

# posterior standard deviation:
sd(margins); 

# 90% credible interval:
quantile(margins, probs = c(0.025, 0.975)); 

# posterior probability of a positive margin (Aécio over Marina):
mean(margins > 0); 

Political Behavior Measures

There is a specific vignette for Indices, covering in great detail many of the political behaviour indices.

Measures of Political Diversity

Let's consider the 1980's US presidential election (vote share) to compute the effective number of electoral parties (ENEP).

US1980 <- c("Democratic"=0.410, "Republican"=0.507,
              "Independent"=0.066, "Libertarian"=0.011,
              "Citizens"=0.003, "Others"=0.003);
US1980
PoliticalDiversity(US1980, index = "laakso/taagepera");

PoliticalDiversity(US1980, index = "golosov");

Considers the following data.frame with electoral results for the 1999 election in Helsinki, the seats were allocated using both Saint-Laguë and D'Hondt methods:

# Helsinki's 1999

Helsinki <- data.frame(
  votes = c(68885, 18343, 86448, 21982, 51587,
            27227, 8482, 7250, 365, 2734, 1925,
            475, 1693, 693, 308, 980, 560, 590, 185),
  seats.SL=c(5, 1, 6, 1, 4, 2, 1, 0, 0, 0, 0, 0, 0, 0,
            0, 0, 0, 0, 0),
 seats.dH=c(5, 1, 7, 1, 4, 2, 0, 0, 0, 0, 0, 0, 0, 0,
           0, 0, 0, 0, 0))

print(Helsinki)
# ENEP defaults (laakso/taagepera) method 

with(Helsinki, PoliticalDiversity(votes)); #ENEP Votes

with(Helsinki, PoliticalDiversity(seats.SL)); #ENP for Saint-Lague

with(Helsinki, PoliticalDiversity(seats.dH)); #ENP for D'Hondt

Proportionality Measures

Let's use the 2012 Quebec provincial election:

# 2012 Quebec provincial election:
Quebec <-
  data.frame(
  party = c("PQ", "Lib", "CAQ", "QS", "Option", "Green", "Others"),
  votes = c(1393703, 1360968, 1180235, 263111, 82539, 43394, 38738),
  pvotes = c(31.95, 31.20, 27.05, 6.03, 1.89, 0.99, 0.89),
  seats = c(54, 50, 19, 2, 0, 0, 0),
  pseats =  c(43.2, 40, 15.2, 1.6, 0, 0, 0)
  )

Quebec

Rae's proportionality index

with(Quebec, Proportionality(pvotes, pseats, 
                     index = "Rae") )

Loosemore-Hanby's Proportionality Index

with(Quebec, Proportionality(pvotes, pseats, 
                     index = "Loosemore-Hanby") )

Gallagher's proportionality index

with(Quebec, Proportionality(pvotes, pseats, 
                     index = "Gallagher") )

Highest Averages Methods of Allocating Seats Proportionally

Now using data from 2014 Brazilian legislative elections, especifically from one district, let's compare the results from D'Hondt, Saint-Lague, Hungtinton-Hill, and Imperiali methods.

# Results for the state legislative house of Ceara (2014):
Ceara <- c("PCdoB"=187906, "PDT"=326841,"PEN"=132531, "PMDB"=981096,
           "PRB"=2043217,"PSB"=15061, "PSC"=103679, "PSTU"=109830,
           "PTdoB"=213988, "PTC"=67145, "PTN"=278267)

print(Ceara)

The basic imputs for this class of functions are: 1) a list of parties, 2) a list of positive votes, and 3) a constant value for the number of seats to be returned. A numeric value (0~1) for the threshold is optional.

d'Hondt

HighestAverages(parties=names(Ceara), votes=Ceara,
                seats = 42, method = "dh") 

The d'Hondt method is only one way of allocating seats in party list systems. Other methods include the Saint-Lague, the modified Saint-Lague, the Danish version, Imperiali (do not to confuse with the Imperiali quota which is a Largest remainder method), and Hungtinton-Hill.

Saint-Lague

HighestAverages(parties=names(Ceara), votes=Ceara,
                seats = 42, method = "sl") 

Using thresholds

Let's assume the electoral system has a 5\% vote threshold. Meaning that parties must get at least 5\% of the total unspoiled votes cast in order to participate in the distribution of seats. Parties PCdoB, PTdoB, PEN, PSC, PSTU, PSB, and PTC would then be elimiated from competition at the outset. If the d'Hondt method of seat allocation were employed, then party PRB would get 4 extra seats than otherwise, and party PMDB 3 additional seats.

HighestAverages(parties=names(Ceara), votes=Ceara,
               seats = 42, method = "dh", threshold = 5/100) 

Other methods divide the votes by a mathematically derived quota, such as the Droop quota, the Hare quota (or Hamilton/Vinton), or the Imperiali quota, see next.

Largest Remainder Methods of Allocating Seats Proportionally

Hare quota

LargestRemainders(parties=names(Ceara), votes=Ceara, 
                seats = 42, method = "hare") 

Droop quota

LargestRemainders(parties=names(Ceara), votes=Ceara, 
                seats = 42, method = "droop") 

Suitable output for recycling in rmarkdown documents

The output produced by the highestAveragesof() and largestRemainders() functions is always a data.frame; therefore, it's very straightforward to use with other aplications. For instance, I like the idea of using the output with knitr [-@knitr] and ggplot2 [-@ggplot2] packages to produce publishable-quality tables and graphs.

mytable = HighestAverages(parties=names(Ceara), votes=Ceara, 
                seats = 42, method = "dh") 

library(knitr)

kable(mytable, align=c("l","c","c"))
mytable = HighestAverages(parties=names(Ceara), votes=Ceara, 
                seats = 42, method = "dh") 

p <- ggplot(mytable, aes(x=reorder(Party, Seats), y=Seats)) + 
  geom_lollipop() + coord_flip() + labs(x=NULL, y="# Seats")
p + theme_538() + theme(panel.grid.major.y = element_blank())

Measures of inequality and concentration

Lorenz curve

x <- c(778, 815, 857, 888, 925, 930, 965, 990, 1012)

# compute weight values
wgt <- runif(n=length(x))

# compute the lorenz with especific weights
Lorenz(x, wgt)

Gini Index

x <- c(778, 815, 857, 888, 925, 930, 965, 990, 1012)

wgt <- runif(n=length(x))

Gini(x, wgt)

Atkinson's Index

x <- c(778, 815, 857, 888, 925, 930, 965, 990, 1012)

Atkinson(x, epsilon = 0.5)

Plot Themes & Colors

The SciencesPo brings in few themes and color palettes to enhance ggplot2 graphs, but there are specialized packages like ggthemes, ggthemr, cowplot, ggalt, GGally, and sjPlot that can make graph decoration a lot easier. There's also an interesting discussion on plot design in the outstanding reference Cookbook for R that might be of your interest.

With SciencesPo, one can "preview" ggplot themes by simply invoking the Previewplot() function. This will draw a predefined plot that we can combine with themes, for instance.

Previewplot() + theme_grey()

Themes

In my point of view, the default ggplot2 design has its charm, but, very often I don't like the gray background grid. I feel it distracts from the data. For example, see the following ggplot2 graph with default settings on.

# detach("package:SciencesPo", unload = TRUE)

gg <- ggplot(mtcars, aes(mpg, disp,color=factor(carb),size=hp)) 
gg <- gg + geom_point(alpha=0.7) + labs(title="Bubble Plot")
gg <- gg +scale_size_continuous(range = c(3,8)) 
gg 
library(SciencesPo)

gg <- ggplot(mtcars, aes(mpg, disp,color=factor(carb),size=hp)) 
gg <- gg + geom_point(alpha=0.7) + labs(title="Bubble Plot")
gg <- gg +scale_size_continuous(range = c(3,8)) 
gg <- gg + theme_blank()
gg
library(SciencesPo)

gg <- ggplot(mtcars, aes(mpg, disp,color=factor(carb),size=hp)) 
gg <- gg + geom_point(alpha=0.7) + labs(title="Bubble Plot")
gg <- gg +scale_size_continuous(range = c(3,8)) 
gg <- gg + theme_pub()
gg
library(SciencesPo)

gg <- ggplot(mtcars, aes(mpg, disp,color=factor(carb),size=hp)) 
gg <- gg + geom_point(alpha=0.7) + labs(title="Bubble Plot")
gg <- gg +scale_size_continuous(range = c(3,8)) 
gg <- gg + theme_538()
gg
set.seed(1)
test <- data.frame(
  org = rep(c("Coalition", "Opposition", "Right", "Center", "Left"), 3),
  level = rep(c("Government", "Opposition", rep("Parties", 3)), 3),
  group = rep("Opposition",15),
  election = rep(c("2006", "2010", "2014"),5),
  obsAvg = runif(15, 1, 4)
)

gg <- ggplot(test, aes(x = reorder(org, -as.numeric(level)), y = obsAvg, fill = level))
gg <- gg + geom_bar(aes(alpha=election), stat = "identity", position = "dodge")
gg <- gg + scale_fill_manual(values = c("#E69F00", "#56B4E9", "#009E73"))
gg <- gg + scale_alpha_manual(values = c(.5, .75, 1), guide = FALSE)
gg <- gg + labs(title = "Average Score by Election", y = "", x = "", fill = "Group")
gg <- gg + geom_text(aes(label = round(obsAvg,1), group=election), vjust = -.3, size = 4, fontface="bold", position = position_dodge(width = .9))
gg <- gg + scale_y_continuous(limits = c(0,5), expand = c(0,0))
gg <- gg + theme_538(legend="bottom", base_size = 12)
gg
library(scales)
library(grid)
library(ggplot2)
library(plyr)

df <- read.table(text = "year, district.id, party, seats, ideology
2012, 127, Stranka Pravde I Razvoja Bosne I Hercegovine, 1, p
2012, 127, SBB, 3, p
2008, 127, SDA, 13, p
2004, 127, SDA, 14, p
2008, 127, HDZ, 1, c
2008, 127, Stranka Pravde I Razvoja Bosne I Hercegovine, 1, p
2012, 127, SzBiH, 4, p
2000, 127, SDP, 8, m
2012, 127, NSRzB, 2, m
2012, 127, SDU, 1, p
2000, 127, SDA-SBiH, 15, p
2008, 127, SDP, 5, m
2008, 127, NSRzB, 1, m
2008, 127, LDS-SDU, 2, m
2000, 127, liberali-Bih-Gds-Bih, 1, m
2000, 127, NHI, 1, c
1997, 127, SDP, 3, m
2012, 127, SDP, 6, m
2004, 127, SzBiH, 5, p
1997, 127, BPS, 9, p
2000, 127, BPS, 3, p
2008, 127, SzBiH, 4, p
1997, 127, HDZ, 5, c
2000, 127, HDZ, 2, c
2012, 127, SDA, 10, p
2004, 127, SDP, 6, m
1997, 127, SDA-SBiH-Liberali-GDS, 13, p", sep=",", header = TRUE)

df <- arrange(df, year, ideology, party)

# conservative parties blue
cons <- ggplot(data = df[df$ideology == "c", ],
               aes(x = as.factor(year),
                   y = seats,
                   fill = party)) +
  geom_bar(stat = "identity", position = "fill") +
  scale_fill_manual(values = blue, name = "Conservative parties" )

# progressive parties green
prog <- ggplot(data = df[df$ideology == "p", ],
               aes(x = as.factor(year),
                   y = seats,
                   fill = party)) +
  geom_bar(stat = "identity", position = "fill") +
  scale_fill_manual(values = green, name = "Progressive parties" )

# moderate parties red
mod <- ggplot(data = df[df$ideology == "m", ],
              aes(x = as.factor(year),
                  y = seats,
                  fill = party)) +
  geom_bar(stat = "identity", position = "fill") +
  scale_fill_manual(values = red, name = "Moderate parties" )
library(SciencesPo)

gg <- ggplot(mtcars, aes(mpg, disp,color=factor(carb),size=hp)) 
gg <- gg + geom_point(alpha=0.7) + labs(title="Bubble Plot")
gg <- gg +scale_size_continuous(range = c(3,8)) 
gg <- gg + theme_darkside()
gg

Changing defaults

If you want to change the theme for an entire session you can use theme_set() as in theme_set(theme_gray()) to switch to default ggplot2 theme for all subsequent plots. Otherwise, you may also apply themes without changing the defaults as of plot + theme_pub().

To modify general aspects of the theme_pub() as fontsize, font family, color etc:

require(SciencesPo)

# "Verdana", "Tahoma", "Gill Sans" "serif" and "sans" are also high-readability fonts
theme_set(theme_pub(base_size=18, base_family = "serif")) 
require(SciencesPo)

gg <- ggplot(mtcars, aes(mpg, disp,color=factor(carb),size=hp)) 
gg <- gg + geom_point(alpha=0.7) + labs(title="Bubble Plot")
gg <- gg +scale_size_continuous(range = c(3,8)) 
gg <- gg + theme_pub(legend = "none", base_size=18, base_family = "serif") 
gg

Modify it with theme(). You might modify text sizes, colors etc; then add to the plot.

prefs <- theme(axis.text = element_text(size=18, 
                                        family = "Gill Sans", 
                                        colour="red"))
prefs <- theme(axis.text = element_text(size=18, 
                                        family = "serif",
                                        colour="red"))

gg <- ggplot(mtcars, aes(mpg, disp,color=factor(carb),size=hp)) 
gg <- gg + geom_point(alpha=0.7) + labs(title="Bubble Plot")
gg <- gg +scale_size_continuous(range = c(3,8)) 
gg <- gg + theme_pub(legend = "none")
gg <- gg + prefs
gg

Quick plot customization

Alternatively, we can use a set of built in function to quick plot customization:

Previewplot() + 
  theme_538() + 
  align_title_right()
Previewplot() + 
  theme_538() + 
  no_y_gridlines()
Previewplot() + 
  theme_538() + 
  no_y_gridlines() +
  no_x_gridlines()

Generic plot annotations

The function Footnote() can be handy for adding extra text information, such as data source, equation, or credits as footnotes to a ggplot2 or gtable objects. The next chunck demonstrates its usage:

height_ratio <- c(0.924324324, 1.081871345, 1, 0.971098266, 1.029761905,
                  0.935135135, 0.994252874, 0.908163265, 1.045714286, 1.18404908,
                  1.115606936, 0.971910112, 0.97752809, 0.978609626, 1,
                  0.933333333, 1.071428571, 0.944444444, 0.944444444, 1.017142857,
                  1.011111111, 1.011235955, 1.011235955, 1.089285714, 0.988888889,
                  1.011111111, 1.032967033, 1.044444444, 1, 1.086705202,
                  1.011560694, 1.005617978, 1.005617978, 1.005494505, 1.072222222,
                  1.011111111, 0.983783784, 0.967213115, 1.04519774, 1.027777778,
                  1.086705202, 1, 1.005347594, 0.983783784, 0.943005181, 1.057142857)

vote_support <- c(0.427780852, 0.56148981, 0.597141922, 0.581254292, 0.530344067,
              0.507425996, 0.526679292, 0.536690951, 0.577825976, 0.573225387,
              0.550410082, 0.559380032, 0.484823958, 0.500466176, 0.502934212,
              0.49569636, 0.516904414, 0.522050547, 0.531494442, 0.60014892, 
              0.545079801, 0.604274986, 0.51635906, 0.63850958, 0.652184407, 
              0.587920412, 0.5914898, 0.624614752, 0.550040193, 0.537771958, 
              0.523673642, 0.554517134, 0.577511576, 0.500856251, 0.613444534, 
              0.504063153, 0.617883695, 0.51049949, 0.553073235, 0.59166415, 
              0.538982024, 0.53455133, 0.547304058, 0.497350649, 0.512424242, 
              0.536914796)

Presidents = data.frame(cbind(height_ratio, vote_support))             
gg <- ggplot(Presidents, aes(x=height_ratio, y=vote_support)) 
gg <- gg + geom_smooth(method=lm, colour="red", fill="gold")
gg <- gg + geom_point(size = 5, alpha = .7) 
gg <- gg +  xlim(0.9,1.2) + ylim(.40, .70)
gg <- gg + labs(x="Winner/Opponent height ratio", y="Winner vote share", title="Does Height Matter in Presidential Elections?")
gg <- gg + theme_pub()

# Commence adding layers here
Render(gg) + Footnote(note="danielmarcelino.github.io", color="orange")

Generic plots (Data Visualization 101)

with(mtcars, Scatterplot(x = wt, y = mpg,
main = "Vehicle Weight-Gas Mileage Relationship",
xlab = "Vehicle Weight",
ylab = "Miles per Gallon",
font.family = "serif") )
library(dplyr)
library(httr)
library(extrafont)
font_import() # if first time running extrafont

library(ggthemes)
# via http://zevross.com/blog/2014/08/04/beautiful-plotting-in-r-a-ggplot2-cheatsheet-3/
DATA_URL <- 'http://zevross.com/blog/wp-content/uploads/2014/08/chicago-nmmaps.csv'
dat <- content(GET(DATA_URL), col_names=TRUE, col_types=NULL)


gg <- ggplot(dat, aes(o3, temp, color = factor(season))) +
     geom_point() +
     labs(x = "Ozone",  y = "Temperature (F)") +
     scale_color_pub("seasons")

gg + theme_tufte() + ggtitle('Current theme_tufte()')
library(SciencesPo)
library(extrafont)
# font_import()
exp_text <- "italic(y == frac(1, sqrt(2 * pi)) * e ^ {-frac(x^2, 2)} )"


library(ggplot2)
theme_set(theme_grey())

set.seed(43121)

means <- NULL
adj_means <- NULL
lambda <- 0.2
n <- 40
for(i in 1:1000){
    vals <- rexp(n, rate=lambda)
    means <- c(means, mean(vals))
    adj_means <- c(adj_means, (1/lambda)+((mean(vals)-1/lambda)/sqrt(var(vals)/n)))
}


g <- ggplot(data=data.frame(means=adj_means), aes(x=means))
g <- g + ggtitle("Central Limit Theorem: Samples from Exponential Distribution")
g <- g + xlab("Means from 1000 Samples (n=40)") + ylab("Density")
g <- g + geom_histogram(
    aes(y=..density..), fill="#400040", colour="#FFFFFF", binwidth=0.1
)
g <- g + geom_vline(
    aes(xintercept=mean(means), colour="Actual Mean"), size=1,
    show.legend=TRUE
)
g <- g + geom_vline(
    aes(xintercept=1/lambda, colour="Expected Mean"), size=1,
    show.legend=TRUE
)
g <- g + stat_function(fun=dnorm, args=list(mean=1/lambda),
    aes(linetype="Normal Distribution"), colour="#D0D000", size=1,
    show.legend=FALSE
)
g <- g + scale_colour_manual("Means", values=c(
    "Expected Mean" = "#8080FF",
    "Actual Mean" = "#FF8080",
    "Normal Distribution" = NA
))
g <- g + scale_linetype_manual("Functions", values=c(
    "Expected Mean" = "blank",
    "Actual Mean" = "blank",
    "Normal Distribution" = "solid"
))
g <- g + guides(
    linetype = guide_legend(
        override.aes = list(colour="#D0D000")
    )
)
g

Enhanced geoms

d <- data.frame(x=c(1,1,2),y=c(1,2,2)*100)

gg <- ggplot(d,aes(x,y))
gg <- gg + scale_x_continuous(expand=c(0.5,1))
gg <- gg + scale_y_continuous(expand=c(0.5,1))
gg + geom_spotlight(s_shape=1, expand=0) + geom_point(color="red")

Color palettes

The following scales help passing manual values to scale colors for ggplot2 objects. In many ways, I think these scales are consistent with the themes whitin the SciencesPo package described earlier.

Discrete palletes

library("scales", quietly = TRUE)

show_col(pub_pal("pub12")(12))
show_col(pub_pal("gray5")(6), labels = FALSE)
show_col(pub_pal("manyeyes")(20))
show_col(pub_pal("tableau20")(20))
show_col(pub_pal("tableau10")(10))
 show_col(pub_pal("tableau10light")(10))
show_col(pub_pal("cyclic")(20))
party_pal("BRA", plot=TRUE)

Continuous palletes

Bivariate scales

show_col(pub_pal("trafficlight")(9))
show_col(pub_pal("bivariate1")(9))
show_col(pub_pal("bivariate2")(9))
show_col(pub_pal("bivariate3")(9))
show_col(pub_pal("bivariate4")(9))

R Session Details

library("devtools", quietly = TRUE)
session_info()

References



Try the SciencesPo package in your browser

Any scripts or data that you put into this service are public.

SciencesPo documentation built on May 29, 2017, 9:28 p.m.