R package moe calculates margin of error for simple probability samples and can correct for population size.
install.packages("devtools")
devtools::install_github("peterdalle/moe")
library(moe)
moe(proportion, n, conf.level = 0.95, digits = 2,
population.correction = FALSE, population.size = NULL)
proportion
= value between 0 and 1 indicating the proportion, such as 0.30 for 30 percent.n
= sample size.conf.level
= confidence level (defaults to 0.95).digits
= number of decimal digits used when formatting the results as APA and human-readable messages (defaults to 2).population.correction
= whether or not results should be corrected by population size (defaults to FALSE).population.size
= population size used by the population correction (defaults to NULL). Only used if population.correction is set to TRUE.a list with:
margin.of.error
= margin of error (in percentage points).conf.level
= confidence level (in percentage points).conf.lower
= confidence interval lower bound (in percentage points).conf.upper
= confidence interval upper bound (in percentage points).proportion
= proportion (same as input parameter).percentage
= proportion expressed as percentage.z.value
= z-value from normal distribution.digits
= number of digits used to format APA confidence intervals.n
= sample size (same as input parameter).population.corrected
= whether or not the margin of error is corrected for population size (same as input parameter).population.size
= population size (same as input parameter).fpc
= finite population correction, between 0 and 1.sampling.fraction
= sampling fraction, ratio of sample size to population size, between 0 and 1.error.uncorrected
= margin of error before it is corrected for population size (in percentage points).apa
= APA6 style formatted confidence intervals, such as 43.2%, 95% CI [40.1, 46.5].In this case, a political party got 30% in a sample of 1,200 voters.
# Get margin of error.
moe(proportion=0.30, n=1200)
Which outputs:
[1] 2.592789
Common methods such as summary()
and print()
are supported, as well as as.character()
, as.double()
, and as.integer()
.
With summary()
, all available information is given:
m <- moe(proportion=0.30, n=1200)
summary(m, digits=2)
Which outputs:
Parameters
Margin of error: 2.592789
Proportion: 0.3 (30%)
Confidence level: 95%
Confidence interval: [27.41, 32.59]
Sample size: n = 1200
z-value: 1.959964
APA6 style format: 30%, 95% CI [27.41, 32.59]
Interpretation
A share of 30% with a sample size of 1200 has a 95% confidence interval between
27.41 and 32.59 percentage points, and the margin of error is plus/minus 2.59
percentage points.
Using the as.character()
method, you can extract the APA6 styled confidence intervals:
# Get APA6 confidence intervals.
m <- moe(proportion=0.30, n=1200)
as.character(m, digits=2)
Which outputs:
[1] "30%, 95% CI [27.41, 32.59]"
You can also extract specific data from the returned list:
m <- moe(proportion=0.3, n=1200)
m$conf.lower
m$conf.upper
Which outputs:
[1] 27.40721
[1] 32.59279
A simple way to do a 2-sample test of proportions is to simply subtract one moe object from another.
m1 <- moe(proportion=0.33, n=1200)
m2 <- moe(proportion=0.37, n=1200)
# Is the difference statistically significant?
m1 - m2
Which outputs:
Note: Using the 95% confidence level from 'm1'.
2-sample test for equality of proportions with continuity correction
data: c(proportion1, proportion2) out of c(n1, n2)
X-squared = 4.0458, df = 1, p-value = 0.04428
alternative hypothesis: two.sided
95 percent confidence interval:
-0.078964583 -0.001035417
sample estimates:
prop 1 prop 2
0.33 0.37
This subtraction is equivalent to prop.test(x = c(0.33*1200, 0.37*1200), n = c(1200, 1200))
which gives identical result.
Thus, we can see that the two proportions differ significantly at the 0.05 alpha level.
The results from the test of proportions can also be saved, res <- (m1 - m2)
, and then accessed like res$p.value
.
If your sample is large, you can correct for population size using finite population correction. In essence, the closer the sample size is to the population size, the smaller the margin of error will be. In everyday survey research (where typical n = 1,000 and population size is millions), however, the effect of population correction is trivial.
In this example, the sample is 50,000. We correct for population size (in this fictional country with 300,000 voters) and increase the confidence level to 99%.
m <- moe(proportion=0.355, n=50000, conf.level=0.99, population.correction=TRUE, population.size=300000)
summary(m)
Which outputs:
Parameters
Margin of error: 0.4593512
Proportion: 0.355 (35.5%)
Confidence level: 99%
Confidence interval: [35.04, 35.96]
Sample size: n = 50000
z-value: 2.575829
APA6 style format: 35.5%, 99% CI [35.04, 35.96]
Note: Margin of error and confidence intervals are corrected for population size.
Population correction
Population size: N = 300000
Sampling fraction: 0.1666667
Finite population correction: 0.8333333
Uncorrected margin of error: 0.5512215 (difference: 0.09187024)
Interpretation
A share of 35.5% with a sample size of 50000 has a 99% confidence interval
between 35.04 and 35.96 percentage points, and the margin of error is
plus/minus 0.46 percentage points. These percentage points are corrected for
the population size of 300000.
We also see that the margin of error before correction is 0.5512215 and after correction it is 0.4593512. These are percentage points, so the the difference (0.09187024) is not particularly large in this case.
The sampling fraction is the ratio of sample size to population size, which is about 0.16. The closer the sampling fraction is to 1, the closer the sample size is to the population size.
The graph below shows the margin of error at different sample sizes, and for three different proportions (e.g., when a party got 10%, 20% or 50% of the votes).
Note that the closer the votes are to 50% (0.5), the larger the margin of error.
The graph can be reproduced with the code below.
# Generate all margin of errors for all sample sizes between 1 and 2000.
library(dplyr)
library(ggplot2)
library(moe)
# Sample sizes.
start.n <- 1 # Start at this sample size.
stop.n <- 2000 # Stop at this sample size.
# Create a data frame to store the data.
df <- data.frame(n=0, proportion=0, error=0)
# This may take a minute or two.
for(p in seq(0, 1, by=.1)) {
for(n in seq(start.n, stop.n, by=1)) {
df <- rbind(df, data.frame(n=n, proportion=p, error=as.numeric(moe(p, n))))
}
}
# Set as factor to overcome problem of floating-point numbers,
# such as trying to filter(proportion == 0.3) which doesn't work.
df$proportion <- as.factor(df$proportion)
# Graph the data with three lines (0.1, 0.2, and 0.5),
# and restrict the Y-axis (margin of error) to 20%.
df %>%
filter(proportion %in% c("0.1", "0.2", "0.5")) %>%
filter(error < 20) %>%
ggplot(aes(n, error, color=proportion, linetype=proportion)) +
geom_line() +
scale_y_continuous(breaks = seq(0, 100, 2), limits = c(0, 20)) +
scale_x_continuous(breaks = seq(0, stop.n, 200)) +
theme_minimal() +
labs(title = "Margin of error at different sample sizes",
x = "Sample size (n)",
y = "Margin of error (%)",
color = "Proportion",
linetype = "Proportion")
summary()
.Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.