In this workshop we will calculate several types of birth rates in the US between 1933 and 2015. The very first step is to load the necessary libraries in R. This gives R information on what each command means, sort of like loading a dictionary that R can check commands against so it knows what to do.

# Load your libraries here

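library('lehmansociology')
library('HMDHFDplus')
# ggplot() is used for the graphs below; load ggplot2 explicitly in case it is not already attached
library('ggplot2')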

We'll start with the Crude Birth Rate, often referred to simply as the "birth rate". To calculate the birth rate, we first need the total population for each year. Mortality.org provides demographic data on a number of countries. You need a username and password for mortality.org to access the data, but I've created a general account for us all to use for this workshop. In the code below, we will pull and format the yearly population of the USA from 1933 to 2015.

population <- readHMDweb(CNTRY = 'USA', item = 'Population', username = 'eric.ketcham@lehman.cuny.edu', password = 'STEMworkshop')
total_pop_by_year <- aggregate(x = population[4:9], by = list(Year = population$Year), FUN=sum)
total_pop_by_year <- dplyr::select(total_pop_by_year, Year, Total1)
total_pop_by_year <- dplyr::rename(total_pop_by_year, Population = Total1)
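
To see what the aggregate step is doing, here is a minimal sketch on a made-up table. The data frame toy_pop and its values are invented for illustration and are not the mortality.org data. It assumes only that the table has one row per year and age, and adds those rows up within each year.

```r
# Toy stand-in for the population table (all values are made up).
toy_pop <- data.frame(
  Year   = c(2000, 2000, 2001, 2001),
  Age    = c(0, 1, 0, 1),
  Total1 = c(100, 90, 110, 95)
)

# Sum the age-specific counts within each year.
toy_total <- aggregate(x = toy_pop["Total1"], by = list(Year = toy_pop$Year), FUN = sum)

# Give the summed column a friendlier name.
names(toy_total)[names(toy_total) == "Total1"] <- "Population"
toy_total
```

Each year ends up with a single row whose Population is the sum of the rows for all ages in that year.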

Next we will gather yearly birth data from mortality.org and format this data for easier use in calculating birth rates.

births_by_year <- readHMDweb(CNTRY = 'USA', item = 'Births', username = 'eric.ketcham@lehman.cuny.edu', password = 'STEMworkshop')
births_by_year <- dplyr::select(births_by_year, Year, Total)
births_by_year <- dplyr::rename(births_by_year, Births=Total)

Now we'll merge our population data and our births data into a single dataset.

birthdata <- merge(total_pop_by_year, births_by_year, by = 'Year')
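
As a small illustration of how merge lines up two tables by Year, here is a sketch with made-up values. The names toy_population and toy_births are hypothetical. With the default settings, merge keeps only the years that appear in both tables.

```r
# Two toy tables that overlap on some years only (values are made up).
toy_population <- data.frame(Year = c(2000, 2001, 2002), Population = c(190, 205, 210))
toy_births     <- data.frame(Year = c(2001, 2002, 2003), Births = c(4, 5, 6))

# Match rows by Year; years missing from either table are dropped.
toy_merged <- merge(toy_population, toy_births, by = "Year")
toy_merged
```

Here only 2001 and 2002 survive, because 2000 has no births row and 2003 has no population row.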

With the data merged into a single dataset for easier use, let's calculate the annual birth rates. I'll supply the new variable names, and you'll supply the calculations in R. The birth rate is calculated by the following equation:

birth rate = (births/population)*1000

We'll do this in two steps. In the first step we'll calculate the number of births divided by the population. In the second, we'll multiply births/population by 1000 to get the birth rate. All variables in the calculation are in the birthdata dataset. To access a variable, we first write the dataset name, then $, then the variable name, e.g. birthdata$Births. This tells R which dataset to look in, then which variable to look for. Our variable for total births is called Births, and our variable for total population is called Population. Remember that R is case sensitive.
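
If the dataset$variable pattern is new to you, here is a worked example of the same two steps on a tiny made-up dataset. The dataset toy and its numbers are invented for illustration. Your own calculation should use birthdata and its variables instead.

```r
# A toy dataset with one row per year (values are made up).
toy <- data.frame(
  Year       = c(2000, 2001),
  Births     = c(14, 15),
  Population = c(1000, 1200)
)

# Step 1: births divided by population.
toy$births_population <- toy$Births / toy$Population

# Step 2: multiply by 1000 to express the rate per 1,000 people.
toy$birth_rate <- toy$births_population * 1000
toy
```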

birthdata$births_population <- WRITE IN YOUR CALCULATION HERE
birthdata$birth_rate <- WRITE IN YOUR CALCULATION HERE

Let's take a look at what this data looks like. Entering the name of the dataset will display the data saved in this dataset in your console. As you'll see, we now have the population, births, births/population, and birth rate for each year from 1933 to 2015.

birthdata

Finally, we'll make a graph of this data to see it more clearly.

ggplot(birthdata, aes(x = Year, y = birth_rate)) + geom_line() + ggtitle("US Birth Rate 1933-2015")

What we've just calculated is the Crude Birth Rate (CBR). The CBR indicates how many children are being born into a population, but it is only a rough measure. A better measure would take into account the proportion of the population that is able to have children, or more specifically, would focus on the population of women of childbearing age. The Total Fertility Rate (TFR) does just this. The TFR is based on the number of births to women between the ages of 15 and 49 (though the exact age range used in the calculation may differ).

To start, we need to know the number of births to women of each particular age in each year. The first thing we'll do is upload a dataset with this information. Under "Files" on the bottom right of this console, select Upload, then choose the "USAbirthsRR.txt" file that you've saved to your computer.

agespecificbirths <- readHFD('USAbirthsRR.txt', fixup=TRUE)

agespecificbirths

As you can see, this dataset gives us information on the number of births to women of a particular age in a particular year.

Next, we need to create a dataset that includes this information, as well as information on the number of women in the population of these particular ages in these particular years.

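# Keep ages 12 to 55 and years through 2014 so the rows line up with agespecificbirths
ASFRdata <- subset(population, Age>=12)
ASFRdata <- subset(ASFRdata, Age<=55)
ASFRdata <- subset(ASFRdata, Year<=2014)
# Attach the births by position; this relies on both datasets having the same rows in the same order
ASFRdata$births <- agespecificbirths$Total
# Narrow down to the childbearing ages used in this workshop
ASFRdata <- subset(ASFRdata, Age>=15)
ASFRdata <- subset(ASFRdata, Age<=49)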
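
The births column above is attached by position, so it depends on both datasets having the same rows in the same order. As a hedged alternative, the sketch below joins two made-up tables on Year and Age, so the order of the rows no longer matters. The names toy_women and toy_agebirths and all of their values are hypothetical, not the workshop data.

```r
# Toy female population by year and age (values are made up).
toy_women <- data.frame(
  Year    = c(2000, 2000, 2001, 2001),
  Age     = c(15, 16, 15, 16),
  Female1 = c(500, 510, 505, 515)
)

# Toy births by year and age, deliberately listed in a different row order.
toy_agebirths <- data.frame(
  Year  = c(2001, 2001, 2000, 2000),
  Age   = c(16, 15, 16, 15),
  Total = c(22, 12, 20, 10)
)

# Match rows on both Year and Age instead of relying on row position.
toy_joined <- merge(toy_women, toy_agebirths, by = c("Year", "Age"))
names(toy_joined)[names(toy_joined) == "Total"] <- "births"
toy_joined
```

Each population row is paired with the births for the same year and age, even though the two tables were sorted differently.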

Once we've created our dataset with information on the female population at each age between 15 and 49 in the years we are interested in, and the number of births to women of each age in each year, we can get a little more sophisticated with our analysis. Go ahead and look at the data in the ASFRdata dataset. You'll notice that two populations are listed for each age, year, and sex. The first is the population at the beginning of the year, and the second is the population at the end of the year.

Over the course of the year, individuals turn one year older at different points, and some individuals in each age group die. To account for this, we can calculate an "exposure" for each age. This is not an exact count of the people who were a specific age in a specific year, but rather an estimate of the number of person-years lived in this age group in this year. An individual may contribute a whole person-year or just a fraction of a person-year to this total, depending on how long they spent in that age group in that year.

ASFRdata

One way to estimate this exposure is to add the population at the beginning of the year to the population at the end of the year, and divide by 2. Go ahead and write in the calculation for this here.
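
As a worked illustration of this averaging, here is a sketch on made-up numbers. The dataset toy_exposure and its column names pop_start and pop_end are hypothetical. In your own calculation, use the beginning-of-year and end-of-year female population columns that you see in ASFRdata.

```r
# Toy start-of-year and end-of-year populations for two ages (values are made up).
toy_exposure <- data.frame(
  Age       = c(20, 21),
  pop_start = c(1000, 1200),
  pop_end   = c(1100, 1300)
)

# Estimated person-years lived: the average of the two populations.
toy_exposure$exposure <- (toy_exposure$pop_start + toy_exposure$pop_end) / 2
toy_exposure
```

Note the parentheses: without them, R would divide only the end-of-year population by 2 before adding.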

ASFRdata$exposure <- WRITE IN YOUR CALCULATION HERE

We then use this exposure as the denominator in our calculation of Age Specific Fertility Rates.

ASFRdata$ASFR <- ASFRdata$births/ASFRdata$exposure

Let's take a look at what ASFR has looked like across the years 1933 to 2014.

ggplot(data = ASFRdata, aes(x = Age, y = ASFR, color=Year)) + geom_line(aes(group=Year)) + ggtitle("US ASFR 1933-2014")

Finally, we'll take a look at the Total Fertility Rate. The TFR is the sum of the Age Specific Fertility Rates in a given year. This is a period measure: it describes the fertility of an imaginary woman who experiences, at every age from 15 to 49, the average number of births for that age group in a single year. Of course, no woman will be all ages in a single year. The measure is meant instead to give a snapshot of what fertility looks like in that year.
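
To make the "sum of the age-specific rates" idea concrete, here is a small sketch with made-up numbers for three ages in two years. The dataset toy_asfr and its values are invented for illustration, and a real calculation would cover every age from 15 to 49.

```r
# Toy births and exposure by year and age (values are made up).
toy_asfr <- data.frame(
  Year     = rep(c(2000, 2001), each = 3),
  Age      = rep(c(20, 21, 22), times = 2),
  births   = c(10, 12, 8, 9, 11, 7),
  exposure = rep(100, 6)
)

# Age-specific fertility rate: births per person-year lived at each age.
toy_asfr$ASFR <- toy_asfr$births / toy_asfr$exposure

# Total fertility rate: add up the age-specific rates within each year.
toy_tfr <- aggregate(x = toy_asfr["ASFR"], by = list(Year = toy_asfr$Year), FUN = sum)
names(toy_tfr)[names(toy_tfr) == "ASFR"] <- "TFR"
toy_tfr
```

Each year gets one TFR value, which is simply that year's age-specific rates added together.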

TFRdata <- ASFRdata
TFRdata <- aggregate(x = TFRdata[4:12], list(Year=TFRdata$Year), FUN=sum)
TFRdata <- dplyr::select(TFRdata, Year, births, exposure, ASFR)
TFRdata <- dplyr::rename(TFRdata, TFR = ASFR)

Now let's graph the TFR from 1933 to 2014.

ggplot(TFRdata, aes(x = Year, y = TFR)) + geom_line() + ggtitle("US Total Fertility Rate 1933-2014")


elinw/lehmansociology documentation built on May 16, 2019, 3 a.m.