knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(unemployedR)
library(dplyr)
library(usmap)

Introduction

In this package, we want to introduce an easy method in visualization the census data by geographic and time. We will use the USDA published unemployment and household income data as the original data source. Since it contains all the county, state and nation levels data from 2000 to 2022. And when visualization the data by geographic, we will the usmap::us_map() for each county's longitude and latitude.It totally including six packages clean, unemployment_rate, unemployment_rate_time, global, medianhhimcome and pop.

Here are links to our github repository and website.

About the data

We first read the CSV file from the Unemployment and median household income for the U.S., States, and counties, 2000-20 by using the read.csv().

file<-read.csv("https://www.ers.usda.gov/webdocs/DataFiles/48747/Unemployment.csv")
head(file)

The raw data contains all the census information in one column Attribute, the structure of the Attribute is Category_name_year.

Other than Attribute, we also need to restructure the state and Area_name column. Since in this database, we have 3 level, the county, state and national data.

head(unique(filter(file,State=="NJ")[2:3]))

By the example of New Jersey, We find out, all national information is collected in the State="US", Area_name="United States". The state level information is collected in State= abbreviation , Area_name= full spelled. The county data is collected in State=abbreviation, Area_name=County name, Abbrevation.

Functions

Clean data

dataclean

This function is a simple application of dplyr package in restructuring the raw data. First, since the year is always in form of 20xx, we can use dplyr::seperate(...,...,sep = -4) to separate the category and year. Then, we delete the duplicate Abbreviation in State and Area_name, and county/parish in the Area_name by gsub().

file=dataclean("https://www.ers.usda.gov/webdocs/DataFiles/48747/Unemployment.csv")
str(file)

Notation

The government does not collect all the attributes annually. Only Civilian_labor_force, Employed, Unemployed and Unemployment_rate have been fully collected. And for some year, such as 2005 and 2006, they didn't collect all counties' data for employment. This may result some problem when we visualize the data later.

Mapping the data

plotunemployed

This function is used to plot the county level umemployment rate of the selected state in one year

The required aesthetics are:

The example below shows the county level unemployment rate in New Jersey in 2018.

plotunemployed(file, 2018, "NJ")

We classify the unemployment rate into 5 levels: VeryLow (<2%), Low (2% ~ 4%), Medium (4% ~ 6%), High (6% ~ 8%) and VeryHigh (>8%) and fill the color as blue, light blue, grey, light red and red. There exist some missing values in some observations. We fill the color as dark grey.

plotmedianhouseholdincome

This function is used to plot the county level median household income of the selected state in 2019.

The required aesthetics are:

The example below shows the county level median household income in Mississipp in 2019.

plotmedianhouseholdincome(file,"MS")

We classify the median household income (in thousands) into 5 levels: VeryLow (<24), Low (24 ~ 40), Medium (40 ~ 55), High (55 ~ 70) and VeryHigh (70 ~ 150) and fill the color as red, light red, grey, light blue and blue.

plotunemployed_animation

This function is used to provide the animation plot of the county level unemployment rate of the selected state from 2000 to 2020.

The required aesthetics are:

The example below shows the animation plot of the county level unemployment rate in Iowa from 2000 to 2020.

plotunemployed_animation(file, "IA")

This function has the same levels as the function plotunemployed.

Missing values and other issues

There are several data issues:

To solve this problem, we add these counties in the raw data, set the missing URs as NA and fill the counties as black color.

The example below shows the county level unemployment rate in LA in 2005.

plotunemployed(file, 2005, "LA")

There are some blank unemployment rate values in lower right corner in LA in 2005

Plotting the data

plotunemployed_time

This function is used to plot the unemployment rate of selected state along with years.

The required aesthetics are:

The example shows unemployment rate in IA along with years.

plotunemployed_time(file,"IA")

We use the state level data for this unemployment rate vs time plot. We also compared the state level unemployment rate with the whole nation, to figure out whether the chosen state has a different trend compared with the whole nation.

stateunemployed

This function is used to plot top 10 unemployed county histogram in selected state and a year.

The required aesthetics are:

The example shows top 10 unemployed counties in IA in 2011 histogram.

stateunemployed(file,2011,"IA")

We rank the top 10 counties in Iowa state. The histogram represent the unemployment population in each county, and later in shiny app, for each county we calculate the weight of this population in whole state.



zzhou93/unemployedR documentation built on May 14, 2022, 4:54 p.m.