
Install and load the package with the following code:

install.packages("devtools")
library(devtools)
install_github("kdmulligan/processFCC", build_vignettes = TRUE)
library(processFCC)

First, choose the month and year of the data set to download and process. It is not strictly necessary to store these as variables, but they are used multiple times below, so it is slightly more convenient.

month <- "Dec"
year <- 2018

download_FCC() downloads the specified year and month of data as a zip file and then unzips the inner file to your working directory with the naming convention "FCC_fixed_brdbd_month_year.csv". How long this takes depends on your internet connection. Additionally, zip files are not available for the 2020 data sets, so those download directly as a CSV and take much longer.

system.time(download_FCC(year, month))
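
As a quick sanity check, you can confirm the downloaded file is in your working directory; the filename below is simply built from the naming convention described above.

# Confirm the raw CSV landed in the working directory
# (filename built from the naming convention described above)
file.exists(paste0("FCC_fixed_brdbd_", month, "_", year, ".csv"))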

Next, use the get_colname() function to get the proper column names for your data set. If you save them as use_colnam, you will not have to call the function again later when building the SQL database.

use_colnam <- get_colname(year, month)
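
If you want to sanity-check the result (assuming get_colname() returns a character vector of column names), you can peek at it:

# Quick look at the column names returned by get_colname()
head(use_colnam)
length(use_colnam)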

Then open a connection to SQLite. The database can be named something else if desired.

library(DBI)
library(RSQLite)
con <- dbConnect(SQLite(), dbname = "fcc.sqlite")

Next we will read the CSV into the SQLite database using csv_to_sql_db(). This function uses readr's chunked functionality, read_delim_chunked(), to read chunk_size rows into the database at a time; the callback function appends each new chunk of rows to the database at the connection con. This work is based on the inborutils R package on GitHub.
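
For reference, the underlying pattern looks roughly like the sketch below. This is only a simplified illustration of the chunked-read-and-append idea, not the package's exact internals; in particular, the table name fcc_raw is a placeholder assumption.

library(readr)
library(DBI)

# Simplified sketch: each chunk read by read_delim_chunked() is appended
# to a SQLite table as a side effect of the callback.
raw_csv <- paste0("FCC_fixed_brdbd_", month, "_", year, ".csv")

append_chunk <- function(chunk, pos) {
  # "fcc_raw" is a placeholder table name, not the one csv_to_sql_db() uses
  dbWriteTable(con, "fcc_raw", chunk, append = TRUE)
}

read_delim_chunked(
  file = raw_csv,
  callback = SideEffectChunkCallback$new(append_chunk),
  delim = ",",
  chunk_size = 50000,
  col_names = use_colnam,
  skip = 1  # skip the file's own header row since column names are supplied
)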

download_FCC() saves the raw FCC files with a standardized naming convention, as shown below, making it easy to find the file to be processed.

It is vital that you use the specified column names, or rollup_fcc() will not recognize the column names in the database, so use the get_colname() function as shown above. After calling csv_to_sql_db(), your raw FCC data will exist in a SQLite database at the connection con.

I recommend keeping the progress bar on so it is clear how long it will take to put the file into the database.

name_csv <- paste0("FCC_fixed_brdbd_", month, "_", year, ".csv")
system.time(csv_to_sql_db(csv_file = name_csv, con = con, 
                          pre_process_size = 1000, chunk_size = 50000,
                          show_progress_bar = TRUE, db_colnam = use_colnam))
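
To confirm the rows made it into the database, DBI can list the tables and count rows. The table name created by csv_to_sql_db() is not assumed here; it is looked up with dbListTables().

# List the tables created by csv_to_sql_db() and count rows in the first one
tbls <- dbListTables(con)
tbls
dbGetQuery(con, paste0("SELECT COUNT(*) AS n_rows FROM ", tbls[1]))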

The processed file from rollup_fcc() is written to the working directory. This file also has a standardized naming convention so it is easy to find: "processed_fcc_monthyear.csv". In the call below, the December 2018 FCC data is rolled up to the Census Tract level, excluding technology codes 0, 60, and 70, and using five threshold combinations for download/upload speeds: 25/3, 25/5, 50/5, 75/10, and 100/100 Mbps. The processed data counts the number of providers within each census tract offering broadband at those thresholds.

rollup_fcc(con, year = year, month = month, geogr = "ct",
           tech_exc = c("0", "60", "70"), thresh_down = c(25, 25, 50, 75, 100),
           thresh_up = c(3, 5, 5, 10, 100))
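
To inspect the rolled-up output, the processed file can be read back in. The filename below is an assumption based on the "processed_fcc_monthyear.csv" convention described above; adjust it if your file is named differently.

# Read the processed output back in (filename assumed from the convention above)
name_processed <- paste0("processed_fcc_", month, year, ".csv")
processed <- read.csv(name_processed)
head(processed)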

Be sure to close the connection to the SQLite database. You may also want to remove the SQLite, zip, and raw CSV files from your working directory because they are large and take up a lot of disk space.

dbDisconnect(con)

file.remove("fcc.sqlite")
file.remove(name_csv)
name_zip <- paste0("FCC_fixed_brdbd_", month, "_", year, ".zip")
file.remove(name_zip)

