Install and load the package with the following code:
install.packages("devtools") library(devtools) install_github("kdmulligan/processFCC", build_vignettes = TRUE) library(processFCC)
First, choose the month and year of the data set to download and process. It is not necessary to set these as variables, but they are used multiple times, so doing so is slightly more convenient.
month = "Dec" year = 2018
download_FCC() will download the specified year and month of the data as a zip file and then unzip its inner file to your working directory with the naming convention "FCC_fixed_brdbd_month_year.csv". The speed of this step will depend on your internet connection. Additionally, zip files were not available for the 2020 data sets, so those will take much longer to download because they are downloaded as a csv.
system.time(download_FCC(year, month))
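As a quick, purely illustrative check (not part of the package), you can confirm that the unzipped csv is sitting in your working directory under the naming convention described above:

# check that the downloaded file follows the stated naming convention
file.exists(paste0("FCC_fixed_brdbd_", month, "_", year, ".csv"))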
Next, use the get_colname function to get the proper column names for your data set. If you save them as use_colnam, you will not have to call the function again later when building the SQL database.
use_colnam <- get_colname(year, month)
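If you want to confirm what was returned (assuming, as the prose implies, that get_colname returns a character vector of column names), you can take a quick look:

# peek at the column names that will be passed to the database step
head(use_colnam)
length(use_colnam)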
Then open a connection to SQLite. The database can be named something else if desired.
library(DBI)
library(RSQLite)
con <- dbConnect(SQLite(), dbname = "fcc.sqlite")
Next we will read the csv into the SQLite database using csv_to_sql_db. This function uses readr's chunked functionality, read_delim_chunked, to read chunk_size rows into the database at a time. The callback function appends each new chunk of rows to the database at the connection con. This work is based on the inborutils R package on GitHub.
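To make that mechanism concrete, here is a minimal sketch of the chunked read-and-append pattern described above. It is not the internals of csv_to_sql_db: the table name "fcc_raw", the callback body, and the assumption that the raw csv's own header row is skipped are all illustrative.

library(readr)
library(DBI)
library(RSQLite)

# callback: append each chunk of rows to a table in the SQLite database
append_chunk <- function(chunk, pos) {
  dbWriteTable(con, "fcc_raw", chunk, append = TRUE)  # "fcc_raw" is an assumed table name
}

read_delim_chunked(
  file = paste0("FCC_fixed_brdbd_", month, "_", year, ".csv"),  # raw file from download_FCC()
  callback = SideEffectChunkCallback$new(append_chunk),
  delim = ",",
  col_names = use_colnam,   # column names from get_colname()
  skip = 1,                 # skip the file's own header row (assumption)
  chunk_size = 50000
)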
In the download_FCC function, the raw FCC files are saved with a standardized naming convention, as shown below, making it easy to find the file to be processed.

It is vital that you use the specified column names, or rollup_FCC will not recognize the column names in the database; thus, use the get_colname function as shown above. After calling csv_to_sql_db, your raw FCC data will exist in a SQLite database at the connection con. I recommend keeping the progress bar on so it is clear how long it will take to load the file into the database.
name_csv <- paste0("FCC_fixed_brdbd_", month, "_", year, ".csv")
system.time(
  csv_to_sql_db(
    csv_file = name_csv,
    con = con,
    pre_process_size = 1000,
    chunk_size = 50000,
    show_progress_bar = TRUE,
    db_colnam = use_colnam
  )
)
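As an optional sanity check (not part of the package's documented workflow), you can confirm the data landed in the database. The table name created by csv_to_sql_db is not stated here, so dbListTables() is used to discover it:

tbls <- dbListTables(con)   # table(s) created by csv_to_sql_db
tbls
dbGetQuery(con, paste0("SELECT COUNT(*) FROM ", tbls[1]))   # row count of the first table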
The processed file from rollup_FCC will be written to the working directory. This file also has a standardized naming convention so it is easy to find: "processed_fcc_monthyear.csv". In this call of rollup_FCC, the December 2018 FCC data (the month and year chosen above) is rolled up to the Census Tract level, excluding technology codes 0, 60, and 70, and using 5 threshold combinations for download/upload speeds: 25/3, 25/5, 50/5, 75/10, and 100/100 Mbps. The processed data will count the number of providers within each census tract providing broadband at each of the thresholds.
rollup_fcc(
  con,
  year = year,
  month = month,
  geogr = "ct",
  tech_exc = c("0", "60", "70"),
  thresh_down = c(25, 25, 50, 75, 100),
  thresh_up = c(3, 5, 5, 10, 100)
)
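If you want a quick look at the result, the sketch below builds the file name from the naming convention stated above; the exact columns of the processed file are not described here, so treat this purely as an illustration.

name_processed <- paste0("processed_fcc_", month, year, ".csv")  # e.g. "processed_fcc_Dec2018.csv"
processed <- read.csv(name_processed)
head(processed)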
Be sure to close the connection to the SQLite database. You may also want to remove the sqlite, zip, and raw csv files from your working directory because they take up a considerable amount of disk space.
dbDisconnect(con)
file.remove("fcc.sqlite")
file.remove(name_csv)
name_zip <- paste0("FCC_fixed_brdbd_", month, "_", year, ".zip")
file.remove(name_zip)