knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)

CivicPlusScraper

THIS IS A WORK IN PROGRESS. You are welcome to use and/or adapt the code, but I'M STILL WORKING ON THIS AND IT IS NOT MEANT FOR PUBLIC USE AS IS.

I'm creating this package to scrape public meeting agendas and minutes from the Framingham, MA government website, which was created using the CivicPlus platform. More specifically, I'm interested in the meeting portal and ways to 1) download meeting agendas and minutes, and 2) turn the PDFs into searchable text.

Thanks to the rvest and pdftools packages, this is possible!

The URL structures used for functions in this package are probably somewhat Framingham-specific. However, you could probably tweak the functions to make them work for another CivicPlus meeting portal.
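For reference, link scraping on a CivicPlus-style portal generally comes down to an rvest pattern like the one below. This is only a sketch: the URL and the CSS selector here are placeholders, not what this package actually uses.

```r
library(rvest)

# Hypothetical example: the real portal URL and selector will differ.
portal_url <- "https://www.example-civicplus-site.gov/AgendaCenter"

page <- read_html(portal_url)

# Pull the href attribute from each meeting link on the page
meeting_links <- page |>
  html_elements("a.meeting-link") |>   # selector is an assumption
  html_attr("href")
```

Adapting the package to another town's portal would mostly be a matter of swapping in that site's URL and selectors.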

Installation

This package is not on CRAN, but you can install it from GitHub with:

remotes::install_github("smach/CivicPlusScraper")

Downloading agendas and minutes from a CivicPlus government website

Get a list of public meeting links using the get_list_of_meeting_links() function.

library(CivicPlusScraper)
meeting_page_links <- get_list_of_meeting_links()

For each of those public meeting pages, download any available agenda and/or minutes.

my_data_directory <- "D:/Sharon/My Documents Data Drive/FraminghamMeetings"

all_my_files <- purrr::map_df(meeting_page_links, download_data_from_meeting_page, the_dir = my_data_directory)

Get the number of pages for each downloaded file, then convert each file to text. Do one version with the full text and one capped at a maximum number of pages.

library(pdftools)
NumPages <- purrr::map_int(all_my_files$File, get_pages)
all_my_files$NumPages <- NumPages
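If you're curious what the page counting involves, pdftools exposes the page count directly via pdf_info(). A minimal get_pages-style helper (not necessarily the package's actual implementation) could look like:

```r
library(pdftools)

# Hypothetical minimal helper: return the page count of one PDF,
# or NA if the file can't be read.
count_pdf_pages <- function(path) {
  tryCatch(pdftools::pdf_info(path)$pages, error = function(e) NA_integer_)
}
```

Wrapping the call in tryCatch() keeps purrr::map_int() from failing partway through a batch if one PDF is corrupt.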



# If you've done this before and saved earlier links, load them
my_data_file <- paste0(my_data_directory, "/framingham_meetings.Rds")
historical_files <- readRDS(my_data_file)
all_my_files <- unique(dplyr::bind_rows(all_my_files, historical_files))
saveRDS(all_my_files, my_data_file)

# Now get full text
text_vec_12 <- purrr::map_chr(all_my_files$File, get_text_from_pdf, max_pages = 12)
names(text_vec_12) <- all_my_files$File

text_vec_full <- purrr::map_chr(all_my_files$File, get_text_from_pdf)
names(text_vec_full) <- all_my_files$File

all_my_files_fulltext <- all_my_files
all_my_files_fulltext$Text <- text_vec_full

all_my_files_12pages <- all_my_files
all_my_files_12pages$Text <- text_vec_12

library(data.table)
setDT(all_my_files_fulltext)
setDT(all_my_files_12pages)

save(all_my_files_fulltext, all_my_files_12pages, file = paste0(my_data_directory, "/framingham_meetings_with_text.Rdata"))
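The text extraction itself rests on pdftools::pdf_text(), which returns one character string per page. Here's a minimal sketch of a get_text_from_pdf-style helper with an optional page cap; the package's actual implementation may differ.

```r
library(pdftools)

# Hypothetical sketch: extract a PDF's text, optionally capped at max_pages.
extract_pdf_text <- function(path, max_pages = Inf) {
  pages <- pdftools::pdf_text(path)   # one string per page
  n <- min(length(pages), max_pages)
  paste(pages[seq_len(n)], collapse = "\n")
}
```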

You can then search for a term in various ways, such as

saxonville_or_nobscot <- all_my_files_fulltext[Text %like% "Saxonville|Nobscot"]
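data.table's %like% operator is a convenience wrapper around grepl(), so the same search can also be written with grepl() directly, which additionally lets you pass options such as ignore.case. A self-contained sketch on toy data:

```r
library(data.table)

# Toy data standing in for all_my_files_fulltext
dt <- data.table(
  File = c("a.pdf", "b.pdf", "c.pdf"),
  Text = c("Saxonville meeting", "Downtown zoning", "Nobscot plaza")
)

# These two filters are equivalent:
hits_like  <- dt[Text %like% "Saxonville|Nobscot"]
hits_grepl <- dt[grepl("Saxonville|Nobscot", Text)]

# Case-insensitive variant
hits_ci <- dt[grepl("saxonville|nobscot", Text, ignore.case = TRUE)]
```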


smach/CivicPlusScraper documentation built on Dec. 23, 2021, 3:25 a.m.