THIS IS A WORK IN PROGRESS. You are welcome to use and/or adapt the code, but I'M STILL WORKING ON THIS AND IT IS NOT MEANT FOR PUBLIC USE AS IS.
I'm creating this package to scrape public meeting agendas and minutes from the Framingham, MA government website, which was created using the CivicPlus platform. More specifically, I'm interested in the meeting portal and ways to 1) download meeting agendas and minutes, and 2) turn the PDFs into searchable text.
Thanks to the rvest and pdftools packages, this is possible!
The URL structures used for functions in this package are probably somewhat Framingham-specific. However, you could probably tweak the functions to make them work for another CivicPlus meeting portal.
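For context, the link scraping relies on rvest. A minimal sketch of how you might inspect another portal's meeting page yourself — the URL and the bare `"a"` selector below are illustrative assumptions, not the package's actual values; inspect the target page to find the right CSS selector:

```r
library(rvest)

# Illustrative only: substitute the meeting-portal URL for your city.
portal_url <- "https://www.framinghamma.gov/AgendaCenter"

# Pull all link hrefs from the page; a real scraper would use a more
# specific CSS selector to grab only agenda/minutes links.
page <- read_html(portal_url)
links <- page %>%
  html_elements("a") %>%
  html_attr("href")
```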
This package is not on CRAN, but you can install it with

```r
remotes::install_github("smach/CivicPlusScraper")
```
Get a list of public meeting links using the `get_list_of_meeting_links()` function.
```r
library(CivicPlusScraper)
meeting_page_links <- get_list_of_meeting_links()
```
For each of those public meeting pages, download any available agenda and/or minutes.
```r
my_data_directory <- "D:/Sharon/My Documents Data Drive/FraminghamMeetings"
all_my_files <- purrr::map_df(meeting_page_links, download_data_from_meeting_page, the_dir = my_data_directory)
```
Get the number of pages for each downloaded file, then convert each file to text. Do one version with the full text and one capped at a maximum number of pages.
```r
library(pdftools)
NumPages <- purrr::map_int(all_my_files$File, get_pages)
all_my_files$NumPages <- NumPages

# If you've done this before and saved earlier links, load them
my_data_file <- paste0(my_data_directory, "/framingham_meetings.Rds")
historical_files <- readRDS(my_data_file)
all_my_files <- dplyr::bind_rows(all_my_files, historical_files) %>%
  unique()
saveRDS(all_my_files, "D:/Sharon/My Documents Data Drive/FraminghamMeetings/framingham_meetings.Rds")

# Now get full text
text_vec_12 <- purrr::map_chr(all_my_files$File, get_text_from_pdf, max_pages = 12)
names(text_vec_12) <- all_my_files$File
text_vec_full <- purrr::map_chr(all_my_files$File, get_text_from_pdf)
names(text_vec_full) <- all_my_files$File

all_my_files_fulltext <- all_my_files
all_my_files_fulltext$Text <- text_vec_full
all_my_files_12pages <- all_my_files
all_my_files_12pages$Text <- text_vec_12

library(data.table)
setDT(all_my_files_fulltext)
setDT(all_my_files_12pages)
save(all_my_files_fulltext, all_my_files_12pages, file = paste0(my_data_directory, "/framingham_meetings_with_text.Rdata"))
```
You can then search for a term in various ways, such as

```r
saxonville_or_nobscot <- all_my_files_fulltext[Text %like% "Saxonville|Nobscot"]
```
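If you want a case-insensitive match, base R's `grepl()` works inside a data.table subset as well — a sketch, assuming the same `all_my_files_fulltext` table built above:

```r
# Case-insensitive search on the Text column using grepl() instead of %like%
hits <- all_my_files_fulltext[grepl("saxonville|nobscot", Text, ignore.case = TRUE)]
```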