Parsing Functions in edgarWebR
In edgarWebR: SEC Filings Access

knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
library(edgarWebR)
set.seed(0451)
# Cache http requests
library(httptest)
start_vignette("parsing")

New to edgarWebR 0.2.0 are functions for parsing SEC documents. While there are good R packages for XBRL processing, there is a gap in extracting information from the other document types available via the site. edgarWebR currently provides functions for two of them:

parse_submission() - Processes a raw SGML filing into its component documents. This is the 'Complete submission text file' linked on filing pages. Similar to a zip file, it contains all the files included in a particular submission.
parse_filing() - Processes a narrative filing (e.g. 10-K, 10-Q) into paragraphs annotated with part and item numbers. In a submission with many files, this is the main form.

This vignette shows how to use both functions to find the risks reported in a company's recent filing.

Find a Submission

Using edgarWebR functions, we'll first look up a recent filing.

ticker <- "STX"

filings <- company_filings(ticker, type = "10-Q", count = 40)
# Specifying the type provides all forms that start with 10-, so we need to
# manually filter.
filings <- filings[filings$type == "10-Q", ]
# We're only interested in a particular filing
filing <- filings[filings$filing_date == "2017-10-27", ]
filing$md_href <- paste0("[Link](", filing$href, ")")
knitr::kable(filing[, c("type", "filing_date", "accession_number", "size",
                                "md_href")],
             col.names = c("Type", "Filing Date", "Accession No.", "Size", "Link"),
             digits = 2,
             format.args = list(big.mark = ","))
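
Because the type argument matches by prefix, and a date filter can silently return zero or several rows, it can help to make the selection explicit. The sketch below runs on a small made-up data frame standing in for the output of company_filings(); the helper name pick_filing() and the sample rows are assumptions for illustration and are not part of edgarWebR.

```r
# Hypothetical stand-in for the data frame returned by company_filings().
# Only the two columns used for selection are included.
example_filings <- data.frame(
  type = c("10-Q", "10-Q/A", "10-K", "10-Q"),
  filing_date = as.Date(c("2017-10-27", "2017-11-03",
                          "2017-08-04", "2017-04-28")),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep exactly one filing of an exact type and date,
# and fail loudly instead of returning an empty or ambiguous result.
pick_filing <- function(filings, type, date) {
  keep <- filings$type == type &
    as.Date(filings$filing_date) == as.Date(date)
  hit <- filings[keep, , drop = FALSE]
  if (nrow(hit) != 1) {
    stop(sprintf("expected 1 matching filing, found %d", nrow(hit)))
  }
  hit
}

picked <- pick_filing(example_filings, "10-Q", "2017-10-27")
picked
```

The exact comparison on type drops amended forms such as 10-Q/A, which a prefix match would otherwise keep.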

Get the Complete Submission File

We'll next get the list of files and find the link to the complete submission.

docs <- filing_documents(filing$href)
doc <- docs[docs$description == 'Complete submission text file', ]
doc$md_href <- paste0("[Link](", doc$href, ")")

knitr::kable(doc[, c("seq", "description", "document", "size",
                     "md_href")],
             col.names = c("Sequence", "Description", "Document",
                           "Size", "Link"),
             digits = 2,
             format.args = list(big.mark = ","))

Normally, we would use filing_documents() to get to the 10-Q directly, but here we use the complete submission file to demonstrate the parse_submission() function. The complete submission file is the better choice in two situations. The first is when you need the full list of files: in this case there are 80 files in the submission, but only 10 are listed on the website and therefore available to filing_documents(). The second is when efficiency matters and you would be downloading all of the documents anyway.

Parse the Complete Submission File

Now that we have the link to the complete submission file, we can parse it into components.

parsed_docs <- parse_submission(doc$href)
knitr::kable(head(parsed_docs[, c("SEQUENCE", "TYPE", "DESCRIPTION", "FILENAME")]),
             col.names = c("Sequence", "Type", "Description", "Document"),
             digits = 2,
             format.args = list(big.mark = ","))

As an example, here is the end of the full list. Note the Excel file, which isn't listed on the SEC site.

knitr::kable(tail(parsed_docs[, c("SEQUENCE", "TYPE", "DESCRIPTION", "FILENAME")]),
             col.names = c("Sequence", "Type", "Description", "Document"),
             digits = 2,
             format.args = list(big.mark = ","))
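
Since the parsed submission behaves much like an unpacked archive, one natural follow-up is writing each component to disk. The sketch below is one way to do that, assuming a data frame with the FILENAME column shown above and a TEXT column holding each document's contents. save_documents() is a hypothetical helper, and the two-row data frame is made-up stand-in data rather than real parse_submission() output.

```r
# Hypothetical helper: write every component document to its own file.
save_documents <- function(parsed_docs, dir) {
  dir.create(dir, recursive = TRUE, showWarnings = FALSE)
  # basename() guards against path components in the reported file name
  paths <- file.path(dir, basename(parsed_docs$FILENAME))
  for (i in seq_along(paths)) {
    writeLines(parsed_docs$TEXT[i], paths[i], useBytes = TRUE)
  }
  invisible(paths)
}

# Made-up stand-in for the result of parse_submission()
example_docs <- data.frame(
  SEQUENCE = c("1", "2"),
  TYPE = c("10-Q", "EX-31.1"),
  DESCRIPTION = c("10-Q", "EXHIBIT 31.1"),
  FILENAME = c("form10q.htm", "ex311.htm"),
  TEXT = c("<html>main form</html>", "<html>exhibit</html>"),
  stringsAsFactors = FALSE
)

out_dir <- file.path(tempdir(), "submission")
paths <- save_documents(example_docs, out_dir)
basename(paths)
```

Documents stored in encoded form (such as zip or Excel files) would be written as their encoded text by this sketch, so they would still need decoding before use.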

The 10-Q Filing document is Seq. 1, with the full text of the document in the TEXT column.

# NOTE: the filing document is not always #1, so it is a good idea to also look
# at the type & Description
filing_doc <- parsed_docs[parsed_docs$TYPE == '10-Q' &
                          parsed_docs$DESCRIPTION == '10-Q', 'TEXT']
substr(filing_doc, 1, 80)

We can see that it contains the raw document. For document types which are not plain text, e.g. the XBRL zip file, the content is uuencoded and would need further processing.
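
As a rough illustration of that further processing, here is a minimal base R uudecoder. It assumes the text holds a standard uuencoded block delimited by "begin <mode> <name>" and "end" lines. uudecode() is a hypothetical helper written for this sketch, not an edgarWebR function, and it has only been reasoned through on small inputs.

```r
# Hypothetical helper: decode one uuencoded block into a raw vector.
uudecode <- function(text) {
  lines <- strsplit(text, "\r?\n")[[1]]
  start <- grep("^begin [0-7]+ ", lines)[1]
  end <- which(lines == "end")
  end <- end[end > start][1]
  if (is.na(start) || is.na(end)) stop("no uuencoded block found")
  body <- lines[seq(start + 1, length.out = end - start - 1)]

  decoded <- lapply(body, function(line) {
    if (!nzchar(line)) return(raw(0))
    # Each character encodes 6 bits as (value + 32); "`" also means 0
    codes <- (utf8ToInt(line) - 32L) %% 64L
    n <- codes[1]                      # number of bytes on this line
    if (n == 0L) return(raw(0))
    data <- codes[-1]
    data <- c(data, rep(0L, (-length(data)) %% 4L))
    quads <- matrix(data, nrow = 4)    # 4 characters -> 3 bytes
    bytes <- rbind(
      quads[1, ] * 4L + quads[2, ] %/% 16L,
      (quads[2, ] %% 16L) * 16L + quads[3, ] %/% 4L,
      (quads[3, ] %% 4L) * 64L + quads[4, ]
    )
    as.raw(as.vector(bytes)[seq_len(n)])
  })
  do.call(c, decoded)
}

# Small made-up input: the three bytes of "Cat"
encoded <- "begin 644 cat.txt\n#0V%T\n`\nend\n"
rawToChar(uudecode(encoded))
```

For a binary document, the resulting raw vector could be passed to writeBin() to recreate the original file.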

Parse the Filing Document

Fortunately, edgarWebR functions that take URLs will also take a string containing the document. While we could have passed the URL of the online document, we can parse the document by passing in the full string we already have.

doc <- parse_filing(filing_doc, include.raw = TRUE)
unique(doc$part.name)
unique(doc$item.name)
head(doc[grepl("market risk", doc$item.name, ignore.case = TRUE), "text"], 3)
risks <- doc[grepl("market risk", doc$item.name, ignore.case = TRUE), "raw"]

Now the document is all ready for whatever further processing we want. As a quick example we'll pull out all the italicized risks.

risks <- risks[grep("<i>", risks)]
risks <- gsub("^.*<i>|</i>.*$", "", risks)
risks <- gsub("\n", " ", risks)
risks
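
The gsub() pattern above is greedy, so a paragraph with several italic spans should yield only the last one. The sketch below shows an alternative that collects every span, run on made-up HTML strings rather than real filing content. extract_italics() is a hypothetical helper, and it assumes italics are marked with plain <i> tags.

```r
# Hypothetical helper: return the text of every <i>...</i> span.
extract_italics <- function(html) {
  # (?s) lets "." match newlines; ".*?" stops at the first closing tag
  m <- gregexpr("(?s)<i>.*?</i>", html, perl = TRUE, ignore.case = TRUE)
  spans <- unlist(regmatches(html, m))
  spans <- gsub("</?i>", "", spans, ignore.case = TRUE)
  # Collapse line breaks and repeated spaces inside a span
  gsub("\\s+", " ", trimws(spans))
}

# Made-up paragraphs standing in for the "raw" column
example_raw <- c(
  "<p><i>Risk A</i> and <i>Risk\nB</i></p>",
  "<p>No italics here.</p>"
)
extract_italics(example_raw)
```

For anything beyond simple tags, an HTML parser such as the xml2 package is likely to be more robust than regular expressions.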

This is a fairly simplistic example, but should serve as a good tutorial on processing filings.

How to Download

edgarWebR is available from CRAN, so it can be installed with

install.packages("edgarWebR")

If you want the latest and greatest, you can get a copy of the development version from GitHub by using devtools:

# install.packages("devtools")
devtools::install_github("mwaldstein/edgarWebR")

# Cleanup
end_vignette()


edgarWebR documentation built on April 24, 2021, 5:09 p.m.
