parse_filing: Parse Filing
In edgarWebR: SEC Filings Access

Given a link to a filing document (e.g. a 10-K or 8-K) in HTML, process the file into parts and items. This enables follow-up processing of a desired section, e.g. just the Risk Factors. 'item.name' and 'part.name' are taken directly from the document, with no attempt to normalize them.

parse_filing(x, strip = TRUE, include.raw = FALSE, fix.errors = TRUE)

`x`	- URL of a filing HTML document, HTML text, or an xml_document
`strip`	- Should non-text elements be removed? Default: TRUE
`include.raw`	- Include unprocessed nodes in the result? Default: FALSE
`fix.errors`	- Try to fix document errors (e.g. missing part labels). Work in progress. Default: TRUE

NOTE: This has been tested on a range of documents, but formatting differences could cause failures. Please report an issue for any document that isn't parsed correctly.

FURTHER NOTE: Not all filings are well formed - missing headings, bad spacing, etc. These can all throw the parsing off!
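
Because malformed filings can make parsing fail, a caller may want to guard the call. The following is a minimal sketch, not part of edgarWebR: `safe_parse` and its `parser` argument are hypothetical names introduced here, and the sketch assumes that a parsing failure surfaces as an ordinary R error.

```r
# Hypothetical wrapper: returns NULL (with a warning) instead of stopping
# when the parser raises an error. `parser` defaults to parse_filing but can
# be swapped out, which also makes the wrapper easy to try without network
# access.
safe_parse <- function(x, parser = edgarWebR::parse_filing, ...) {
  tryCatch(
    parser(x, ...),
    error = function(e) {
      warning("Could not parse filing: ", conditionMessage(e), call. = FALSE)
      NULL
    }
  )
}

# Example with a stand-in parser that always fails:
failing_parser <- function(x, ...) stop("unexpected document layout")
result <- suppressWarnings(safe_parse("<html></html>", parser = failing_parser))
is.null(result)
```

A NULL result only signals that an error was raised. A filing that parses without error can still have mislabeled parts or items, so checking the returned names remains worthwhile.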

A data frame with one row per paragraph.
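
As a sketch of the follow-up processing mentioned in the description, the code below selects the paragraphs of one item. It runs on a small hand-made data frame that stands in for a real result: the 'part.name' and 'item.name' columns come from the description above, while the `text` column name, the sample values, and the `select_item` helper are assumptions made for illustration.

```r
# Stand-in for a parse_filing() result (values invented for illustration;
# the `text` column name is an assumption).
parsed <- data.frame(
  part.name = c("PART I", "PART I", "PART I", "PART II"),
  item.name = c("Item 1. Business",
                "Item 1A. Risk Factors",
                "Item 1A. Risk Factors",
                "Item 5. Market for Registrant's Common Equity"),
  text = c("We make widgets.",
           "Demand for widgets may fall.",
           "We depend on a single supplier.",
           "Our stock trades on an exchange."),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows whose item name matches a pattern.
# Matching is case-insensitive because item names are not normalized.
select_item <- function(parsed, pattern) {
  keep <- grepl(pattern, parsed$item.name, ignore.case = TRUE)
  parsed[keep, , drop = FALSE]
}

risk <- select_item(parsed, "risk factors")
nrow(risk)
```

Matching on a pattern rather than an exact string is a deliberate choice here, since item headings vary in case and punctuation from one filing to the next.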