parse_text_filing: Parse Text Filing


View source: R/parse_filing.R

Description

Given a link to a filing document (e.g. a 10-K or 8-K) in TXT format, or the document text itself, split the file into parts and items. This enables follow-up processing of a single section, e.g. just the Risk Factors. 'item.name' and 'part.name' are taken directly from the document, with no attempt to normalize them.

Usage

parse_text_filing(x, strip = TRUE, include.raw = FALSE,
  fix.errors = TRUE)

Arguments

x

- URL of a filing text document, or the text of the filing itself

strip

- Should non-text elements be removed? Default: TRUE

include.raw

- Include unprocessed nodes in the result? Default: FALSE

fix.errors

- Try to fix document errors (e.g. missing part labels). Work in progress. Default: TRUE

Details

NOTE: This has been tested on a range of documents, but formatting differences could cause failures. Please report an issue for any document that isn't parsed correctly.

FURTHER NOTE: Not all filings are well formed. Missing headings, bad spacing, and similar problems can all throw the parsing off.

Value

A data frame with one row per paragraph and the following columns:

part.name

Detected name of the Part

item.name

Detected name of the Item

text

Text of the paragraph / node

raw*

Raw HTML of the node if include.raw = TRUE

Examples

head(parse_text_filing(
  "https://www.sec.gov/Archives/edgar/data/37996/000003799602000015/v7.txt"
))
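
The Description mentions follow-up processing of a single section such as the Risk Factors. The following is a minimal sketch of that step, not part of the package itself. To stay runnable offline, it uses a small hand-made data frame (`filing`, with invented values) as a stand-in for the result of parse_text_filing(); only the column names come from the Value section above. Because 'item.name' is not normalized, the match is deliberately loose and case-insensitive.

```r
# Hypothetical stand-in for the result of parse_text_filing(); the values are
# invented for illustration and do not come from a real filing.
filing <- data.frame(
  part.name = c("PART I", "PART I", "PART I", "PART II"),
  item.name = c("Item 1. Business", "Item 1A. Risk Factors",
                "ITEM 1A. RISK FACTORS", "Item 7. Management's Discussion"),
  text = c("We make widgets.", "Demand may fall.", "Costs may rise.",
           "Revenue increased."),
  stringsAsFactors = FALSE
)

# item.name is taken verbatim from the document, so match loosely and
# ignore case rather than comparing against one exact label.
is_risk <- grepl("risk factors", filing$item.name, ignore.case = TRUE)
risk_factors <- filing[is_risk, ]

# Collapse the matching paragraphs into a single block of text.
risk_text <- paste(risk_factors$text, collapse = "\n\n")
```

With a real filing, `filing` would instead be the data frame returned by parse_text_filing(); the labels actually present in 'item.name' depend on the document, so the pattern may need adjusting.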

mwaldstein/edgarWebR documentation built on Aug. 25, 2018, 9:22 p.m.