knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(ParseIRS990)

This vignette gives an overview of the R package ParseIRS990.

Introduction

Registered U.S. non-profit organizations are required to report yearly to the Internal Revenue Service to demonstrate continued compliance and maintain their tax-exempt status. Depending on the category of non-profit an organization falls into and on its total revenue, it may be required to file a 990 form. This filing details various aspects of the organization's finances as well as its activities.

These 990 filings are a valuable source of information for researchers interested in civic life in America. Since 2012, 990 filings have been digitized and made available to the public. However, accessing information from these filings can be cumbersome and can require a great deal of domain expertise. The aim of this package is to simplify the work of accessing 990 data for researchers.

Get the AWS url for the filing of interest

Digitized 990 data are available as XML files in an AWS repository. The files are organized into yearly index files. The year, in this case, refers to the year in which the IRS processed the file: all of the files in the 2018 index were processed in 2018, but the tax filing itself can cover any earlier period. This makes it difficult to locate a specific filing for a given organization and a given year.

To remedy this, we have rebuilt a consolidated index file for this package. This index, included in the package, contains the additional fields TaxPeriodBeginDt, TaxPeriodEndDt, and (importantly) Tax_Year. Tax_Year refers to the tax year of the 990 form the organization submitted, whereas IRS_Year refers to the year the IRS processed the submission into digitized form. Note that organizations can choose to follow either the calendar year or a fiscal year of their own choosing. TaxPeriodBeginDt and TaxPeriodEndDt give the beginning and end dates of the period covered by that filing.
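
To make the difference between IRS_Year and Tax_Year concrete, the following is a small sketch built on a made-up index. Only the names Tax_Year, IRS_Year, TaxPeriodBeginDt, and TaxPeriodEndDt come from the description above; the EIN column name, the EINs, and every value are invented for illustration, and the package's actual index may be laid out differently.

```r
# Toy stand-in for the consolidated index (all rows are invented).
# Assumption: a fiscal-year filer is assigned the year in which its
# tax period begins; the package may derive Tax_Year differently.
toy_index <- data.frame(
  EIN              = c("000000001", "000000001", "000000002"),
  IRS_Year         = c(2018, 2019, 2019),
  Tax_Year         = c(2017, 2018, 2018),
  TaxPeriodBeginDt = as.Date(c("2017-01-01", "2018-01-01", "2018-07-01")),
  TaxPeriodEndDt   = as.Date(c("2017-12-31", "2018-12-31", "2019-06-30")),
  stringsAsFactors = FALSE
)

is_org <- toy_index$EIN == "000000001"

# Selecting on the processing year picks up the filing for tax year 2017.
by_irs_year <- toy_index[is_org & toy_index$IRS_Year == 2018, ]

# Selecting on the tax year picks up the filing that was processed in 2019.
by_tax_year <- toy_index[is_org & toy_index$Tax_Year == 2018, ]

by_irs_year$Tax_Year
by_tax_year$IRS_Year
```

In this toy data, the same organization and the same "2018" lead to two different filings depending on which year column is used, which is the ambiguity the consolidated index is meant to remove.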

The location of a given digitized filing can be found with the get_aws_url function. This returns the location of the XML file in the AWS repository.

## this organization's 2018 990 filing can be found here
aws_url <- get_aws_url("061553389", 2018)

Get the XML file for a given AWS url

To parse fields from filings, first load the XML file of interest with get_990. Note that get_990 calls get_aws_url itself, so there is no need to look up the URL yourself. get_990 will be the first step for most uses of this package.

## load an XML for this organization's 2018 filing.
xml_root <- get_990("221405099", year=2018)

## see name of this organization
organization_name <- get_organization_name_990("221405099")
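
When many filings are loaded in a row, it can help to record failures rather than stop at the first one. The sketch below shows one way to do that. `fetch_many` and `fake_fetch` are hypothetical helpers written for this example and are not part of ParseIRS990; the stand-in fetcher is there only so the code runs without network access.

```r
# Fetch several filings, recording failures instead of stopping.
# `fetch` is any function(ein, year) that returns a filing or signals an error.
fetch_many <- function(eins, year, fetch) {
  results <- lapply(eins, function(ein) {
    tryCatch(
      fetch(ein, year),
      error = function(e) {
        message("Could not load ", ein, ": ", conditionMessage(e))
        NULL  # keep a placeholder so results stay aligned with `eins`
      }
    )
  })
  names(results) <- eins
  results
}

# Hypothetical stand-in for a real fetcher (invented EINs and behavior).
fake_fetch <- function(ein, year) {
  if (ein == "000000002") stop("no filing found")
  paste0("<filing ", ein, " ", year, ">")
}

filings <- fetch_many(c("000000001", "000000002"), 2018, fake_fetch)

# Keep only the filings that loaded successfully.
loaded <- Filter(Negate(is.null), filings)
names(loaded)
```

With the package loaded, one could try passing `function(ein, year) get_990(ein, year = year)` as `fetch`. This assumes get_990 signals an error when a filing cannot be found, which is not confirmed here.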

Get a specific field from a filing

Non-profit organizations can file different versions of Form 990, depending on their specific status. Knowledge of the specific form type is not required to extract values with this package, but you can see the form type of a given filing with:

## see form type of a filing
filing_type <- get_filing_type_990(xml_root)

Available parsed fields can be seen in the supplied irs_fields table. The package_variable column gives the name of the variable to be used in functions. Other columns show both the XML path and the physical form location for that variable. Available variables are grouped by category and subcategory.

## see available variables
irs_fields

## get total revenue for this org
revenue_total <- get_single_value_990(xml_root, "revenue_total")
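
The idea of mapping a variable name to an XML path can be illustrated with the xml2 package. This is a rough sketch and not the package's implementation: the miniature XML document, the `toy_fields` table, its `xpath` column name, and the `get_toy_value` helper are all assumptions made for this example. Only the package_variable column name is taken from the description above.

```r
library(xml2)

# Invented miniature filing; a real 990 XML file is far larger.
toy_xml <- read_xml(paste0(
  '<Return xmlns="http://www.irs.gov/efile">',
  "<ReturnHeader><ReturnTypeCd>990</ReturnTypeCd></ReturnHeader>",
  "<ReturnData><IRS990>",
  "<CYTotalRevenueAmt>125000</CYTotalRevenueAmt>",
  "</IRS990></ReturnData>",
  "</Return>"
))

# Drop the default namespace so that plain XPath expressions match.
xml_ns_strip(toy_xml)

# Hypothetical lookup table: variable name -> XPath.
toy_fields <- data.frame(
  package_variable = c("return_type", "revenue_total"),
  xpath = c("//ReturnHeader/ReturnTypeCd", "//IRS990/CYTotalRevenueAmt"),
  stringsAsFactors = FALSE
)

# Look up the XPath for a variable and return the node text (NA if absent).
get_toy_value <- function(doc, variable, fields = toy_fields) {
  path <- fields$xpath[fields$package_variable == variable]
  if (length(path) != 1) stop("Unknown variable: ", variable)
  node <- xml_find_first(doc, path)
  if (inherits(node, "xml_missing")) NA_character_ else xml_text(node)
}

get_toy_value(toy_xml, "revenue_total")
```

Values come back as character strings in this sketch, so a numeric field such as revenue would still need `as.numeric()` before any arithmetic.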

See related entities

Organizations report related entities on Form 990 Schedule R. The EINs of related organizations can be found with get_scheduleR.

## see related EINs for a given EIN and given tax year filing
related_eins <- get_scheduleR("061553389", 2018)

This function returns multiple values if multiple related organizations are reported and will indicate if no related organizations were reported.

Extract mission and description text

Organizations report descriptive information about their primary mission and main activities in 990 filings. The concatenated responses to these fields can be extracted.

# mission statement
mission_desc <- get_value_990(xml_root, "mission_desc")

# program description
program_desc <- get_value_990(xml_root, "program_desc")
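
As a rough illustration of what concatenating several free-text responses can involve, the sketch below joins a set of invented responses into one string. The helper `concat_responses`, the choice of a single space as separator, and the whitespace cleanup are assumptions for this example; the package may combine the fields differently.

```r
# Invented free-text responses, as they might appear across several fields.
responses <- c(
  "  To support local food banks. ",
  NA,
  "",
  "To fund\nnutrition education."
)

concat_responses <- function(x, sep = " ") {
  x <- x[!is.na(x)]                  # drop missing responses
  x <- trimws(gsub("\\s+", " ", x))  # collapse internal whitespace, trim ends
  x <- x[nzchar(x)]                  # drop empty responses
  paste(x, collapse = sep)
}

concat_responses(responses)
```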


p3lab/ParseIRS990 documentation built on Aug. 14, 2021, 9:22 p.m.