clean_10_data: Read and Clean 10K and 10Q address data

Description

Read a 10K or 10Q address file into a data frame. The data is taken from the parsed EDGAR list that the SEC provides.

Usage

clean_10_data(x)

Arguments

x

a text file in the SEC's format, given as a file name such as "sub16.txt"

Details

Three extra columns are added to the original data frame. The latitude and longitude are obtained from a zip code data file that is loaded internally. Any row that doesn't have a valid business zip code in the 'zipba' field gets NA for its latitude and longitude. The last column, 'companies', is a count of all rows in the data frame that share the same business zip code. This gives an idea of how popular an area is, and can be passed to a plot command to set the size of a plot point.

Companies are removed if they list a foreign address. Blank addresses are filled in. The main advantage of this, besides address data continuity, is having the business address fields populated. Some analytical routines pull their data from the business address, so it's a good idea to have it fully populated.

The business address fields checked are: 'bas1', 'bas2', 'cityba', 'stprba', 'zipba'

The mailing address fields checked are: 'mas1', 'mas2', 'cityma', 'stprma', 'zipma'

Blank address fields are filled in based on
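The zip code lookup and the per-zip count described above can be sketched on a toy data frame. This is only an illustration, not the package's internal code: the `subs` and `zips` tables, the `name` and `zip` columns, and all values are made up, and the real function uses its own internally loaded zip code file.

```r
# Toy stand-in for the parsed SEC data; only 'zipba' is a real field name.
subs <- data.frame(
  name  = c("A Corp", "B Inc", "C LLC", "D Co"),
  zipba = c("10001", "10001", "94105", "00000"),  # "00000" is not a valid zip
  stringsAsFactors = FALSE
)

# Hypothetical zip code lookup table (assumed layout, made-up coordinates).
zips <- data.frame(
  zip  = c("10001", "94105"),
  Lat  = c(40.75, 37.79),
  Long = c(-73.99, -122.39),
  stringsAsFactors = FALSE
)

# match() returns NA for zip codes absent from the lookup,
# so those rows end up with NA for Lat and Long.
idx <- match(subs$zipba, zips$zip)
subs$Lat  <- zips$Lat[idx]
subs$Long <- zips$Long[idx]

# Count how many rows share each business zip code.
counts <- table(subs$zipba)
subs$companies <- as.integer(counts[match(subs$zipba, names(counts))])

print(subs)
```

In this sketch the two rows with zip code "10001" both get a 'companies' value of 2, and the row with the unrecognised zip code keeps NA coordinates.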

Value

A data frame of the original columns, plus 3 extra columns:

Lat

Latitude of the business zip code

Long

Longitude of the business zip code

companies

Count of companies in the same zip code

Author(s)

Nick Lukianoff

Examples

data_for_2016 <- clean_10_data("sub16.txt")
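As a rough sketch of using the 'companies' column to size plot points, the snippet below uses a small hand-made data frame in place of the real output of clean_10_data, so it runs without an SEC file. The coordinates and counts are made up, and the square-root scaling of the point size is an assumption chosen so that point area grows roughly with the count.

```r
# Hand-made stand-in for a cleaned data frame (values are illustrative only).
cleaned <- data.frame(
  Lat       = c(40.75, 37.79, 41.88),
  Long      = c(-73.99, -122.39, -87.63),
  companies = c(9L, 4L, 1L)
)

# Scale point size by the square root of the company count.
point_size <- sqrt(cleaned$companies)

# Base R scatter plot; 'cex' sets the relative size of each point.
plot(cleaned$Long, cleaned$Lat,
     cex  = point_size,
     xlab = "Longitude",
     ylab = "Latitude",
     main = "Companies per business zip code")
```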

NickSEC/SECAddresses documentation built on May 7, 2019, 6:07 p.m.