read.SAScii: Create an R data frame by reading in an ASCII file and SAS...


View source: R/read.SAScii.R

Description

Many public-use ASCII data files are distributed with import code written for SAS users, meant to read the ASCII file into a sas7bdat file. The read.SAScii function parses the INPUT block of that (.sas) syntax file to work out the parameters needed for a read.fwf function call, and then runs that call. The user specifies the location of the ASCII (often .dat) file and the location of the .sas syntax file, and the data frame is loaded directly into R in one step.

Usage

read.SAScii( fn, sas_ri, beginline = 1, buffersize = 50, zipped = F , n = -1 , intervals.to.print = 1000 , lrecl = NULL , skip.decimal.division = NULL )

Arguments

fn

Character string containing the location of the ASCII file (or, if zipped = T, the location of a file ending in .zip).

sas_ri

Character string containing location of SAS import instructions.

beginline

Line number in the SAS import instructions where the INPUT statement begins. Set this when the word INPUT appears before the actual INPUT block; otherwise the function will return an error.

buffersize

Maximum number of lines to read at one time, passed to read.fwf().

zipped

Flag noting if ASCII file should be unzipped / decompressed before loading. Useful when downloading larger data sets directly from a website.

n

Maximum number of records (lines) to read, passed to read.fwf(). The default of -1 means no limit.

intervals.to.print

Number of records to read between progress messages printed to the screen.

lrecl

LRECL option from the SAS code. Only necessary if each line of the ASCII file is wider than the columns containing data (that is, if the file contains empty space on the right side).

skip.decimal.division

Whether to skip dividing numeric columns according to the number of decimal places specified by the SAS import instructions.

Recommended: ignore this parameter (or set it to NULL) to let the function attempt to determine whether numeric columns have already been divided to the appropriate number of decimal places.

TRUE tells read.SAScii to not perform any decimal-related division of numeric columns.

FALSE tells read.SAScii to perform decimal-related division according to the SAS import instructions, ignoring the presence of numeric fields that already contain decimals.
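The effect of decimal-related division can be sketched in base R. This is an illustration only, not the package's internal code: the variable names are hypothetical, and the decimal-point check at the end is just one plausible way to spot values that have already been divided.

```r
# Hypothetical illustration in base R; not the package's internal code.

# raw text of a numeric column read with the SAS informat 4.2
raw.values <- c( "0154" , "2034" , "9900" )
implied.decimals <- 2

# comparable to skip.decimal.division = FALSE: always divide by 10^decimals
divided <- as.numeric( raw.values ) / 10^implied.decimals

# comparable to skip.decimal.division = TRUE: leave the values as read
undivided <- as.numeric( raw.values )

# values that already contain a decimal point would be over-divided
# if the division were forced (assumed check, for illustration only)
already.decimal <- c( "1.54" , "20.3" )
has.decimal.point <- grepl( "." , already.decimal , fixed = TRUE )

print( divided )
print( undivided )
print( has.decimal.point )
```

Forcing the division on a column like already.decimal would shrink every value by a further factor of 100, which is why leaving the parameter at NULL is the recommended choice.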

Details

This function cannot handle overlapping columns. For example, in the 2009 National Ambulatory Medical Care Survey (NAMCS) SAS import instructions, columns DIAG1 and DIAG13D will cause an error because both start at position 55.
ftp://ftp.cdc.gov/pub/Health_Statistics/NCHS/dataset_documentation/namcs/sas/nam09inp.txt
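One way to check a column layout for overlaps before attempting an import is sketched below. The helper find.overlaps is hypothetical and not part of SAScii, and the widths and the AGE column in the usage lines are made up for illustration; only the shared start position of 55 comes from the text above.

```r
# Hypothetical helper, not part of SAScii: return the names of columns
# that start inside the span of a column that begins earlier (or at the
# same position).
find.overlaps <- function( varname , start , width ){
	ord <- order( start )
	varname <- varname[ ord ]
	start <- start[ ord ]
	end <- start + width[ ord ] - 1
	# furthest position covered by any column up to and including each one
	furthest.end <- cummax( end )
	# a column overlaps when it starts at or before that furthest position
	overlapping <- start[ -1 ] <= furthest.end[ -length( end ) ]
	varname[ -1 ][ overlapping ]
}

# made-up layout: AGE and both widths are assumptions for illustration
overlaps <-
	find.overlaps(
		varname = c( "AGE" , "DIAG1" , "DIAG13D" ) ,
		start = c( 1 , 55 , 55 ) ,
		width = c( 3 , 5 , 3 )
	)

print( overlaps )
```

An empty result suggests the layout has no overlapping columns under this simple start-and-width model.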

Value

A data.frame as produced by read.fwf() which is called internally.
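For orientation, a hand-written read.fwf() call of the kind this function constructs might look like the sketch below. It assumes the three-record sample data from the Examples section, with widths and column classes worked out by hand from that example's INPUT block. It does not apply any decimal-related division and is not claimed to reproduce read.SAScii's output exactly.

```r
# the same three records used in the Examples section
some.data <- "0154hello2304coolgreatZZ\n2034puppy0023nicesweetok\n9900buddy4495    swell!!"

# write them to a temporary file
tf <- tempfile()
writeLines( some.data , con = tf )

# widths, names, and classes worked out by hand (an assumption of this sketch)
x <-
	read.fwf(
		tf ,
		widths = c( 4 , 5 , 2 , 2 , 4 , 5 , 1 , 1 ) ,
		col.names =
			c( "NUMBERS1" , "WORDS1" , "NUMBERS2" , "NUMBERS3" ,
				"WORDS2" , "WORDS3" , "WORDS4" , "WORDS5" ) ,
		colClasses =
			c( "numeric" , "character" , "numeric" , "numeric" ,
				"character" , "character" , "character" , "character" )
	)

# inspect the resulting data.frame
str( x )
```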

Note

Some of the commands below take days to run, depending on your machine. If you need the Survey of Income and Program Participation, start the program before you quit working for the weekend.

Author(s)

Anthony Joseph Damico

Examples

###########
#Some Data#
###########

#write an example ASCII data set
some.data <- "0154hello2304coolgreatZZ\n2034puppy0023nicesweetok\n9900buddy4495    swell!!"

#create temporary ASCII file
some.data.tf <- tempfile()
#write the example data above to that temporary file
writeLines ( some.data , con = some.data.tf )

#write an example SAS import script using the at method
sas.import.with.at.signs <-
	"INPUT
		@1 NUMBERS1 4.2
		@5 WORDS1 $ 5.
		@10 NUMBERS2 2.0
		@12 NUMBERS3 2.0
		@14 WORDS2 $4.
		@18 WORDS3 $5
		@23 WORDS4 $ 1
		@24 WORDS5 $ 1
	;"
	

#create a temporary file
sas.import.with.at.signs.tf <- tempfile()
#write the sas code above to that temporary file
writeLines ( sas.import.with.at.signs , con = sas.import.with.at.signs.tf )

parse.SAScii( sas.import.with.at.signs.tf )

#using at signs sas script
read.SAScii( some.data.tf , sas.import.with.at.signs.tf )


#write an example SAS import script using the dash method
sas.import.with.lengths <-
	"INPUT
		NUMBERS1 1 - 4 .2
		WORDS1 $ 5-9
		NUMBERS2 10 -11
		NUMBERS3 12- 13 .0
		WORDS2 $14-17
		WORDS3$ 18-22
		WORDS4   $   23-23
		WORDS5 $24
	;"
	
#create a temporary file
sas.import.with.lengths.tf <- tempfile()
#write the sas code above to that temporary file
writeLines ( sas.import.with.lengths , con = sas.import.with.lengths.tf )

parse.SAScii( sas.import.with.lengths.tf )

#using dash method sas script
read.SAScii( some.data.tf , sas.import.with.lengths.tf )

## Not run: 


#########################################################################################
#Load the 2009 Medical Expenditure Panel Survey Emergency Room Visits file as an R data frame

#Location of the ASCII 2009 Medical Expenditure Panel Survey Emergency Room Visits File
MEPS.09.ER.visit.file.location <- 
	"http://meps.ahrq.gov/mepsweb/data_files/pufs/h126edat.exe"

#Location of the SAS import instructions for the
#2009 Medical Expenditure Panel Survey Emergency Room Visits File
MEPS.09.ER.visit.SAS.read.in.instructions <- 
	"http://meps.ahrq.gov/mepsweb/data_stats/download_data/pufs/h126e/h126esu.txt"

#Load the 2009 Medical Expenditure Panel Survey Emergency Room Visits File
#NOTE: The SAS INPUT command occurs at line 273.
MEPS.09.ER.visit.df <- 
	read.SAScii ( 
		MEPS.09.ER.visit.file.location , 
		MEPS.09.ER.visit.SAS.read.in.instructions , 
		zipped = T , 
		beginline = 273 )

#save the data frame now for instantaneous loading later
save( MEPS.09.ER.visit.df , file = "MEPS.09.ER.visit.data.rda" )


#########################################################################################
#Load the 2011 National Health Interview Survey Persons file as an R data frame

NHIS.11.personsx.SAS.read.in.instructions <- 
	"ftp://ftp.cdc.gov/pub/Health_Statistics/NCHS/Program_Code/NHIS/2011/personsx.sas"
NHIS.11.personsx.file.location <- 
	"ftp://ftp.cdc.gov/pub/Health_Statistics/NCHS/Datasets/NHIS/2011/personsx.zip"

#store the NHIS file as an R data frame!
NHIS.11.personsx.df <- 
	read.SAScii ( 
		NHIS.11.personsx.file.location , 
		NHIS.11.personsx.SAS.read.in.instructions , 
		zipped = T )

#or store the NHIS SAS import instructions for use in a 
#read.fwf function call outside of the read.SAScii function
NHIS.11.personsx.sas <- parse.SAScii( NHIS.11.personsx.SAS.read.in.instructions )

#save the data frame now for instantaneous loading later
save( NHIS.11.personsx.df , file = "NHIS.11.personsx.data.rda" )


#########################################################################################
#Load the 2011 National Health Interview Survey Sample Adult file as an R data frame

NHIS.11.samadult.SAS.read.in.instructions <- 
	"ftp://ftp.cdc.gov/pub/Health_Statistics/NCHS/Program_Code/NHIS/2011/SAMADULT.sas"
NHIS.11.samadult.file.location <- 
	"ftp://ftp.cdc.gov/pub/Health_Statistics/NCHS/Datasets/NHIS/2011/samadult.zip"

#store the NHIS file as an R data frame!
NHIS.11.samadult.df <- 
	read.SAScii ( 
		NHIS.11.samadult.file.location , 
		NHIS.11.samadult.SAS.read.in.instructions , 
		zipped = T )

#or store the NHIS SAS import instructions for use in a 
#read.fwf function call outside of the read.SAScii function
NHIS.11.samadult.sas <- parse.SAScii( NHIS.11.samadult.SAS.read.in.instructions )

#save the data frame now for instantaneous loading later
save( NHIS.11.samadult.df , file = "NHIS.11.samadult.data.rda" )


#########################################################################################
#Load an IPUMS - American Community Survey Extract into R

#DOES NOT RUN without downloading ACS ASCII files to
#your local drive from http://www.ipums.org/

#MINNESOTA POPULATION CENTER - IPUMS ASCII EXTRACTS & SAS import instructions
IPUMS.file.location <- "./IPUMS/usa_00001.dat"
IPUMS.SAS.read.in.instructions <- "./IPUMS/usa_00001.sas"

#store the IPUMS extract as an R data frame!
IPUMS.df <- 
	read.SAScii ( 
		IPUMS.file.location , 
		IPUMS.SAS.read.in.instructions , 
		zipped = F )

#or store the IPUMS extract SAS import instructions for use in a 
#read.fwf function call outside of the read.SAScii function
IPUMS.sas <- parse.SAScii( IPUMS.SAS.read.in.instructions )


#########################################################################################
#Load the Current Population Survey - 
#Annual Social and Economic Supplement - March 2011 as an R data frame

#census.gov website containing the current population survey's main file
CPS.ASEC.mar11.file.location <- 
	"http://smpbff2.dsd.census.gov/pub/cps/march/asec2011_pubuse.zip"
CPS.ASEC.mar11.SAS.read.in.instructions <- 
	"http://www.nber.org/data/progs/cps/cpsmar11.sas"

#create a temporary file and a temporary directory..
tf <- tempfile() ; td <- tempdir()
#download the zipped CPS ASEC main file
download.file( CPS.ASEC.mar11.file.location , tf , mode = "wb" )
#unzip the file's contents and store the file name within the temporary directory
fn <- unzip( tf , exdir = td , overwrite = T )

#the CPS March Supplement ASCII/FWF contains household-, family-, and person-level records.
#throw out records that are not person-level.
#according to the SAS import instructions, person-level record lines begin with a "3"

#create a second temporary file
tf.sub <- tempfile()

input <- fn
output <- tf.sub

incon <- file(input, "r") 
outcon <- file(output, "w") 

#cycle through every line in the downloaded CPS file..
while(length(line <- readLines(incon, 1))>0){
	#and if the first character is a 3, add it to the new person-only CPS file.
	if ( substr( line , 1 , 1 ) == "3" ){
		writeLines(line,outcon)
	}
}
close(outcon)
close(incon)

#the SAS file produced by the National Bureau of Economic Research (NBER)
#begins the person-level INPUT after line 1209, 
#so skip SAS import instruction lines before that.
#NOTE that the beginline of 1209 will change for different years.

#store the CPS ASEC March 2011 file as an R data frame!
cps.asec.mar11.df <- 
	read.SAScii ( 
		tf.sub , 
		CPS.ASEC.mar11.SAS.read.in.instructions , 
		beginline = 1209 , 
		zipped = F )

#or store the CPS ASEC March 2011 SAS import instructions for use in a 
#read.fwf function call outside of the read.SAScii function
cps.asec.mar11.sas <- 
	parse.SAScii( CPS.ASEC.mar11.SAS.read.in.instructions , beginline = 1209 )


#########################################################################################
#Load the Replicate Weights file of the Current Population Survey 
#March 2011 as an R data frame

#census.gov website containing the current population survey's replicate weights file
CPS.replicate.weight.file.location <- 
	"http://smpbff2.dsd.census.gov/pub/cps/march/CPS_ASEC_ASCII_REPWGT_2011.zip"
CPS.replicate.weight.SAS.read.in.instructions <- 
	"http://smpbff2.dsd.census.gov/pub/cps/march/CPS_ASEC_ASCII_REPWGT_2011.SAS"

#store the CPS repwgt file as an R data frame!
cps.repwgt.df <- 
	read.SAScii ( 
		CPS.replicate.weight.file.location , 
		CPS.replicate.weight.SAS.read.in.instructions , 
		zipped = T )

#or store the CPS repwgt SAS import instructions for use in a 
#read.fwf function call outside of the read.SAScii function
cps.repwgt.sas <- parse.SAScii( CPS.replicate.weight.SAS.read.in.instructions )

	
#########################################################################################
#Load the 2008 Survey of Income and Program Participation Wave 1 as an R data frame
SIPP.08w1.SAS.read.in.instructions <- 
	"http://smpbff2.dsd.census.gov/pub/sipp/2008/l08puw1.sas"
SIPP.08w1.file.location <- 
	"http://smpbff2.dsd.census.gov/pub/sipp/2008/l08puw1.zip"

#store the SIPP file as an R data frame

#note the text "INPUT" appears before the actual INPUT block of the SAS code
#so the parsing of the SAS instructions will fail without a beginline parameter specifying
#where the appropriate INPUT block occurs

SIPP.08w1.df <- 
	read.SAScii ( 
		SIPP.08w1.file.location , 
		SIPP.08w1.SAS.read.in.instructions , 
		beginline = 5 , 
		buffersize = 10 , 
		zipped = T )

#or store the SIPP SAS import instructions for use in a 
#read.fwf function call outside of the read.SAScii function
SIPP.08w1.sas <- parse.SAScii( SIPP.08w1.SAS.read.in.instructions , beginline = 5 )


#########################################################################################
#Load the Replicate Weights file of the 
#2008 Survey of Income and Program Participation Wave 1 as an R data frame
SIPP.repwgt.08w1.SAS.read.in.instructions <- 
	"http://smpbff2.dsd.census.gov/pub/sipp/2008/rw08wx.sas"
SIPP.repwgt.08w1.file.location <- 
	"http://smpbff2.dsd.census.gov/pub/sipp/2008/rw08w1.zip"

#store the SIPP file as an R data frame

#note the text "INPUT" appears before the actual INPUT block of the SAS code
#so the parsing of the SAS instructions will fail without a beginline parameter specifying
#where the appropriate INPUT block occurs

SIPP.repwgt.08w1.df <- 
	read.SAScii ( 
		SIPP.repwgt.08w1.file.location , 
		SIPP.repwgt.08w1.SAS.read.in.instructions , 
		beginline = 5 , 
		zipped = T )

#store the SIPP SAS import instructions for use in a 
#read.fwf function call outside of the read.SAScii function
SIPP.repwgt.08w1.sas <- 
	parse.SAScii( 
		SIPP.repwgt.08w1.SAS.read.in.instructions , 
		beginline = 5 )


#########################################################################################
#Load all twelve waves of the 2004 Survey of Income and Program Participation as R data frames
	
SIPP.04w1.SAS.read.in.instructions <- 
	"http://smpbff2.dsd.census.gov/pub/sipp/2004/l04puw1.sas"

#store the SIPP SAS import instructions for use in a 
#read.fwf function call outside of the read.SAScii function
SIPP.04w1.sas <- parse.SAScii( SIPP.04w1.SAS.read.in.instructions , beginline = 5 )

#note the text "INPUT" appears before the actual INPUT block of the SAS code
#so the parsing of the SAS instructions will fail without a beginline parameter specifying
#where the appropriate INPUT block occurs

#loop through all 12 waves of SIPP 2004
for ( i in 1:12 ){
	
	SIPP.04wX.file.location <- 
		paste( "http://smpbff2.dsd.census.gov/pub/sipp/2004/l04puw" , i , ".zip" , sep = "" )

	#name the data frame based on the current wave
	df.name <- paste( "SIPP.04w" , i , ".df" , sep = "" )
	
	#store the SIPP file as an R data frame!
	assign( 
	
		df.name , 
		
		read.SAScii ( 
			SIPP.04wX.file.location , 
			SIPP.04w1.SAS.read.in.instructions , 
			beginline = 5 , 
			buffersize = 5 , 
			zipped = T )
	)

}

## End(Not run)

ajdamico/SAScii documentation built on Feb. 6, 2020, 12:05 a.m.