csvkit.fwf2csv: Convenience function to create a csv file from a fixed-width...


Description

This is purely a convenience function. It takes the start and width definitions from a dictionary file and uses them to convert a fixed-width file to a csv file by making a system call to in2csv from csvkit.

Usage

  csvkit.fwf2csv(datafile, schema, output)

Arguments

datafile

The name of the flat data file (optionally including the path if the file is not in the working directory).

schema

The name of the schema file (perhaps generated using dct.parser and csvkit.schema) that defines the variable names, start positions, and column widths. Include the path if the file is not in the working directory.
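As an illustration of what such a schema file can look like, the sketch below writes one by hand. It assumes the layout described in csvkit's in2csv documentation for fixed-width input (a csv with the columns column, start, and length); the variable names and positions are made up, and the file that csvkit.schema actually writes may differ in detail.

```r
# Hypothetical schema for a flat file with two variables:
# "id" occupies 3 characters and "code" the next 2.
schema <- data.frame(
  column = c("id", "code"),  # variable names
  start  = c(1, 4),          # start position of each variable (illustrative)
  length = c(3, 2)           # width of each variable in characters
)

# Write the schema as a plain csv file in a temporary location
schema_file <- tempfile(fileext = ".csv")
write.csv(schema, schema_file, row.names = FALSE, quote = FALSE)

# Inspect the resulting file
readLines(schema_file)
```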

output

The desired name of the output file.

Details

This function essentially makes a system call to in2csv from csvkit and returns to the R prompt immediately, while the conversion continues in the background. Small files convert very quickly. Larger files can take considerably longer, so the output file may still be incomplete when the prompt returns.
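To make the idea concrete, here is a minimal sketch of how a call of this kind could be assembled in R. It is not the package's source: build_in2csv_call is a hypothetical helper, and the command assumes the in2csv options for fixed-width input (-f fixed and -s for the schema) as given in the csvkit documentation.

```r
# Hypothetical helper: build the shell command that converts a
# fixed-width file to csv with in2csv and redirects it to `output`.
build_in2csv_call <- function(datafile, schema, output) {
  paste("in2csv -f fixed -s", shQuote(schema), shQuote(datafile),
        ">", shQuote(output))
}

cmd <- build_in2csv_call(datafile = "data66.dat",
                         schema   = "data66.dct.csv",
                         output   = "data66-FINAL.csv")
cmd

# Running the command needs csvkit on the PATH, so it is left commented out.
# wait = FALSE is what would return control to R right away:
# system(cmd, wait = FALSE)
```

The redirection with > relies on the command being run through a shell, which is what system() does on Unix-alikes; on Windows, shell() would be the closer equivalent.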

The csv file might be considerably larger than the flat file, particularly if the dictionary file defines overlapping columns, as some files do. You can verify that the entire file was written by comparing the number of lines in the two files (perhaps using another system call to wc, for example system("wc -l path/to/flat-file"); system("wc -l path/to/csv")). The csv file should have one line more than the data file, since it includes a line of headers.
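Where wc is not available, the same check can be sketched in plain R. The helpers count_lines and conversion_complete below are hypothetical and not part of the package, and the two files are made-up stand-ins for a flat file and its converted csv. Note that readLines loads the whole file into memory, so wc is likely the better choice for very large files.

```r
# Hypothetical helper: count the lines in a file without calling wc
count_lines <- function(path) length(readLines(path, warn = FALSE))

# TRUE when the csv has exactly one line (the header) more than the flat file
conversion_complete <- function(flat_file, csv_file) {
  count_lines(csv_file) == count_lines(flat_file) + 1L
}

# Small self-contained demonstration with made-up files
flat <- tempfile(fileext = ".dat")
csv  <- tempfile(fileext = ".csv")
writeLines(c("001AB", "002CD"), flat)
writeLines(c("id,code", "001,AB", "002,CD"), csv)

conversion_complete(flat, csv)
```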

Author(s)

Ananda Mahto

References

csvkit's in2csv documentation: https://csvkit.readthedocs.org/en/latest/scripts/in2csv.html

See Also

csvkit.schema

Examples

## Load the example dictionary and data objects
data(sampleDctData)
## Work in a temporary directory; remember the current one to restore it later
currentdir <- getwd()
setwd(tempdir())
list.files(pattern = "\\.(dat|dct|csv)$")
## Write the dictionary to a dictionary file
writeLines(data66_dct, "data66.dct")
## Write the data to a data file
writeLines(data66_dat, "data66.dat")
## Is everything there in the dictionary file?
dct.parser("data66.dct", preview = TRUE)
## The storage type is missing, so leave it out of `includes`
data66_dict <- dct.parser("data66.dct",
                          includes = c("StartPos", "ColName",
                                       "ColWidth", "VarLabel"))
list.files(pattern = "\\.(dat|dct|csv)$")
## Write the schema file that in2csv will use
csvkit.schema(data66_dict)
list.files(pattern = "\\.(dat|dct|csv)$")
## Convert the fixed-width data file to csv
csvkit.fwf2csv(datafile = "data66.dat",
               schema   = "data66.dct.csv",
               output   = "data66-FINAL.csv")
## The call returns immediately, so give in2csv time to finish
Sys.sleep(10)
list.files(pattern = "\\.(dat|dct|csv)$")
read.csv("data66-FINAL.csv", nrows = 5)
## Restore the original working directory
setwd(currentdir)

mrdwab/StataDCTutils documentation built on May 23, 2019, 7:15 a.m.