
nsqip utils

This project contains utility functions for working with the ACS NSQIP dataset using R.

N.B. American College of Surgeons National Surgical Quality Improvement Program (ACS NSQIP®) data is not included in this repository and must be acquired independently.

Installation

This package depends on the following R packages:

install.packages("plyr", repos="http://cran.r-project.org")
install.packages("RSQLite", repos="http://cran.r-project.org")

Data

ACS NSQIP® provides a set of yearly, self-extracting TXT, SPSS, or SAS files. This project uses the TXT files. Loading them into a database takes some upfront effort, but the time investment pays off. The following assumes familiarity with Unix-like operating systems and the standard GNU utilities.

Extract

Put the self-extracting .exe files in a subfolder named data. Use the command-line unarchiver unar to extract the executables.

cd ./data
find ./ -name \*.exe -exec unar -f {} \;
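
If you prefer to drive the extraction from R instead of the shell, a minimal sketch (it still assumes unar is installed and on your PATH, and that the .exe files sit in data/):

# Extract every self-extracting .exe in data/ by shelling out to unar.
exes <- list.files("data", pattern = "\\.exe$", full.names = TRUE, ignore.case = TRUE)
for (exe in exes) system2("unar", args = c("-f", "-o", "data", exe))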

Merge (tab delimited text)

The extracted TXT PUF files are tab delimited and separated by year. To do cross-year analysis we need to merge them into a single dataset. Before merging, ensure that every PUF's column headers use the same case. You can make a header line all uppercase in vim by placing the cursor on it, typing gUU, and then saving with :wq.
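
Editing each file by hand gets tedious. A minimal non-interactive sketch in R, assuming the extracted PUFs sit in data/ with a .txt extension and a single header row:

# Uppercase the header row of every extracted PUF in place.
pufs <- list.files("data", pattern = "\\.txt$", full.names = TRUE, ignore.case = TRUE)
for (f in pufs) {
  lines <- readLines(f)
  lines[1] <- toupper(lines[1])
  writeLines(lines, f)
}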

From there, attempt the merge.

data <- NSQIPfromDir() # Merge all TXT data, unfiltered
write.csv(data, "./data/nsqip.csv", row.names=FALSE)

Merging all years unfiltered can take over an hour; merging with a CPT-code filter takes closer to 10 minutes. Fortunately, there are faster ways to query and filter the data.

Clean (csv)

The previous step saved the merged data as a CSV file. Now remove the literal "NULL" strings and the -99 sentinel values, since NULL values in CSV are simply represented by an empty field.

# replace "NULL" and -99; save the new file as nsqip-cleaned.csv
sed -e 's/\"NULL\"//g' -e 's/-99//g' nsqip.csv > nsqip-cleaned.csv
# (optionally) compress the new CSV file with gzip
gzip nsqip-cleaned.csv

The merged, cleaned, and compressed file (nsqip-cleaned.csv.gz) is ~500 MB. The combined CSV can still be read directly into R, but filtering it remains slow.
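
For example, read.csv can read the gzipped file directly (assuming you took the optional gzip step above). The column name CPT comes from the PUF; 44970 (laparoscopic appendectomy) is only an illustrative code:

# Read the cleaned, gzipped CSV straight into R; read.csv decompresses .gz transparently.
nsqip <- read.csv("./data/nsqip-cleaned.csv.gz", stringsAsFactors = FALSE)
# Filter to a single CPT code.
lap_appy <- nsqip[which(nsqip$CPT == 44970), ]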

Query (sql)

For faster querying, use a database.

Create a database using the included schema.sql, which adds indexes on commonly queried fields such as CPT.

cat schema.sql | sqlite3 data/nsqip.db

Open the database, then import the cleaned and merged CSV file. Note that because the NSQIP table already exists, .import will also load the CSV header line as a data row; recent sqlite3 releases accept .import --skip 1, or you can delete that row afterwards.

sqlite3 data/nsqip.db 
sqlite> .mode csv NSQIP
sqlite> .import data/nsqip-cleaned.csv NSQIP

The database file is ~3.8 GB. Querying is extremely fast; filtering by CPT code has gone from 10 minutes to less than a tenth of a second.
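
Because RSQLite is already a dependency, the database can also be queried directly from R. A minimal sketch (the table name NSQIP matches the import above; the CPT code is again just an example):

library(DBI)
library(RSQLite)

# Connect to the SQLite database built above and pull a single CPT code.
con <- dbConnect(RSQLite::SQLite(), "data/nsqip.db")
lap_appy <- dbGetQuery(con, "SELECT * FROM NSQIP WHERE CPT = '44970'")
dbDisconnect(con)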

Usage

See nsqip.R for usage.

Author

Emily A. Borsting


