importSAS | R Documentation |
Selective import of sas7bdat files into R data.table format
importSAS(filename,wd=NULL,keep=NULL,drop = NULL,
where = NULL,obs = NULL,filter = NULL,
filter.by = NULL, filter.negative = FALSE,
set.hook=NULL,step.hook=NULL,pre.hook=NULL,
post.hook=NULL,savefile = NULL,overwrite = TRUE,
show.sas.code=FALSE,save.tmp = FALSE,content=FALSE,
na.strings="dot",date.vars=NULL,datetime.vars=NULL,
character.vars="pnr", numeric.vars = NULL,sas.program,
sas.switches, sas.runner, use.colClasses=TRUE,
skip.date.conversion=FALSE, force.numeric=TRUE,
sas.data.extension="sas7bdat", verbose=FALSE,...)
contentSAS(filename,wd=NULL)
filename |
The filename (with full path) of the SAS dataset to import.
So, |
wd |
The directory used to store temporarily created files (SAS script, log file, csv file). You need to
have permission to write to this directory. The default value is the current working directory, see |
keep |
A vector of variable names, i.e., the variables (columns) to read and keep from the dataset. Default is to read and keep all variables. |
drop |
Specifies the variables (columns) to leave out from the dataset. Default is to leave out no variables. |
where |
Specifies which conditions the observations (rows) from the dataset should fulfil. Default is no conditions. Use SAS syntax (see examples). |
obs |
Number of observations to read from the dataset. Setting this to |
filter |
Alternative or in addition to the where statement it is
possible to filter the rows of |
filter.by |
Vector of variable names to filter by. By default all variables present in the filter data are used. |
filter.negative |
Vector of length two that defines how to merge the
|
set.hook |
Quoted SAS statments (within use single quotes) to be placed in addition to set options (where, keep, drop, obs) when setting the data set |
step.hook |
Quoted SAS statments (within use single quotes) to be placed after setting the data set |
pre.hook |
Quoted SAS code (within use single quotes) to be set in the beginning of the SAS program. For example, it maybe useful to specify options such as |
post.hook |
Quoted SAS code (within use single quotes) to be set at the end of the SAS program. For example, it maybe useful to specify 'proc print data=df;' in case of trouble. |
savefile |
If specified, the generated csv file will be saved with the given name. The name should end with ".csv". The file will be saved in the working directory or in the directory given under |
overwrite |
Logical. Determines whether or not to overwrite files already existing with the same name as files generated by this function. This is the temporary SAS file, log file and csv file, and possibly a permanent file with the name given under "savefile". If the value is FALSE and some files already exist, the function will abort and print the name of the problematic files. The default value is TRUE. |
show.sas.code |
Logical. If |
save.tmp |
Logical. Option to save all temporary files. Even though this is set to FALSE, the csv file will be saved if there is given a filename in "savefile". Default value is FALSE. |
content |
Logical. If true, the function will only read and return the content of the import file. Together with save.tmp=TRUE, this can be used to generate the SAS file without running it. |
na.strings |
A vector of strings to interpret values of character variables as NA. Each element should be a regular expression
that can be understood by |
date.vars |
Vector of variables to be converted to date variables. For these variables a SAS format statement |
datetime.vars |
Vector of variables to be converted to datetime variables. For these variables a SAS format statement |
character.vars |
names of variables that should be converted to character. Case does not matter. Default is |
numeric.vars |
character.vars names of variables that should be converted to numeric. case does not matter. |
sas.program |
sas program. On linux where |
sas.switches |
On linux this defaults to "" on any other system to |
sas.runner |
How sas is invoked. On linux this defaults to |
use.colClasses |
Logical. If TRUE learn about the variable types from SAS's proc contents and
pass to |
skip.date.conversion |
if TRUE do not try to convert any dates or datetime variables. If |
force.numeric |
if TRUE force numeric format on numeric variables specified by argument |
sas.data.extension |
String to be checked against the file extenstion of filename. Default is |
verbose |
Logical. When |
... |
Arguments passed to |
This function first runs SAS proc contents in the background and uses the result to write temporary SAS code that writes the data in csv format to a temporary file. The data are then read with data.table::freads into R. The function tries to help with the formatting of numeric and date variables and with the understanding of missing values. However, in some cases this help is counterproductive and can therefore be turned off, to some extent at least, with arguments na.strings, force.numeric, skip.date.conversion. Also, the user selects which columns and rows to import. See examples.
As R
is case-sensitive while SAS
is not, to avoid confusion all variable names are converted to lower case.
The output is a data.table with the columns requested in keep (or all columns) and the rows requested in where (or all rows) up to obs many rows.
Anders Munch a.munch@sund.ku.dk and Thomas A Gerds tag@biostat.ku.dk
This function is based on pioneering work by Jesper Lindhardsen.
# We first set a working directory in which we have read and write permission
# These functions will produce temporary files which, if save.tmp is not set to TRUE, will
# be removed afterwards.
## Not run:
setwd("v:/Data/Workdata/704791/AndersMunch/readSAS/R")
# Before importing a SAS data file it is useful to look into the
# format of the variables:
contentSAS(filename="X:/Data/Rawdata_Hurtig/704791/diag_indl")
contentSAS(filename="x:/Data/Rawdata_Hurtig/704791/pop.sas7bdat")
# Also, it is often a good idea to initially only read a limited amount of data
df101 <- importSAS(filename="X:/Data/Rawdata_Hurtig/704791/diag_indl",obs=101,save.tmp=TRUE)
# and to examine the result
str(df101)
df101
# Format, dates, numeric, character
df101 <- importSAS(filename="X:/Data/Rawdata_Hurtig/704791/diag_indl",obs=101,
save.tmp=TRUE,date.vars="inddto",
numeric.vars="pnr",character.vars="packsize")
# we can also use the pre.hook to limit the number of observations via sas options:
importSAS(filename="X:/Data/Rawdata_Hurtig/704791/diag_indl",
pre.hook="options obs=17;",where="diag='DN899'",keep=c("PNR","diag"),show.sas.code=1L)
# NOTE: In combination with a where statement SAS will find the first 101 observations that
# satisfy the where statement. When the where statement finds nothing then SAS
# will run through the whole file without finding anything
# To import the whole file just remove the limitations:
# df <- importSAS(filename="X:/Data/Rawdata_Hurtig/704791/diag_indl",show.sas.code=TRUE)
# But: this is not useful when files are huge and you should to look
# at the examples below which show how to read selected columns and rows only
# In huge data it is efficient to select only the required columns
# and to read only the rows of the data set that meet a criterion.
# The following example shows how to select columns with the keep option
# and how to specify the "where statement" of the SAS data step:
df0 <- importSAS(filename="X:/Data/Rawdata_Hurtig/704791/diag_indl",
obs=101,
keep=c("diag","diagtype"),
where="diagtype = 'A' and diag = '45490'")
# Examine the imported data.table
str(df0)
df0
# using where contains is powerful
importSAS(filename="X:/Data/Rawdata_Hurtig/704791/diag_indl",
obs=10,
save.tmp=TRUE,
where="diag contains 'DI2'",
keep=c("PNR","diag","inddto","uddto"))
# Another typical task is to import only the rows which correspond to
# a set of pnr numbers or a set of diagnoses or both.
# To achieve this the function importSAS is merging files during the import.
# The feature is called 'filter' and illustrated in the following examples:
# Example 1: import only rows of diag_indl which correspond to one of the
# first 23 pnr's in pop
pop <- importSAS(filename="x:/Data/Rawdata_Hurtig/704791/pop.sas7bdat",
obs=23,keep="PNR")
df1 <- importSAS(filename="X:/Data/Rawdata_Hurtig/704791/diag_indl",
obs=8501,filter=pop)
str(df1)
df1
# Example 2: import only rows of diag_indl which correspond the
# first pnr's in pop for which also diag is equal to DK409
pop2 <- importSAS(filename="x:/Data/Rawdata_Hurtig/704791/pop.sas7bdat",
obs=4223,keep="PNR")
pop2 <- pop2[,diag:="DZ508"]
df2 <- importSAS(filename="X:/Data/Rawdata_Hurtig/704791/diag_indl",
obs=8501,keep=c("pnr","diag","inddto","uddto","pattype"),
where="pattype ne 3",filter=pop2)
str(df2)
df2
## Sometimes the sas data file cannot be read due to unknown formats
## here is how to solve this:
df2a <- importSAS(filename="X:/Data/Rawdata_Hurtig/704791/diag_indl",
obs=101,
pre.hook="options nofmterr;")
# The hooks set.hook and step.hook can be used as follows:
df3 <- importSAS(filename="X:/Data/Rawdata_Hurtig/704791/diag_indl",
obs=101,
keep=c("PNR","diag","diagtype"),
set.hook="firstobs=5",
step.hook="by PNR; firstPNR=first.PNR;",
where="diag contains 'I'")
str(df3)
df3
# if you are more fluent in SAS than in R it may help to
# know how to communicate e.g., a keep statement, instead of
# using the keep argument:
df4 <-importSAS(filename="X:/Data/Rawdata_Hurtig/999999/Dream201801",
set.hook="keep=pnr branch;",
obs=1000)
# Because the "overwrite" argument is FALSE, running the above code again will abort the import
# to not overwrite the temporary files.
# Setting "overwrite=TRUE" will allow the function to overwrite the files.
## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.