knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
readspss
-- importing SPSS files to RWelcome to the readspss
package. The package was written from scratch with
additional code for special cases. Starting as a side project for me to learn
Rcpp the package did grow over the years. Today readspss
is well tested using
every sav, por and zsav file I could lay my hands on. Import is considered
feature complete, write support is available, just not yet for every feature.
Installation is provided using r-universe
or remotes
.
With r-universe:
options(repos = c( janmarvin = 'https://janmarvin.r-universe.dev', CRAN = 'https://cloud.r-project.org')) install.packages('readspss')
With devtools
:
remotes::install_github("JanMarvin/readspss")
For the import of (z)sav and por files read.sav()
and read.por()
are available.
library(readspss) # example using sav fl_sav <- system.file("extdata", "electric.sav", package = "readspss") ds <- read.sav(fl_sav) # example using zsav fl_zsav <- system.file("extdata", "cars.zsav", package = "readspss") dz <- read.sav(fl_zsav) # example using por fl_por <- system.file("extdata", "electric.por", package = "readspss") dp <- read.por(fl_por)
Both functions return data.frame objects, containing numerics, dates, factors or characters.
For user specific demand the package supports many option available in foreign
such as convert.factors
and use.missings
. If one is familiar with said
package, one should have no problem adapting to readspss
. If for example one
does not want to have factors since they work differently in R than in SPSS this
can be achieved using the following code.
# example using sav fl_sav <- system.file("extdata", "electric.sav", package = "readspss") ds <- read.sav(fl_sav, convert.factors = FALSE) # example using zsav fl_zsav <- system.file("extdata", "cars.zsav", package = "readspss") dz <- read.sav(fl_zsav, convert.factors = FALSE) # example using por fl_por <- system.file("extdata", "electric.por", package = "readspss") dp <- read.por(fl_por, convert.factors = FALSE)
Since many features are self explanatory not all will be explained. Of course
readspss
can handle sav files in different encodings; it handles file sets
without data; all types of missings SPSS developers over the years came up with;
short, long and longer strings; little and big endian files; sav, por and zsav
compressed files; files without valid header information; old and new SPSS
files. As stated above, every SPSS file I came across and during the development
I came across many.
Using code by Ben Pfaff readspss
can handle encrypted SPSS files.
flu <- system.file("extdata", "hotel.sav", package="readspss") fle <- system.file("extdata", "hotel-encrypted.sav", package="readspss") df_u <- read.sav(flu) df_e <- read.sav(fle, pass = "pspp")
If available in the SPSS file, the resulting data.frame will contain attributes
such as the datalabel, timestamp, filelabel, additional documentation and
information regarding missings, labels, file encoding. A complete list of
attributes can be found using ?read.sav
and ?read.por
.
R data.frame objects can be exported using write.sav()
and write.por()
.
library(readspss) write.sav(cars, filepath = "cars.sav") # optional compress = TRUE write.sav(cars, filepath = "cars.zsav") # optional compress = TRUE write.por(cars, filepath = "cars.por")
Export provides a few options to add a label, for compression of sav and zsav
files and conversion of dates. Currently it is not possible to export strings
longer than 255 chars. Obviously all exported files can be imported using
SPSS and readspss
(PSPP is expected to work).
One may wonder, why does the world need another package to import SPSS data to
the R world. Similar tasks can be done by foreign
, memisc
and haven
package.
Well the first two packages use code from older releases of PSPP and R-Core most
likely has neither time nor a need to update their codebase to a newer PSPP
release. Still over the years the SPSS file format has changed. Not drastically
but new features such as long strings were implemented. Features that foreign
cannot handle. The newest of the three aforementioned packages, haven
, is a
wrapper around the ReadStat
C library. The package development began around
the time we started with readstata13
so it is around quite some time now. Contrary to many other people in the R
world, I am not a huge fan of tibbles
which are an integral part of haven
.
One can agree that this is a minor problem. My bigger problem with the package
is, that I am not yet convinced that ReadStat
and haven
are tested enough.
Even though I am sure that authors of both made sure that in most cases their
software works, there are still cases where it does not. During the development
process of readspss
I reported a few bugs to the haven package. Among them
were incorrectly trimmed long strings and a severe bug where por-files imported
awfully incorrect values. All errors were found using publicly available data
files, writing unit tests and comparing data across different R-packages, PSPP
and various versions of SPSS. Until I see that such behavior is adopted by other
packages, I simply do not trust them and maybe you should not either. If the
import process of data fails, one does not have to worry about anything else.
The development of readspss
began once development of readstata13
slowed
down. Having written most of the c++
code to import dta-files, I learned a lot
about binary files and Rcpp development. Since SPSS was another statistical
software used at the university where I worked at that time, it felt natural to
have a look at sav-files. Shortly after I learned that the dta-file
documentation is priceless, not available for SPSS and development ceased for
quite some time. In February 2018 I changed jobs, resulting in many train rides.
A project was needed and development began again. Using the PSPP documentation
and countless hours of trial and error lead to the current state of the package.
readspss
uses code of Ben Pfaff for the encryption part. It uses code from
TDA by Goetz Rohwer and Ulrich
Poetter for the conversion of numerics in the por-parser. The
PSPP documentation
was a huge help. Without the testing by Ulrich Poetter this package would not be
as complete as it is.
Even though readspss
is tested a lot, it would be great, if users would
test their imports and exports and report their experience. Open an issue on the
github page or write me an email and let me know what you think. Bug free code
does not exist.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.