importers: Object Oriented Interface to Foreign Files

Description

Importer objects are objects that refer to an external data file. Currently only Stata files, SPSS system, portable, and fixed-column files are supported.

Data are actually imported by ‘translating’ an importer object into a data.set using as.data.set or subset.

The importer mechanism is more flexible and extensible than read.spss and read.dta of package "foreign", as most of the parsing of the file headers is done in R. It is also adapted to efficiently load large data sets. Most importantly, importer objects support the labels, missing.values, and descriptions, provided by this package.

Usage

spss.fixed.file(file,
  columns.file,
  varlab.file=NULL,
  codes.file=NULL,
  missval.file=NULL,
  count.cases=TRUE,
  to.lower=TRUE
  )

spss.portable.file(file,
  varlab.file=NULL,
  codes.file=NULL,
  missval.file=NULL,
  count.cases=TRUE,
  to.lower=TRUE)

spss.system.file(file,
  varlab.file=NULL,
  codes.file=NULL,
  missval.file=NULL,
  count.cases=TRUE,
  to.lower=TRUE)

Stata.file(file)

## The most important methods for "importer" objects are:
## S4 method for signature 'importer'
subset(x, subset, select, drop = FALSE, ...)

## S4 method for signature 'importer'
as.data.set(x,row.names=NULL,optional=NULL,
                    compress.storage.modes=FALSE,...)

Arguments

x

an object that inherits from class "importer".

file

character string; the path to the file containing the data

columns.file

character string; the path to an SPSS/PSPP syntax file with a DATA LIST FIXED statement

varlab.file

character string; the path to an SPSS/PSPP syntax file with a VARIABLE LABELS statement

codes.file

character string; the path to an SPSS/PSPP syntax file with a VALUE LABELS statement

missval.file

character string; the path to an SPSS/PSPP syntax file with a MISSING VALUES statement

count.cases

logical; should cases in file be counted? This takes effect only if the data file does not already contain information about the number of cases.

to.lower

logical; should variable names be changed to lower case?

subset

a logical vector or an expression containing variables from the external data file that evaluates to logical.

select

a vector of variable names from the external data file. This may also be a named vector, where the names give the names into which the variables from the external data file are renamed.

drop

a logical value that determines what happens if only one column is selected. If TRUE and only one column is selected, subset returns a single item object rather than a data.set.

row.names

ignored, present only for compatibility.

optional

ignored, present only for compatibility.

compress.storage.modes

logical value; if TRUE floating point values are converted to integers if possible without loss of information.

...

other arguments; ignored.
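
The effect described for compress.storage.modes can be pictured with a small base R sketch. The helper compress_storage_mode below is hypothetical and is not part of memisc; it only illustrates the idea of switching a double vector to integer storage when every value survives the conversion unchanged, and the package's own implementation may differ.

```r
# Hypothetical helper (NOT a memisc function): store a double vector as
# integer only if doing so loses no information.
compress_storage_mode <- function(x) {
  if (!is.double(x)) return(x)          # only doubles are candidates
  if (any(is.nan(x))) return(x)         # NaN cannot be represented as integer
  ok <- !is.na(x)                       # NA is representable in both modes
  fits <- all(abs(x[ok]) <= .Machine$integer.max) &&  # within integer range
          all(x[ok] == trunc(x[ok]))                  # no fractional part
  if (fits) storage.mode(x) <- "integer"
  x
}

a <- compress_storage_mode(c(1, 2, 9, NA))  # whole numbers: converted
b <- compress_storage_mode(c(1.5, 2, NA))   # has a fraction: left as double

typeof(a)  # "integer"
typeof(b)  # "double"
```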

Details

A call to a ‘constructor’ for an importer object, that is, spss.fixed.file, spss.portable.file, spss.system.file, or Stata.file, causes R to read in the header of the data file and/or the syntax files that contain information about the variables, such as the columns that they occupy (in case of spss.fixed.file), variable labels, value labels and missing values.

The information in the file header and/or the accompanying files is then processed to prepare the file for importing. Thus the inner structure of an importer object may well vary according to what type of file is to be imported and what additional information is given.

The as.data.set and subset methods for "importer" objects internally use the generic functions seekData, readData, and readSubset, which have methods for the subclasses of "importer". These functions are not callable from outside the package, however.
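
As a rough sketch of the two public entry points, the following uses the NES 1948 portable file that ships with memisc (the same file as in the Examples section). The filter v480045 == 1 is only an illustrative assumption about how a subset expression might be written; how comparisons treat declared missing values is not covered here.

```r
library(memisc)

# Unpack the bundled SPSS portable file into a temporary directory.
por <- unzip(system.file("anes/NES1948.ZIP", package = "memisc"),
             "NES1948.POR", exdir = tempfile())

# Creating the importer reads only the header, not the data.
nes1948 <- spss.portable.file(por)

# Full import: every variable and every case.
nes1948.all <- as.data.set(nes1948)

# Partial import: two renamed variables, restricted by a logical expression
# on a variable from the external file (illustrative condition).
nes1948.part <- subset(nes1948,
                       subset = v480045 == 1,
                       select = c(vote = v480018, gender = v480045))

nrow(nes1948.all)
nrow(nes1948.part)
```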

Since the functions described here are a more or less complete rewrite based on the description of the file structure provided by the documentation for PSPP, they are perhaps not as thoroughly tested as the functions in the foreign package, apart from the frequent use by the author of this package.

Value

spss.fixed.file, spss.portable.file, spss.system.file, and Stata.file return, respectively, objects of class "spss.fixed.importer", "spss.portable.importer", "spss.system.importer", or "Stata.importer", which, by inheritance, are also objects of class "importer".

Objects of class "importer" have at least the following two slots:

ptr

an external pointer

variables

a list of objects of class "item.vector" which provides a ‘prototype’ for the "data.set" objects returned by the as.data.set and subset methods for objects of class "importer"
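
The class relationship described above can be checked interactively. This is a minimal sketch using the bundled NES 1948 portable file; slotNames is used only to list whatever slots the installed version of memisc defines, so no particular output is assumed.

```r
library(methods)
library(memisc)

# Unpack the bundled SPSS portable file into a temporary directory.
por <- unzip(system.file("anes/NES1948.ZIP", package = "memisc"),
             "NES1948.POR", exdir = tempfile())
nes1948 <- spss.portable.file(por)

# The concrete class depends on the constructor that was called ...
class(nes1948)

# ... but every such object also counts as an "importer".
is.imp <- is(nes1948, "importer")
is.imp

# List the slots of the object in the installed package version.
slotNames(nes1948)
```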

The as.data.frame method for importer objects does the actual data import and returns a data frame. Note that in contrast to read.spss, the variable names of the resulting data frame will be lower case, unless the importer function is called with to.lower=FALSE. If long variable names are defined (in case of a PSPP/SPSS system file), they take precedence and are not coerced to lower case.
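
The effect of to.lower can be compared side by side. The sketch below goes through as.data.set before as.data.frame, which is an assumption made here to stay on conversions documented elsewhere on this page; the names printed for the to.lower=FALSE case depend on how they are stored in the file.

```r
library(memisc)

# Unpack the bundled SPSS portable file into a temporary directory.
por <- unzip(system.file("anes/NES1948.ZIP", package = "memisc"),
             "NES1948.POR", exdir = tempfile())

imp.lower <- spss.portable.file(por)                    # default: to.lower=TRUE
imp.asis  <- spss.portable.file(por, to.lower = FALSE)  # keep names as stored

# Import the data and convert to ordinary data frames.
df.lower <- as.data.frame(as.data.set(imp.lower))
df.asis  <- as.data.frame(as.data.set(imp.asis))

head(names(df.lower))
head(names(df.asis))
```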

See Also

codebook, description, read.spss

Examples

# Extract American National Election Study of 1948
nes1948.por <- unzip(system.file("anes/NES1948.ZIP",package="memisc"),
                     "NES1948.POR",exdir=tempfile())

# Get information about the variables contained.
nes1948 <- spss.portable.file(nes1948.por)

# The data are not yet loaded:
show(nes1948)

# ... but one can see what variables are present:
description(nes1948)

# Now a subset of the data is loaded:
vote.socdem.48 <- subset(nes1948,
              select=c(
                  v480018,
                  v480029,
                  v480030,
                  v480045,
                  v480046,
                  v480047,
                  v480048,
                  v480049,
                  v480050
                  ))

# Let's make the names more descriptive:
vote.socdem.48 <- rename(vote.socdem.48,
                  v480018 = "vote",
                  v480029 = "occupation.hh",
                  v480030 = "unionized.hh",
                  v480045 = "gender",
                  v480046 = "race",
                  v480047 = "age",
                  v480048 = "education",
                  v480049 = "total.income",
                  v480050 = "religious.pref"
        )

# It is also possible to do both
# in one step:
# vote.socdem.48 <- subset(nes1948,
#              select=c(
#                  vote           = v480018,
#                  occupation.hh  = v480029,
#                  unionized.hh   = v480030,
#                  gender         = v480045,
#                  race           = v480046,
#                  age            = v480047,
#                  education      = v480048,
#                  total.income   = v480049,
#                  religious.pref = v480050
#                  ))



# We examine the data more closely:
codebook(vote.socdem.48)

# ... and conduct some analyses.
#
t(genTable(percent(vote)~occupation.hh,data=vote.socdem.48))

# We consider only the two main candidates.
vote.socdem.48 <- within(vote.socdem.48,{
  truman.dewey <- vote
  valid.values(truman.dewey) <- 1:2
  truman.dewey <- relabel(truman.dewey,
              "VOTED - FOR TRUMAN" = "Truman",
              "VOTED - FOR DEWEY"  = "Dewey")
  })

summary(truman.relig.glm <- glm((truman.dewey=="Truman")~religious.pref,
    data=vote.socdem.48,
    family="binomial"
))

Example output

Loading required package: lattice
Loading required package: MASS

Attaching package: 'memisc'

The following objects are masked from 'package:stats':

    contr.sum, contr.treatment, contrasts

The following object is masked from 'package:base':

    as.array

Warning messages:
1: Duplicate labels 'NAME NOT KNOWN' 
2: Duplicate labels 'HAVE HEARD OF TAFT-HARTLEY ACT' 
3: Duplicate labels 'DID NOT CONSIDER ANYONE ELSE' 'CONSIDERED WALLACE' 'CONSIDERED OTHER' 'NA' 'CONSIDERED TRUMAN' 
4: Duplicate labels 'DISAGREED WITH PLATFORM OR POLICY - TO' 
5: Duplicate labels 'DISAGREED WITH PLATFORM OR POLICY - TO' 
6: Duplicate labels 'RENT CONTROL' 'PRICE CONTROL' 'TAFT-HARTLEY' 'FARM PRICES AND SUPPORT' 'LOWER INCOME TAX' 'CIVIL RIGHTS' 'BALANCE BUDGET' '(GOVERNMENT) HOUSING' 'DEFENSE ACTIVITY' 'GOVERNMENT ATOMIC CONTROL' 'NEW DEAL' 'MARSHALL PLAN' 'FIRM RUSSIAN POLICY' 'HELP ISRAEL (PALESTINE)' 'PROMOTE PEACE' 
7: Duplicate labels 'RENT CONTROL' 'PRICE CONTROL' 'TAFT-HARTLEY' 'FARM PRICES AND SUPPORT' 'LOWER INCOME TAX' 'CIVIL RIGHTS' 'BALANCE BUDGET' '(GOVERNMENT) HOUSING' 'DEFENSE ACTIVITY' 'GOVERNMENT ATOMIC CONTROL' 'NEW DEAL' 'MARSHALL PLAN' 'FIRM RUSSIAN POLICY' 'HELP ISRAEL (PALESTINE)' 'PROMOTE PEACE' 
8: Duplicate labels 'RENT CONTROL' 'PRICE CONTROL' 'TAFT-HARTLEY' 'FARM PRICES AND SUPPORT' 'LOWER INCOME TAX' 'CIVIL RIGHTS' 'BALANCE BUDGET' '(GOVERNMENT) HOUSING' 'DEFENSE ACTIVITY' 'GOVERNMENT ATOMIC CONTROL' 'NEW DEAL' 'MARSHALL PLAN' 'FIRM RUSSIAN POLICY' 'HELP ISRAEL (PALESTINE)' 'PROMOTE PEACE' 
9: Duplicate labels 'RENT CONTROL' 'PRICE CONTROL' 'TAFT-HARTLEY' 'FARM PRICES AND SUPPORT' 'LOWER INCOME TAX' 'CIVIL RIGHTS' 'BALANCE BUDGET' '(GOVERNMENT) HOUSING' 'DEFENSE ACTIVITY' 'GOVERNMENT ATOMIC CONTROL' 'NEW DEAL' 'MARSHALL PLAN' 'FIRM RUSSIAN POLICY' 'HELP ISRAEL (PALESTINE)' 'PROMOTE PEACE' 

SPSS portable file '/work/tmp/tmp/RtmpPsnxGK/file2ed67151eab2/NES1948.POR' 
	with 67 variables and 662 observations

 vversion 'NES VERSION NUMBER'        
 vdsetno  'NES DATASET NUMBER'        
 v480001  'ICPSR ARCHIVE NUMBER'      
 v480002  'INTERVIEW NUMBER'          
 v480003  'POP CLASSIFICATION'        
 v480004  'CODER'                     
 v480005  'NUMBER OF CALLS TO R'      
 v480006  'R REMEMBER PREVIOUS INT'   
 v480007  'INTR INTERVIEW THIS R'     
 v480008  'PRVS PRE-ELCTN R REINT'    
 v480009  'R INT IN PRE/POSTELCTN'    
 v480010  'RENT CNTRL KEPT/DROPPED'   
 v480011  'GOVT CONTROL PRICES'       
 v480012  'WHAT TO DO W TFT-HT ACT'   
 v480013  'PRESLELCTN OTCM SURPRISE'  
 v480014a 'WHY PPL VTD FOR TRUMAN 1'  
 v480014b 'WHY PPL VTD FOR TRUMAN 2'  
 v480015a 'WHY PPL VTD AGNST TRUMAN 1'
 v480015b 'WHY PPL VTD AGNST TRUMAN 2'
 v480016a 'WHY PPL VTD FOR DEWEY 1'   
 v480016b 'WHY PPL VTD FOR DEWEY 2'   
 v480017a 'WHY PPL VTD AGNST DEWEY 1' 
 v480017b 'WHY PPL VTD AGNST DEWEY 2' 
 v480018  'DID R VOTE/FOR WHOM'       
 v480019  'WN DECIDE FOR WHOM TO VT'  
 v480020  'CNSD VT FOR SOMEONE ELSE'  
 v480021a 'XWHY DID NOT VT FOR HIM 1' 
 v480021b 'XWHY DID NOT VT FOR HIM 2' 
 v480022a 'WHY VT THE WAY YOU DID 1'  
 v480022b 'WHY VT THE WAY YOU DID 2'  
 v480023  'VOTED STRAIGHT TICKET'     
 v480024  'R NOT VT-IF VT,FOR WHOM'   
 v480025a 'R NOT VT-WHY DID NOT VT 1' 
 v480025b 'R NOT VT-WHY DID NOT VT 2' 
 v480026  'R NOT VT-WAS R REG TO VT'  
 v480027  'VTD IN PRVS PRESL ELCTN'   
 v480028  'VTD FOR WHOM IN 1944'      
 v480029  'OCCUPATION OF HEAD'        
 v480030  'HEAD BELONG TO LBR UN'     
 v480031a 'GRPS IDENTIFIED W TRUMAN 1'
 v480031b 'GRPS IDENTIFIED W TRUMAN 2'
 v480031c 'GRPS IDENTIFIED W TRUMAN 3'
 v480032a 'GRPS IDENTIFIED W DEWEY 1' 
 v480032b 'GRPS IDENTIFIED W DEWEY 2' 
 v480032c 'GRPS IDENTIFIED W DEWEY 3' 
 v480033a 'ISSUES CONNECTED W TRMN 1' 
 v480033b 'ISSUES CONNECTED W TRMN 2' 
 v480034a 'ISSUES CONNECTED W DEWEY 1'
 v480034b 'ISSUES CONNECTED W DEWEY 2'
 v480035a 'PERSONAL ATTRIBUTE TRMN 1' 
 v480035b 'PERSONAL ATTRIBUTE TRMN 2' 
 v480036a 'PERSONAL ATTRIBUTE DEWEY 1'
 v480036b 'PERSONAL ATTRIBUTE DEWEY 2'
 v480037  'CMPN INCIDENTS MENTIONED'  
 v480038  '41-PRESLELCTN PLAN TO VT'  
 v480039  '41-PLAN TO VT REP/DEM'     
 v480040  '41-USA'S CNCRN W OTHERS'   
 v480041  '41-SATISD USA TWRD RUSS'   
 v480042  '41-INFORMATION LEVEL'      
 v480043  '41-USA GV IN,AGRT RUSS'    
 v480044  '41-USA-RUSS AGRT VIA U.N'  
 v480045  'SEX OF RESPONDENT'         
 v480046  'RACE OF RESPONDENT'        
 v480047  'AGE OF RESPONDENT'         
 v480048  'EDUCATION OF RESPONDENT'   
 v480049  'TOTAL 1948 INCOME'         
 v480050  'RELIGIOUS PREFERENCE'      

================================================================================

   vote 'DID R VOTE/FOR WHOM'

--------------------------------------------------------------------------------

   Storage mode: double
   Measurement: nominal
   Missing values: 9

           Values and labels    N    Percent 
                                             
   1   'VOTED - FOR TRUMAN'   212   32.1 32.0
   2   'VOTED - FOR DEWEY'    178   27.0 26.9
   3   'VOTED - FOR WALLACE'    1    0.2  0.2
   4   'VOTED - FOR OTHER'     11    1.7  1.7
   5   'VOTED - NA FOR WHOM'   20    3.0  3.0
   6   'DID NOT VOTE'         238   36.1 36.0
   9 M 'NA WHETHER VOTED'       2         0.3

================================================================================

   occupation.hh 'OCCUPATION OF HEAD'

--------------------------------------------------------------------------------

   Storage mode: double
   Measurement: nominal
   Missing values: 99

                                Values and labels    N    Percent 
                                                                  
   10   'PROFESSIONAL, SEMI-PROFESSIONAL'           44    6.9  6.6
   20   'SELF-EMPLOYED, MANAGERIAL, SUPERVISORY'    73   11.5 11.0
   30   'OTHER WHITE-COLLAR (CLERICAL, SALES, ET'   79   12.5 11.9
   40   'SKILLED AND SEMI-SKILLED'                 164   25.9 24.8
   60   'PROTECTIVE SERVICE'                         6    0.9  0.9
   70   'UNSKILLED, INCLUDING FARM AND SERVICE W'   85   13.4 12.8
   80   'FARM OPERATORS AND MANAGERS'              105   16.6 15.9
   92   'STUDENT'                                    7    1.1  1.1
   94   'UNEMPLOYED'                                 5    0.8  0.8
   95   'RETIRED, TOO OLD OR UNABLE TO WORK'        38    6.0  5.7
   96   'HOUSEWIFE'                                 28    4.4  4.2
   99 M 'NA'                                        28         4.2

================================================================================

   unionized.hh 'HEAD BELONG TO LBR UN'

--------------------------------------------------------------------------------

   Storage mode: double
   Measurement: nominal
   Missing values: 8-Inf

   Values and labels    N    Percent 
                                     
           1   'YES'  150   23.3 22.7
           2   'NO'   493   76.7 74.5
           8 M 'DK'     5         0.8
           9 M 'NA'    14         2.1

================================================================================

   gender 'SEX OF RESPONDENT'

--------------------------------------------------------------------------------

   Storage mode: double
   Measurement: nominal
   Missing values: 9

   Values and labels    N    Percent 
                                     
        1   'MALE'    302   45.8 45.6
        2   'FEMALE'  357   54.2 53.9
        9 M 'NA'        3         0.5

================================================================================

   race 'RACE OF RESPONDENT'

--------------------------------------------------------------------------------

   Storage mode: double
   Measurement: nominal
   Missing values: 9

   Values and labels    N    Percent 
                                     
         1   'WHITE'  585   90.7 88.4
         2   'NEGRO'   60    9.3  9.1
         3   'OTHER'    0    0.0  0.0
         9 M 'NA'      17         2.6

================================================================================

   age 'AGE OF RESPONDENT'

--------------------------------------------------------------------------------

   Storage mode: double
   Measurement: nominal
   Missing values: 9

   Values and labels    N    Percent 
                                     
   1   '18-24'         57    8.7  8.6
   2   '25-34'        142   21.7 21.5
   3   '35-44'        174   26.6 26.3
   4   '45-54'        125   19.1 18.9
   5   '55-64'         86   13.1 13.0
   6   '65 AND OVER'   70   10.7 10.6
   9 M 'NA'             8         1.2

================================================================================

   education 'EDUCATION OF RESPONDENT'

--------------------------------------------------------------------------------

   Storage mode: double
   Measurement: nominal
   Missing values: 9

    Values and labels    N    Percent 
                                      
   1   'GRADE SCHOOL'  292   44.4 44.1
   2   'HIGH SCHOOL'   266   40.4 40.2
   3   'COLLEGE'       100   15.2 15.1
   9 M 'NA'              4         0.6

================================================================================

   total.income 'TOTAL 1948 INCOME'

--------------------------------------------------------------------------------

   Storage mode: double
   Measurement: nominal
   Missing values: 9

      Values and labels    N    Percent 
                                        
   1   'UNDER $500'       25    3.8  3.8
   2   '$500-$999'        43    6.6  6.5
   3   '$1000-1999'      110   16.8 16.6
   4   '$2000-2999'      185   28.2 27.9
   5   '$3000-3999'      142   21.7 21.5
   6   '$4000-4999'       66   10.1 10.0
   7   '$5000 AND OVER'   84   12.8 12.7
   9 M 'NA'                7         1.1

================================================================================

   religious.pref 'RELIGIOUS PREFERENCE'

--------------------------------------------------------------------------------

   Storage mode: double
   Measurement: nominal
   Missing values: 9

   Values and labels    N    Percent 
                                     
    1   'PROTESTANT'  460   70.0 69.5
    2   'CATHOLIC'    140   21.3 21.1
    3   'JEWISH'       25    3.8  3.8
    4   'OTHER'        14    2.1  2.1
    5   'NONE'         18    2.7  2.7
    9 M 'NA'            5         0.8

                                         
occupation.hh                             VOTED - FOR TRUMAN VOTED - FOR DEWEY
  PROFESSIONAL, SEMI-PROFESSIONAL                 22.7272727        50.0000000
  SELF-EMPLOYED, MANAGERIAL, SUPERVISORY           9.5890411        61.6438356
  OTHER WHITE-COLLAR (CLERICAL, SALES, ET         37.9746835        39.2405063
  SKILLED AND SEMI-SKILLED                        51.8292683        14.6341463
  PROTECTIVE SERVICE                              16.6666667        33.3333333
  UNSKILLED, INCLUDING FARM AND SERVICE W         32.9411765        11.7647059
  FARM OPERATORS AND MANAGERS                     24.7619048        13.3333333
  STUDENT                                         14.2857143        28.5714286
  UNEMPLOYED                                       0.0000000         0.0000000
  RETIRED, TOO OLD OR UNABLE TO WORK              27.0270270        43.2432432
  HOUSEWIFE                                       17.8571429        28.5714286
                                         
occupation.hh                             VOTED - FOR WALLACE VOTED - FOR OTHER
  PROFESSIONAL, SEMI-PROFESSIONAL                   0.0000000         2.2727273
  SELF-EMPLOYED, MANAGERIAL, SUPERVISORY            0.0000000         1.3698630
  OTHER WHITE-COLLAR (CLERICAL, SALES, ET           0.0000000         0.0000000
  SKILLED AND SEMI-SKILLED                          0.6097561         1.2195122
  PROTECTIVE SERVICE                                0.0000000        16.6666667
  UNSKILLED, INCLUDING FARM AND SERVICE W           0.0000000         0.0000000
  FARM OPERATORS AND MANAGERS                       0.0000000         2.8571429
  STUDENT                                           0.0000000         0.0000000
  UNEMPLOYED                                        0.0000000         0.0000000
  RETIRED, TOO OLD OR UNABLE TO WORK                0.0000000         2.7027027
  HOUSEWIFE                                         0.0000000         0.0000000
                                         
occupation.hh                             VOTED - NA FOR WHOM DID NOT VOTE
  PROFESSIONAL, SEMI-PROFESSIONAL                   2.2727273   22.7272727
  SELF-EMPLOYED, MANAGERIAL, SUPERVISORY            1.3698630   26.0273973
  OTHER WHITE-COLLAR (CLERICAL, SALES, ET           5.0632911   17.7215190
  SKILLED AND SEMI-SKILLED                          2.4390244   29.2682927
  PROTECTIVE SERVICE                                0.0000000   33.3333333
  UNSKILLED, INCLUDING FARM AND SERVICE W           4.7058824   50.5882353
  FARM OPERATORS AND MANAGERS                       1.9047619   57.1428571
  STUDENT                                           0.0000000   57.1428571
  UNEMPLOYED                                       20.0000000   80.0000000
  RETIRED, TOO OLD OR UNABLE TO WORK                2.7027027   24.3243243
  HOUSEWIFE                                         0.0000000   53.5714286
                                         
occupation.hh                                       N
  PROFESSIONAL, SEMI-PROFESSIONAL          44.0000000
  SELF-EMPLOYED, MANAGERIAL, SUPERVISORY   73.0000000
  OTHER WHITE-COLLAR (CLERICAL, SALES, ET  79.0000000
  SKILLED AND SEMI-SKILLED                164.0000000
  PROTECTIVE SERVICE                        6.0000000
  UNSKILLED, INCLUDING FARM AND SERVICE W  85.0000000
  FARM OPERATORS AND MANAGERS             105.0000000
  STUDENT                                   7.0000000
  UNEMPLOYED                                5.0000000
  RETIRED, TOO OLD OR UNABLE TO WORK       37.0000000
  HOUSEWIFE                                28.0000000

Call:
glm(formula = (truman.dewey == "Truman") ~ religious.pref, family = "binomial", 
    data = vote.socdem.48)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-1.46927  -1.12217   0.00036   1.23367   1.32323  

Coefficients:
                        Estimate Std. Error z value Pr(>|z|)   
(Intercept)             -0.13134    0.12831  -1.024  0.30604   
religious.prefCATHOLIC   0.79550    0.24442   3.255  0.00114 **
religious.prefJEWISH    16.69740  536.55453   0.031  0.97517   
religious.prefOTHER     -0.05099    0.61898  -0.082  0.93435   
religious.prefNONE      -0.20514    0.59943  -0.342  0.73219   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 537.69  on 389  degrees of freedom
Residual deviance: 500.69  on 385  degrees of freedom
  (272 observations deleted due to missingness)
AIC: 510.69

Number of Fisher Scoring iterations: 15

memisc documentation built on May 2, 2019, 5:45 p.m.
