Importer objects are objects that refer to an external data file. Currently, only Stata files and SPSS system, portable, and fixed-column files are supported.
Data are actually imported by ‘translating’ an importer file into a data.set using as.data.set or subset.
The importer mechanism is more flexible and extensible than read.spss and read.dta of package "foreign", as most of the parsing of the file headers is done in R. It is also adapted to efficiently load large data sets. Most importantly, importer objects support the labels, missing.values, and descriptions provided by this package.
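The two-step workflow described above (create an importer, then translate it) can be sketched as follows. This is a minimal illustration, assuming memisc is installed and reusing the ANES 1948 portable file from the Examples section of this page; the object names por.path, nes.all and nes.some are made up for the sketch.

```r
library(memisc)

# Unpack the ANES 1948 portable file that ships with memisc
# (the same file used in the Examples section).
por.path <- unzip(system.file("anes/NES1948.ZIP", package = "memisc"),
                  "NES1948.POR", exdir = tempfile())

# Step 1: create the importer. Only header information is read here.
nes1948 <- spss.portable.file(por.path)

# Step 2a: translate the whole file into a data.set ...
nes.all <- as.data.set(nes1948)

# Step 2b: ... or load only the variables that are needed.
nes.some <- subset(nes1948, select = c(v480018, v480050))

# Variable names of the partial import
names(nes.some)
```

Loading only the selected variables with subset keeps memory use down when the external file is large.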
spss.fixed.file(file,
    columns.file,
    varlab.file=NULL,
    codes.file=NULL,
    missval.file=NULL,
    count.cases=TRUE,
    to.lower=TRUE
    )
spss.portable.file(file,
    varlab.file=NULL,
    codes.file=NULL,
    missval.file=NULL,
    count.cases=TRUE,
    to.lower=TRUE)
spss.system.file(file,
    varlab.file=NULL,
    codes.file=NULL,
    missval.file=NULL,
    count.cases=TRUE,
    to.lower=TRUE)
Stata.file(file)
## The most important methods for "importer" objects are:
## S4 method for signature 'importer'
subset(x, subset, select, drop = FALSE, ...)
## S4 method for signature 'importer'
as.data.set(x, row.names=NULL, optional=NULL,
    compress.storage.modes=FALSE, ...)
x | an object that inherits from class "importer".
file | character string; the path to the file containing the data.
columns.file | character string; the path to an SPSS/PSPP syntax file with a DATA LIST FIXED statement.
varlab.file | character string; the path to an SPSS/PSPP syntax file with a VARIABLE LABELS statement.
codes.file | character string; the path to an SPSS/PSPP syntax file with a VALUE LABELS statement.
missval.file | character string; the path to an SPSS/PSPP syntax file with a MISSING VALUES statement.
count.cases | logical; should cases in file be counted? This takes effect only if the data file does not already contain information about the number of cases.
to.lower | logical; should variable names be changed to lower case?
subset | a logical vector or an expression containing variables from the external data file that evaluates to logical.
select | a vector of variable names from the external data file. This may also be a named vector, where the names give the names into which the variables from the external data file are renamed.
drop | a logical value that determines what happens if only one column is selected. If TRUE and only one column is selected, subset returns a single item object and not a data.set.
row.names | ignored, present only for compatibility.
optional | ignored, present only for compatibility.
compress.storage.modes | logical value; if TRUE, floating point values are converted to integers if possible without loss of information.
... | other arguments; ignored.
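As an illustration of the compress.storage.modes argument, the sketch below imports the same file twice and compares the memory footprint of the results. It assumes memisc is installed and reuses the ANES 1948 file from the Examples section; the names nes.dbl and nes.int are illustrative only, and how much is saved depends on the data.

```r
library(memisc)

# Same ANES 1948 portable file as in the Examples section.
por.path <- unzip(system.file("anes/NES1948.ZIP", package = "memisc"),
                  "NES1948.POR", exdir = tempfile())
nes1948 <- spss.portable.file(por.path)

# Default import: values are kept in their original storage mode.
nes.dbl <- as.data.set(nes1948)

# Ask for floating point values to be stored as integers
# wherever this loses no information.
nes.int <- as.data.set(nes1948, compress.storage.modes = TRUE)

# Compare the sizes of the two results in memory.
object.size(nes.dbl)
object.size(nes.int)
```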
A call to a ‘constructor’ for an importer object, that is, spss.fixed.file, spss.portable.file, spss.system.file, or Stata.file, causes R to read in the header of the data file and/or the syntax files that contain information about the variables, such as the columns that they occupy (in case of spss.fixed.file), variable labels, value labels and missing values. The information in the file header and/or the accompanying files is then processed to prepare the file for importing. Thus the inner structure of an importer object may well vary according to what type of file is to be imported and what additional information is given.
The as.data.set and subset methods for "importer" objects internally use the generic functions seekData, readData, and readSubset, which have methods for the subclasses of "importer". These functions are not callable from outside the package, however.
Since the functions described here are a more or less complete rewrite based on the description of the file structure provided by the documentation for PSPP, they are perhaps not as thoroughly tested as the functions in the foreign package, apart from the frequent use by the author of this package.
spss.fixed.file, spss.portable.file, spss.system.file, and Stata.file return, respectively, objects of class "spss.fixed.importer", "spss.portable.importer", "spss.system.importer", or "Stata.importer", which, by inheritance, are also objects of class "importer".
Objects of class "importer" have at least the following two slots:
ptr | an external pointer
variables | a list of objects of class "item.vector"
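The class inheritance and the slots described above can be inspected directly. The following is a small sketch, assuming memisc is installed and using the ANES 1948 portable file from the Examples section; is and slotNames come from the methods package that ships with R.

```r
library(memisc)

# Same ANES 1948 portable file as in the Examples section.
por.path <- unzip(system.file("anes/NES1948.ZIP", package = "memisc"),
                  "NES1948.POR", exdir = tempfile())
nes1948 <- spss.portable.file(por.path)

# The constructor returns the file-type specific class ...
is(nes1948, "spss.portable.importer")

# ... which also counts as an "importer" by inheritance.
is(nes1948, "importer")

# List the slots; "ptr" and "variables" should be among them.
slotNames(nes1948)
```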
The as.data.frame method for importer objects does the actual data import and returns a data frame. Note that, in contrast to read.spss, the variable names of the resulting data frame will be lower case, unless the importer function is called with to.lower=FALSE. If long variable names are defined (in case of a PSPP/SPSS system file), they take precedence and are not coerced to lower case.
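The effect of to.lower on variable names can be checked with a short sketch like the one below. It assumes memisc is installed and reuses the ANES 1948 portable file from the Examples section. To rely only on conversions shown elsewhere on this page, the sketch goes through as.data.set before calling as.data.frame; the object names are illustrative.

```r
library(memisc)

# Same ANES 1948 portable file as in the Examples section.
por.path <- unzip(system.file("anes/NES1948.ZIP", package = "memisc"),
                  "NES1948.POR", exdir = tempfile())

# to.lower = TRUE is the default; to.lower = FALSE keeps names as stored.
imp.lower <- spss.portable.file(por.path)
imp.asis  <- spss.portable.file(por.path, to.lower = FALSE)

# Import in two explicit steps: importer -> data.set -> data frame.
df.lower <- as.data.frame(as.data.set(imp.lower))
df.asis  <- as.data.frame(as.data.set(imp.asis))

# Compare the first few variable names of both imports.
head(names(df.lower))
head(names(df.asis))
```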
codebook, description, read.spss
# Extract American National Election Study of 1948
nes1948.por <- unzip(system.file("anes/NES1948.ZIP",package="memisc"),
                     "NES1948.POR",exdir=tempfile())
# Get information about the variables contained.
nes1948 <- spss.portable.file(nes1948.por)
# The data are not yet loaded:
show(nes1948)
# ... but one can see what variables are present:
description(nes1948)
# Now a subset of the data is loaded:
vote.socdem.48 <- subset(nes1948,
    select=c(
        v480018,
        v480029,
        v480030,
        v480045,
        v480046,
        v480047,
        v480048,
        v480049,
        v480050
        ))
# Let's make the names more descriptive:
vote.socdem.48 <- rename(vote.socdem.48,
    v480018 = "vote",
    v480029 = "occupation.hh",
    v480030 = "unionized.hh",
    v480045 = "gender",
    v480046 = "race",
    v480047 = "age",
    v480048 = "education",
    v480049 = "total.income",
    v480050 = "religious.pref"
    )
# It is also possible to do both
# in one step:
# vote.socdem.48 <- subset(nes1948,
#     select=c(
#         vote = v480018,
#         occupation.hh = v480029,
#         unionized.hh = v480030,
#         gender = v480045,
#         race = v480046,
#         age = v480047,
#         education = v480048,
#         total.income = v480049,
#         religious.pref = v480050
#         ))
# We examine the data more closely:
codebook(vote.socdem.48)
# ... and conduct some analyses.
#
t(genTable(percent(vote)~occupation.hh,data=vote.socdem.48))
# We consider only the two main candidates.
vote.socdem.48 <- within(vote.socdem.48,{
    truman.dewey <- vote
    valid.values(truman.dewey) <- 1:2
    truman.dewey <- relabel(truman.dewey,
        "VOTED - FOR TRUMAN" = "Truman",
        "VOTED - FOR DEWEY" = "Dewey")
    })
summary(truman.relig.glm <- glm((truman.dewey=="Truman")~religious.pref,
    data=vote.socdem.48,
    family="binomial"
    ))
Loading required package: lattice
Loading required package: MASS
Attaching package: 'memisc'
The following objects are masked from 'package:stats':
contr.sum, contr.treatment, contrasts
The following object is masked from 'package:base':
as.array
Warning messages:
1: Duplicate labels 'NAME NOT KNOWN'
2: Duplicate labels 'HAVE HEARD OF TAFT-HARTLEY ACT'
3: Duplicate labels 'DID NOT CONSIDER ANYONE ELSE' 'CONSIDERED WALLACE' 'CONSIDERED OTHER' 'NA' 'CONSIDERED TRUMAN'
4: Duplicate labels 'DISAGREED WITH PLATFORM OR POLICY - TO'
5: Duplicate labels 'DISAGREED WITH PLATFORM OR POLICY - TO'
6: Duplicate labels 'RENT CONTROL' 'PRICE CONTROL' 'TAFT-HARTLEY' 'FARM PRICES AND SUPPORT' 'LOWER INCOME TAX' 'CIVIL RIGHTS' 'BALANCE BUDGET' '(GOVERNMENT) HOUSING' 'DEFENSE ACTIVITY' 'GOVERNMENT ATOMIC CONTROL' 'NEW DEAL' 'MARSHALL PLAN' 'FIRM RUSSIAN POLICY' 'HELP ISRAEL (PALESTINE)' 'PROMOTE PEACE'
7: Duplicate labels 'RENT CONTROL' 'PRICE CONTROL' 'TAFT-HARTLEY' 'FARM PRICES AND SUPPORT' 'LOWER INCOME TAX' 'CIVIL RIGHTS' 'BALANCE BUDGET' '(GOVERNMENT) HOUSING' 'DEFENSE ACTIVITY' 'GOVERNMENT ATOMIC CONTROL' 'NEW DEAL' 'MARSHALL PLAN' 'FIRM RUSSIAN POLICY' 'HELP ISRAEL (PALESTINE)' 'PROMOTE PEACE'
8: Duplicate labels 'RENT CONTROL' 'PRICE CONTROL' 'TAFT-HARTLEY' 'FARM PRICES AND SUPPORT' 'LOWER INCOME TAX' 'CIVIL RIGHTS' 'BALANCE BUDGET' '(GOVERNMENT) HOUSING' 'DEFENSE ACTIVITY' 'GOVERNMENT ATOMIC CONTROL' 'NEW DEAL' 'MARSHALL PLAN' 'FIRM RUSSIAN POLICY' 'HELP ISRAEL (PALESTINE)' 'PROMOTE PEACE'
9: Duplicate labels 'RENT CONTROL' 'PRICE CONTROL' 'TAFT-HARTLEY' 'FARM PRICES AND SUPPORT' 'LOWER INCOME TAX' 'CIVIL RIGHTS' 'BALANCE BUDGET' '(GOVERNMENT) HOUSING' 'DEFENSE ACTIVITY' 'GOVERNMENT ATOMIC CONTROL' 'NEW DEAL' 'MARSHALL PLAN' 'FIRM RUSSIAN POLICY' 'HELP ISRAEL (PALESTINE)' 'PROMOTE PEACE'
SPSS portable file '/work/tmp/tmp/RtmpPsnxGK/file2ed67151eab2/NES1948.POR'
with 67 variables and 662 observations
vversion 'NES VERSION NUMBER'
vdsetno 'NES DATASET NUMBER'
v480001 'ICPSR ARCHIVE NUMBER'
v480002 'INTERVIEW NUMBER'
v480003 'POP CLASSIFICATION'
v480004 'CODER'
v480005 'NUMBER OF CALLS TO R'
v480006 'R REMEMBER PREVIOUS INT'
v480007 'INTR INTERVIEW THIS R'
v480008 'PRVS PRE-ELCTN R REINT'
v480009 'R INT IN PRE/POSTELCTN'
v480010 'RENT CNTRL KEPT/DROPPED'
v480011 'GOVT CONTROL PRICES'
v480012 'WHAT TO DO W TFT-HT ACT'
v480013 'PRESLELCTN OTCM SURPRISE'
v480014a 'WHY PPL VTD FOR TRUMAN 1'
v480014b 'WHY PPL VTD FOR TRUMAN 2'
v480015a 'WHY PPL VTD AGNST TRUMAN 1'
v480015b 'WHY PPL VTD AGNST TRUMAN 2'
v480016a 'WHY PPL VTD FOR DEWEY 1'
v480016b 'WHY PPL VTD FOR DEWEY 2'
v480017a 'WHY PPL VTD AGNST DEWEY 1'
v480017b 'WHY PPL VTD AGNST DEWEY 2'
v480018 'DID R VOTE/FOR WHOM'
v480019 'WN DECIDE FOR WHOM TO VT'
v480020 'CNSD VT FOR SOMEONE ELSE'
v480021a 'XWHY DID NOT VT FOR HIM 1'
v480021b 'XWHY DID NOT VT FOR HIM 2'
v480022a 'WHY VT THE WAY YOU DID 1'
v480022b 'WHY VT THE WAY YOU DID 2'
v480023 'VOTED STRAIGHT TICKET'
v480024 'R NOT VT-IF VT,FOR WHOM'
v480025a 'R NOT VT-WHY DID NOT VT 1'
v480025b 'R NOT VT-WHY DID NOT VT 2'
v480026 'R NOT VT-WAS R REG TO VT'
v480027 'VTD IN PRVS PRESL ELCTN'
v480028 'VTD FOR WHOM IN 1944'
v480029 'OCCUPATION OF HEAD'
v480030 'HEAD BELONG TO LBR UN'
v480031a 'GRPS IDENTIFIED W TRUMAN 1'
v480031b 'GRPS IDENTIFIED W TRUMAN 2'
v480031c 'GRPS IDENTIFIED W TRUMAN 3'
v480032a 'GRPS IDENTIFIED W DEWEY 1'
v480032b 'GRPS IDENTIFIED W DEWEY 2'
v480032c 'GRPS IDENTIFIED W DEWEY 3'
v480033a 'ISSUES CONNECTED W TRMN 1'
v480033b 'ISSUES CONNECTED W TRMN 2'
v480034a 'ISSUES CONNECTED W DEWEY 1'
v480034b 'ISSUES CONNECTED W DEWEY 2'
v480035a 'PERSONAL ATTRIBUTE TRMN 1'
v480035b 'PERSONAL ATTRIBUTE TRMN 2'
v480036a 'PERSONAL ATTRIBUTE DEWEY 1'
v480036b 'PERSONAL ATTRIBUTE DEWEY 2'
v480037 'CMPN INCIDENTS MENTIONED'
v480038 '41-PRESLELCTN PLAN TO VT'
v480039 '41-PLAN TO VT REP/DEM'
v480040 '41-USA'S CNCRN W OTHERS'
v480041 '41-SATISD USA TWRD RUSS'
v480042 '41-INFORMATION LEVEL'
v480043 '41-USA GV IN,AGRT RUSS'
v480044 '41-USA-RUSS AGRT VIA U.N'
v480045 'SEX OF RESPONDENT'
v480046 'RACE OF RESPONDENT'
v480047 'AGE OF RESPONDENT'
v480048 'EDUCATION OF RESPONDENT'
v480049 'TOTAL 1948 INCOME'
v480050 'RELIGIOUS PREFERENCE'
================================================================================
vote 'DID R VOTE/FOR WHOM'
--------------------------------------------------------------------------------
Storage mode: double
Measurement: nominal
Missing values: 9
Values and labels N Percent
1 'VOTED - FOR TRUMAN' 212 32.1 32.0
2 'VOTED - FOR DEWEY' 178 27.0 26.9
3 'VOTED - FOR WALLACE' 1 0.2 0.2
4 'VOTED - FOR OTHER' 11 1.7 1.7
5 'VOTED - NA FOR WHOM' 20 3.0 3.0
6 'DID NOT VOTE' 238 36.1 36.0
9 M 'NA WHETHER VOTED' 2 0.3
================================================================================
occupation.hh 'OCCUPATION OF HEAD'
--------------------------------------------------------------------------------
Storage mode: double
Measurement: nominal
Missing values: 99
Values and labels N Percent
10 'PROFESSIONAL, SEMI-PROFESSIONAL' 44 6.9 6.6
20 'SELF-EMPLOYED, MANAGERIAL, SUPERVISORY' 73 11.5 11.0
30 'OTHER WHITE-COLLAR (CLERICAL, SALES, ET' 79 12.5 11.9
40 'SKILLED AND SEMI-SKILLED' 164 25.9 24.8
60 'PROTECTIVE SERVICE' 6 0.9 0.9
70 'UNSKILLED, INCLUDING FARM AND SERVICE W' 85 13.4 12.8
80 'FARM OPERATORS AND MANAGERS' 105 16.6 15.9
92 'STUDENT' 7 1.1 1.1
94 'UNEMPLOYED' 5 0.8 0.8
95 'RETIRED, TOO OLD OR UNABLE TO WORK' 38 6.0 5.7
96 'HOUSEWIFE' 28 4.4 4.2
99 M 'NA' 28 4.2
================================================================================
unionized.hh 'HEAD BELONG TO LBR UN'
--------------------------------------------------------------------------------
Storage mode: double
Measurement: nominal
Missing values: 8-Inf
Values and labels N Percent
1 'YES' 150 23.3 22.7
2 'NO' 493 76.7 74.5
8 M 'DK' 5 0.8
9 M 'NA' 14 2.1
================================================================================
gender 'SEX OF RESPONDENT'
--------------------------------------------------------------------------------
Storage mode: double
Measurement: nominal
Missing values: 9
Values and labels N Percent
1 'MALE' 302 45.8 45.6
2 'FEMALE' 357 54.2 53.9
9 M 'NA' 3 0.5
================================================================================
race 'RACE OF RESPONDENT'
--------------------------------------------------------------------------------
Storage mode: double
Measurement: nominal
Missing values: 9
Values and labels N Percent
1 'WHITE' 585 90.7 88.4
2 'NEGRO' 60 9.3 9.1
3 'OTHER' 0 0.0 0.0
9 M 'NA' 17 2.6
================================================================================
age 'AGE OF RESPONDENT'
--------------------------------------------------------------------------------
Storage mode: double
Measurement: nominal
Missing values: 9
Values and labels N Percent
1 '18-24' 57 8.7 8.6
2 '25-34' 142 21.7 21.5
3 '35-44' 174 26.6 26.3
4 '45-54' 125 19.1 18.9
5 '55-64' 86 13.1 13.0
6 '65 AND OVER' 70 10.7 10.6
9 M 'NA' 8 1.2
================================================================================
education 'EDUCATION OF RESPONDENT'
--------------------------------------------------------------------------------
Storage mode: double
Measurement: nominal
Missing values: 9
Values and labels N Percent
1 'GRADE SCHOOL' 292 44.4 44.1
2 'HIGH SCHOOL' 266 40.4 40.2
3 'COLLEGE' 100 15.2 15.1
9 M 'NA' 4 0.6
================================================================================
total.income 'TOTAL 1948 INCOME'
--------------------------------------------------------------------------------
Storage mode: double
Measurement: nominal
Missing values: 9
Values and labels N Percent
1 'UNDER $500' 25 3.8 3.8
2 '$500-$999' 43 6.6 6.5
3 '$1000-1999' 110 16.8 16.6
4 '$2000-2999' 185 28.2 27.9
5 '$3000-3999' 142 21.7 21.5
6 '$4000-4999' 66 10.1 10.0
7 '$5000 AND OVER' 84 12.8 12.7
9 M 'NA' 7 1.1
================================================================================
religious.pref 'RELIGIOUS PREFERENCE'
--------------------------------------------------------------------------------
Storage mode: double
Measurement: nominal
Missing values: 9
Values and labels N Percent
1 'PROTESTANT' 460 70.0 69.5
2 'CATHOLIC' 140 21.3 21.1
3 'JEWISH' 25 3.8 3.8
4 'OTHER' 14 2.1 2.1
5 'NONE' 18 2.7 2.7
9 M 'NA' 5 0.8
occupation.hh VOTED - FOR TRUMAN VOTED - FOR DEWEY
PROFESSIONAL, SEMI-PROFESSIONAL 22.7272727 50.0000000
SELF-EMPLOYED, MANAGERIAL, SUPERVISORY 9.5890411 61.6438356
OTHER WHITE-COLLAR (CLERICAL, SALES, ET 37.9746835 39.2405063
SKILLED AND SEMI-SKILLED 51.8292683 14.6341463
PROTECTIVE SERVICE 16.6666667 33.3333333
UNSKILLED, INCLUDING FARM AND SERVICE W 32.9411765 11.7647059
FARM OPERATORS AND MANAGERS 24.7619048 13.3333333
STUDENT 14.2857143 28.5714286
UNEMPLOYED 0.0000000 0.0000000
RETIRED, TOO OLD OR UNABLE TO WORK 27.0270270 43.2432432
HOUSEWIFE 17.8571429 28.5714286
occupation.hh VOTED - FOR WALLACE VOTED - FOR OTHER
PROFESSIONAL, SEMI-PROFESSIONAL 0.0000000 2.2727273
SELF-EMPLOYED, MANAGERIAL, SUPERVISORY 0.0000000 1.3698630
OTHER WHITE-COLLAR (CLERICAL, SALES, ET 0.0000000 0.0000000
SKILLED AND SEMI-SKILLED 0.6097561 1.2195122
PROTECTIVE SERVICE 0.0000000 16.6666667
UNSKILLED, INCLUDING FARM AND SERVICE W 0.0000000 0.0000000
FARM OPERATORS AND MANAGERS 0.0000000 2.8571429
STUDENT 0.0000000 0.0000000
UNEMPLOYED 0.0000000 0.0000000
RETIRED, TOO OLD OR UNABLE TO WORK 0.0000000 2.7027027
HOUSEWIFE 0.0000000 0.0000000
occupation.hh VOTED - NA FOR WHOM DID NOT VOTE
PROFESSIONAL, SEMI-PROFESSIONAL 2.2727273 22.7272727
SELF-EMPLOYED, MANAGERIAL, SUPERVISORY 1.3698630 26.0273973
OTHER WHITE-COLLAR (CLERICAL, SALES, ET 5.0632911 17.7215190
SKILLED AND SEMI-SKILLED 2.4390244 29.2682927
PROTECTIVE SERVICE 0.0000000 33.3333333
UNSKILLED, INCLUDING FARM AND SERVICE W 4.7058824 50.5882353
FARM OPERATORS AND MANAGERS 1.9047619 57.1428571
STUDENT 0.0000000 57.1428571
UNEMPLOYED 20.0000000 80.0000000
RETIRED, TOO OLD OR UNABLE TO WORK 2.7027027 24.3243243
HOUSEWIFE 0.0000000 53.5714286
occupation.hh N
PROFESSIONAL, SEMI-PROFESSIONAL 44.0000000
SELF-EMPLOYED, MANAGERIAL, SUPERVISORY 73.0000000
OTHER WHITE-COLLAR (CLERICAL, SALES, ET 79.0000000
SKILLED AND SEMI-SKILLED 164.0000000
PROTECTIVE SERVICE 6.0000000
UNSKILLED, INCLUDING FARM AND SERVICE W 85.0000000
FARM OPERATORS AND MANAGERS 105.0000000
STUDENT 7.0000000
UNEMPLOYED 5.0000000
RETIRED, TOO OLD OR UNABLE TO WORK 37.0000000
HOUSEWIFE 28.0000000
Call:
glm(formula = (truman.dewey == "Truman") ~ religious.pref, family = "binomial",
data = vote.socdem.48)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.46927 -1.12217 0.00036 1.23367 1.32323
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.13134 0.12831 -1.024 0.30604
religious.prefCATHOLIC 0.79550 0.24442 3.255 0.00114 **
religious.prefJEWISH 16.69740 536.55453 0.031 0.97517
religious.prefOTHER -0.05099 0.61898 -0.082 0.93435
religious.prefNONE -0.20514 0.59943 -0.342 0.73219
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 537.69 on 389 degrees of freedom
Residual deviance: 500.69 on 385 degrees of freedom
(272 observations deleted due to missingness)
AIC: 510.69
Number of Fisher Scoring iterations: 15