sqlsurvey: Large survey design based on SQL database

Description Usage Arguments Details Value See Also Examples

Description

Specifies a survey data set based on a connection to a SQL (currently MonetDB) database. The replicate-weights version (sqlrepsurvey) provides more functionality.

Usage

1
2
3
4
5
6
7
8
sqlsurvey(id=NULL, strata = NULL, weights = NULL, fpc = "0", driver=MonetDB.R(), database,
table.name, key ,check.factors=10,...)
## S3 method for class 'sqlsurvey'
close(con, ...)
## S3 method for class 'sqlsurvey'
open(con, driver,...)
## S3 method for class 'sqlmodelmatrix'
open(con, design,...)

Arguments

id

vector of strings with names of sampling unit identifiers. Use "1" or NULL for individual sampling

strata

vector of strings with names of stratum identifiers. Use NULL for unstratified sampling

weights

string with name of weighting variable, or "1" for unweighted analysis

fpc

vector of strings with name of population size variable (variable may be zero or literal '0' for sampling with replacement or infinite population size)

driver

database driver object (from MonetDB), or NULL if database is already a database connection

database

Either a \connectionMonetDBConnection or the name (URL) of the database containing the data table

table.name

Name for the data table containing the survey data and design variables

key

name of a variable that can be used as a unique key

check.factors

See below

con

survey object or survey model matrix object to be disconnected from or reconnected to the database

design

A sqlsurvey object with an active database connection

...

for sqlsurvey and open.sqlsurvey, other arguments to dbConnect, such as user,password, and database URL

Details

The point of check.factors is twofold: to allow numeric variables to be treated as factors, and to cache the factor levels for both numeric and character variables to speed computation. Ideally, you should specify a character vector with list of variable names that should be treated as factors. This is important because coercing a variable to factor in inline expressions is not widely supported in the sqlsurvey package.

If check.factors is numeric, variables with fewer than that many distinct levels will be treate as factors and have their levels cached. Character variables will always be treated as factors, but will not have their levels cached if there are more than the limit. If check.factors is zero, only character variables will be treated as factors and they will not have their levels cached.

Finally, check.factors can be a zero-row data frame. The only supported way to get this data frame is as the zdata component of another sqlsurvey or sqlrepsurvey object – eg, if the factors are computed for one US state and you wish to transfer them to another US state without recomputing.

The check.factors operation with a numeric limit can be slow (eg over an hour for an American Community Survey dataset with 9 million records and 300 variables). If the survey object is saved with save(), it can be reconnected to the database with open, so that it only needs to be created once. The most flexible and fastest approach is usually to create the zero-row data frame manually from the data documentation: only the columns for factor variables need to be supplied, as the code will assume other variables are numeric.

The design degrees-of-freedom should be one less than the rank of the matrix of replicate weights, but for large surveys this is computationally expensive. The code defaults to one less than the number of replicates, which is correct for JK1, BRR, and the American Community Survey. For JKn weights, use the number of PSUs minus number of strata, if this can be determined from the documentation.

close closes the database connection, first attempting to garbage-collect any tables corresponding to non-existent R objects.

open re-opens the database connection.

To avoid storing a password in a script, you can use,eg, password=readline("Password: ") to be prompted for a password, or set up a .monetdb file in the home directory.

Value

sqlsurvey returns an object of class sqlsurvey

See Also

sqlrepsurvey,MonetDB

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
## Not run: 
## assumes data already in database
library(sqlsurvey)
library(MonetDB.R)
monetdriver<-MonetDB.R()

sqhanes<-sqlsurvey(id="sdmvpsu",strata="sdmvstra",weights="fouryearwt",key="seqn",database="monetdb://localhost/ACS",driver=MonetDB(),table.name="nhanes")
sqhanes

## blood pressure by gender
svymean(~bpxsar+bpxdar,design=sqhanes,se=TRUE,byvar=~riagendr)

## hexbin scatterplot
svyplot(bpxsar~bpxdar, design=sqhanes, style="hex")

## linear model
svylm(bpxsar~bpxdar+ridreth1, design=sqhanes)

close(sqhanes)

sqhanes<-open(sqhanes,driver=monetdriver,user="monetdb",password="monetdb")
opar<-par(mfrow=c(2,2))
svyplot(bpxsar~bpxdar, design=sqhanes, style="subsample",col=ifelse(riagendr==1,"blue","magenta"),pch=19)

close(sqhanes)

## End(Not run)

sqlsurvey documentation built on May 2, 2019, 4:53 p.m.